Code
# Set up environment
# conda env create -f /path/to/this/repo/software/conda_env.yaml
# conda activate imlabtools
Saideep Gona
November 22, 2023
Having linearized the fine-tuned DL model in 1.4, we can now run SPrediXcan using this linearization to try and identify common genetics with GWAS signals.
# Initialize submission script for SPrediXcan
with open("/beagle3/haky/users/saideep/github_repos/Daily-Blog-Sai/posts/2023-11-22-retrain_expression_predictor/1.5.2_submit_SPrediXcan.sh","w") as sub_script:
sub_script.write('''#!/bin/bash
#SBATCH --job-name=SPrediXcan_finetune
#SBATCH --account=pi-haky
#SBATCH --partition=caslake
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --time=06:00:00
#SBATCH --mem=24G\n''')
sub_script.write("module load python\n")
sub_script.write("conda activate imlabtools\n")
import os
# Set paths for predixcan
spredixcan_path = "/beagle3/haky/users/saideep/github_repos/MetaXcan/software/SPrediXcan.py"
model_db_path = "/beagle3/haky/users/saideep/projects/aracena_modeling/linearization/predixcan_RNAseq_Flu_from_HDF5_mean_log_revised/database/gtex_v7_Model_training.db"
covariance_path = "/beagle3/haky/users/saideep/projects/aracena_modeling/linearization/predixcan_RNAseq_Flu_from_HDF5_mean_log_revised/database/Covariances.txt"
gwas_folder_path = "/beagle3/haky/users/charles/project/singleXcanDL/GWAS_summary/Diagram_2022/Mahajan.NatGen2022.DIAMANTE-TA.sumstat"
gwas_file_pattern = "DIAMANTE-TA.sumstat_corrected.txt.gz"
output_dir = "/beagle3/haky/users/saideep/projects/aracena_modeling/SPrediXcan/predixcan_RNAseq_Flu_from_HDF5_mean_log_revised/"
spredixcan_com= [
spredixcan_path,
"--model_db_path", model_db_path,
"--covariance", covariance_path,
"--gwas_folder", gwas_folder_path,
"--gwas_file_pattern", gwas_file_pattern,
"--snp_column", "SNP",
"--effect_allele_column", "A1",
"--non_effect_allele_column", "A2",
"--beta_column", "BETA",
"--pvalue_column", "P",
"--output_file", os.path.join(output_dir, "2TD_SPrediXcan_results.txt"),
]
with open("/beagle3/haky/users/saideep/github_repos/Daily-Blog-Sai/posts/2023-11-22-retrain_expression_predictor/1.5.2_submit_SPrediXcan.sh","a") as sub_script:
sub_script.write(" ".join(spredixcan_com))
SPrediXcan needs to know which SNPs from the GWAS summary stats match which SNPs from the underlying trained database. One way to do this is to make sure that the columns specified by these two parameters (–model_db_snp_key, –snp_column) match. The SNP column here refers to that in the “weight” table in the stored model DB and should match the column specified in the GWAS summary stats.
variant_id | panel_variant_id | chromosome | position | effect_allele | non_effect_allele | frequency | pvalue | zscore | effect_size | standard_error | sample_size | n_cases | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | rs12238997 | 1:758351:A:G:b38 | 1 | 758351 | G | A | 0.112621 | 0.5806 | -0.552509 | -0.0060 | 0.0109 | 328697 | NaN |
1 | rs55727773 | 1:770988:A:G:b38 | 1 | 770988 | G | A | 0.516505 | 0.2501 | -1.150106 | -0.0103 | 0.0090 | 250998 | NaN |
2 | rs12565286 | 1:785910:G:C:b38 | 1 | 785910 | C | G | 0.045631 | 0.9179 | 0.103079 | 0.0021 | 0.0204 | 249749 | NaN |
3 | rs4951859 | 1:794299:C:G:b38 | 1 | 794299 | G | C | 0.840777 | 0.6551 | -0.446688 | -0.0040 | 0.0090 | 335141 | NaN |
4 | rs148120343 | 1:794707:T:C:b38 | 1 | 794707 | C | T | 0.050485 | 0.9288 | -0.089355 | -0.0016 | 0.0179 | 247336 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9980951 | rs9616832 | 22:50780959:T:C:b38 | 22 | 50780959 | C | T | 0.080583 | 0.5920 | 0.535940 | 0.0066 | 0.0123 | 392376 | NaN |
9980952 | rs147475742 | 22:50781276:G:A:b38 | 22 | 50781276 | A | G | 0.035922 | 0.6228 | 0.491886 | 0.0076 | 0.0154 | 381579 | NaN |
9980953 | rs115055839 | 22:50783303:T:C:b38 | 22 | 50783303 | C | T | 0.078641 | 0.4446 | 0.764448 | 0.0095 | 0.0124 | 385532 | NaN |
9980954 | rs114553188 | 22:50783672:G:T:b38 | 22 | 50783672 | T | G | 0.063107 | 0.9196 | -0.100938 | -0.0013 | 0.0129 | 418125 | NaN |
9980955 | rs9616985 | 22:50791377:T:C:b38 | 22 | 50791377 | C | T | 0.078641 | 0.3892 | 0.861069 | 0.0107 | 0.0124 | 384399 | NaN |
9980956 rows × 13 columns