1.5.1_SPrediXcan_fine-tuned

Author

Saideep Gona

Published

November 22, 2023

Context

Having linearized the fine-tuned DL model in 1.4, we can now run SPrediXcan using this linearization to try and identify common genetics with GWAS signals.

Code
# Set up environment 

# conda env create -f /path/to/this/repo/software/conda_env.yaml
# conda activate imlabtools
Code
# Initialize submission script for SPrediXcan

with open("/beagle3/haky/users/saideep/github_repos/Daily-Blog-Sai/posts/2023-11-22-retrain_expression_predictor/1.5.2_submit_SPrediXcan.sh","w") as sub_script:
    sub_script.write('''#!/bin/bash
#SBATCH --job-name=SPrediXcan_finetune
#SBATCH --account=pi-haky
#SBATCH --partition=caslake
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --time=06:00:00
#SBATCH --mem=24G\n''')
    
    sub_script.write("module load python\n")
    sub_script.write("conda activate imlabtools\n")
Code
import os

# Set paths for predixcan
spredixcan_path = "/beagle3/haky/users/saideep/github_repos/MetaXcan/software/SPrediXcan.py"

model_db_path = "/beagle3/haky/users/saideep/projects/aracena_modeling/linearization/predixcan_RNAseq_Flu_from_HDF5_mean_log_revised/database/gtex_v7_Model_training.db"
covariance_path = "/beagle3/haky/users/saideep/projects/aracena_modeling/linearization/predixcan_RNAseq_Flu_from_HDF5_mean_log_revised/database/Covariances.txt"

gwas_folder_path = "/beagle3/haky/users/charles/project/singleXcanDL/GWAS_summary/Diagram_2022/Mahajan.NatGen2022.DIAMANTE-TA.sumstat"
gwas_file_pattern = "DIAMANTE-TA.sumstat_corrected.txt.gz"

output_dir = "/beagle3/haky/users/saideep/projects/aracena_modeling/SPrediXcan/predixcan_RNAseq_Flu_from_HDF5_mean_log_revised/"

spredixcan_com= [
    spredixcan_path,
    "--model_db_path", model_db_path,
    "--covariance", covariance_path,
    "--gwas_folder", gwas_folder_path,
    "--gwas_file_pattern", gwas_file_pattern,
    "--snp_column", "SNP",
    "--effect_allele_column", "A1",
    "--non_effect_allele_column", "A2",
    "--beta_column", "BETA",
    "--pvalue_column", "P",
    "--output_file", os.path.join(output_dir, "2TD_SPrediXcan_results.txt"),
]

with open("/beagle3/haky/users/saideep/github_repos/Daily-Blog-Sai/posts/2023-11-22-retrain_expression_predictor/1.5.2_submit_SPrediXcan.sh","a") as sub_script:
    sub_script.write(" ".join(spredixcan_com))

Reformat T2D GWAS summary statistics

SPrediXcan needs to know which SNPs from the GWAS summary stats match which SNPs from the underlying trained database. One way to do this is to make sure that the columns specified by these two parameters (–model_db_snp_key, –snp_column) match. The SNP column here refers to that in the “weight” table in the stored model DB and should match the column specified in the GWAS summary stats.

Code
import pandas as pd

chr_table = pd.read_csv("/beagle3/haky/users/charles/project/singleXcanDL/S_PrediXcan/T2D_GWAS_process/T2D_harmonized.txt.gz", sep="\t")


chr_table["panel_variant_id"] = chr_table["panel_variant_id"].str.replace("chr", "").replace("_", ":")
Code
chr_table["panel_variant_id"] = chr_table["panel_variant_id"].str.replace("chr", "")
chr_table["panel_variant_id"] = chr_table["panel_variant_id"].str.replace("_", ":")
chr_table["chromosome"] = chr_table["chromosome"].str.replace("chr", "")
chr_table
variant_id panel_variant_id chromosome position effect_allele non_effect_allele frequency pvalue zscore effect_size standard_error sample_size n_cases
0 rs12238997 1:758351:A:G:b38 1 758351 G A 0.112621 0.5806 -0.552509 -0.0060 0.0109 328697 NaN
1 rs55727773 1:770988:A:G:b38 1 770988 G A 0.516505 0.2501 -1.150106 -0.0103 0.0090 250998 NaN
2 rs12565286 1:785910:G:C:b38 1 785910 C G 0.045631 0.9179 0.103079 0.0021 0.0204 249749 NaN
3 rs4951859 1:794299:C:G:b38 1 794299 G C 0.840777 0.6551 -0.446688 -0.0040 0.0090 335141 NaN
4 rs148120343 1:794707:T:C:b38 1 794707 C T 0.050485 0.9288 -0.089355 -0.0016 0.0179 247336 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
9980951 rs9616832 22:50780959:T:C:b38 22 50780959 C T 0.080583 0.5920 0.535940 0.0066 0.0123 392376 NaN
9980952 rs147475742 22:50781276:G:A:b38 22 50781276 A G 0.035922 0.6228 0.491886 0.0076 0.0154 381579 NaN
9980953 rs115055839 22:50783303:T:C:b38 22 50783303 C T 0.078641 0.4446 0.764448 0.0095 0.0124 385532 NaN
9980954 rs114553188 22:50783672:G:T:b38 22 50783672 T G 0.063107 0.9196 -0.100938 -0.0013 0.0129 418125 NaN
9980955 rs9616985 22:50791377:T:C:b38 22 50791377 C T 0.078641 0.3892 0.861069 0.0107 0.0124 384399 NaN

9980956 rows × 13 columns

Code
chr_table.to_csv("/beagle3/haky/users/charles/project/singleXcanDL/S_PrediXcan/T2D_GWAS_process/T2D_harmonized_nochr.txt.gz", sep="\t", index=False, compression="gzip")