Updating Existing GEUVADIS Analyses part 2

Author

Saideep Gona

Published

March 17, 2023

Code
suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "C:\\Users\\Saideep\\Box\\imlab-data\\data-Github\\Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="updating-existing-geuvadis-analysis-p2" ## copy the slug from the header
bDATE='2023-03-17' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}"))
WORK=DATA

Context

Today I am continuing to update my analyses with new data.

Constructing gene metadata table

I task I have put off for a while is collecting gene and individual metadata tables. These are useful context when plotting, modeling, etc. I will store these in a general location for cross-analysis accessibility: “C:\Users\Saideep\Box\imlab-data\data-Github\analysis-Sai\metadata”

Let’s start the table off with a set of protein-coding genes from a GRCh38 annotation file

Code
gene_metadata = read.delim("C:\\Users\\Saideep\\Box\\imlab-data\\data-Github\\analysis-Sai\\metadata\\protein_coding.tsv")

gene_metadata$length <- gene_metadata$end - gene_metadata$start

gene_ids <- sapply(str_split(gene_metadata$gene_id, "\\."),`[`,1)

rownames(gene_metadata) <- gene_ids

gene_metadata$gene_id <- NULL

Heritability

Festus shared with me a heritability database. I will explore it for the purpose of having gene-level heritability + pve metadata. The following is the code he shared with me:

Code
library(RSQLite)
gen_db <- "C:\\Users\\Saideep\\Box\\imlab-data\\data-Github\\analysis-Sai\\metadata\\genarch.db"
sqlite <- dbDriver("SQLite")

db = dbConnect(sqlite,gen_db)
tissues_in_table = dbGetQuery(db,"select DISTINCT tissue from results")
genes_in_table = dbGetQuery(db,"select DISTINCT ensid from results")
print(tissues_in_table)
                         tissue
1       Adipose-Subcutaneous_TS
2       Adipose-Subcutaneous_TW
3              Artery-Tibial_TS
4              Artery-Tibial_TW
5                  Cross-Tissue
6                        DGN-WB
7        Heart-LeftVentricle_TS
8        Heart-LeftVentricle_TW
9                       Lung_TS
10                      Lung_TW
11           Muscle-Skeletal_TS
12           Muscle-Skeletal_TW
13              Nerve-Tibial_TS
14              Nerve-Tibial_TW
15 Skin-SunExposed(Lowerleg)_TS
16 Skin-SunExposed(Lowerleg)_TW
17                   Thyroid_TS
18                   Thyroid_TW
19                WholeBlood_TS
20                WholeBlood_TW
Code
print(nrow(genes_in_table))
[1] 17075
Code
dbDisconnect(db)

The table has 20 tissue types in it. I am most interested in LCL cell line at the moment. There is no LCL in this dataset, so I will just use “WholeBlood_TS” as a proxy.

Code
db = dbConnect(sqlite,gen_db)
blood_gene_metadata = dbGetQuery(db,"select * from results where tissue = 'WholeBlood_TS'")
dbDisconnect(db)

gene_ids_h2 <- sapply(str_split(blood_gene_metadata$ensid, "\\."),`[`,1)
rownames(blood_gene_metadata) <- gene_ids_h2

I will now add some of this data to my existing metadata table:

Code
sub_blood_gene_metadata <- blood_gene_metadata[as.vector(gene_ids),]

gene_metadata$h2_WholeBlood_TS <- sub_blood_gene_metadata$h2

Transcription properties

Code
gene_tss_metadata <- read.delim("C:\\Users\\Saideep\\Box\\imlab-data\\data-Github\\analysis-Sai\\metadata\\genes_tss.bed", header = FALSE)


gene_ids_tss <- sapply(str_split(gene_tss_metadata$V1, "\\."),`[`,1)
gene_tss_metadata$id <- gene_ids_tss
gene_tss_metadata_nodup <- gene_tss_metadata %>% distinct(id, .keep_all = TRUE)
rownames(gene_tss_metadata_nodup) <- gene_tss_metadata_nodup$id

sub_gene_tss_metadata <- gene_tss_metadata_nodup[gene_ids,]

gene_metadata$strand <- sub_gene_tss_metadata$V3
gene_metadata$tss_sites <- sub_gene_tss_metadata$V2

rownames(gene_metadata) <- gene_ids

Now I can save the table and use it in my other analyses

Code
write.csv2(gene_metadata,file = "C:\\Users\\Saideep\\Box\\imlab-data\\data-Github\\analysis-Sai\\metadata\\gene_metadata_full.csv",row.names = TRUE)

ERAP2