Author

Saideep Gona

Published

March 28, 2023

Code
suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="erap2-fine-mapping" ## copy the slug from the header
bDATE='2023-03-28' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}"))
WORK=DATA

Context

Am going to refocus on ERAP2 again for a few reasons:

1.) It is interesting

2.) I didn’t reevaluate the results after fixing some pipeline bugs

3.) It is a chance to implement personalized mutagenesis and compare with reference mutagenesis

Replotting of ERAP2 association

Code
library(tidyverse)
library(ggplot2)
dset_dir <- file.path("/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/analysis-Sai/enformer_geuvadis/enformer_portability_analysis")

geuvadis_full <- read_delim(file.path(dset_dir,"GD462.GeneQuantRPKM.50FN.samplename.resk10.txt"))
geuvadis_genes_symbols <- sapply(str_split(geuvadis_full$Gene_Symbol, "\\."),`[`,1)
geuvadis <- as.data.frame(geuvadis_full[,5:ncol(geuvadis_full)])
rownames(geuvadis) <- geuvadis_genes_symbols

enformer_haplo1 <- read.csv2(file.path(dset_dir,"CAGE_lcl_enformer_geuvadis_summed_haplo1_filtered.csv"), header = TRUE, row.names = 1)

enformer_haplo2 <- read.csv2(file.path(dset_dir,"CAGE_lcl_enformer_geuvadis_summed_haplo2_filtered.csv"), header = TRUE, row.names = 1)

enf_genes_symbols <- sapply(str_split(rownames(enformer_haplo1), "\\."),`[`,1)
enformer <- (enformer_haplo1 + enformer_haplo2)
rownames(enformer) <- enf_genes_symbols
ge_inds <- intersect(colnames(geuvadis), colnames(enformer))
enformer <- enformer[,ge_inds]
geuvadis <- geuvadis[,ge_inds]
ERAP2_df <- data.frame(enformer=as.numeric(enformer["ENSG00000164308",]), geuvadis=as.numeric(geuvadis["ENSG00000164308",]))


ggplot(ERAP2_df) + geom_point(aes(x=geuvadis,y=enformer))

Great, the negative correlation clearly still holds.

Now we want to perform two types of mutagenesis for the ERAP2 region. The first is standard reference-based mutation, where each ERAP2 mutant of interest is mutated against a background of reference sequence. The second is individulized mutagenesis in which the mutations are done with different individualized background context.

The generation of the individualized mutagenesis is done in an ipython notebook on grand. Will document the contents next week.