erap2-fine-mapping

Author

Saideep Gona

Published

March 28, 2023

Code

suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="erap2-fine-mapping" ## copy the slug from the header
bDATE='2023-03-28' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}"))
WORK=DATA

Context

Am going to refocus on ERAP2 again for a few reasons:

1.) It is interesting

2.) I didn’t reevaluate the results after fixing some pipeline bugs

3.) It is a chance to implement personalized mutagenesis and compare with reference mutagenesis

Replotting of ERAP2 association

Code

library(tidyverse)
library(ggplot2)
dset_dir <- file.path("/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/analysis-Sai/enformer_geuvadis/enformer_portability_analysis")

geuvadis_full <- read_delim(file.path(dset_dir,"GD462.GeneQuantRPKM.50FN.samplename.resk10.txt"))
geuvadis_genes_symbols <- sapply(str_split(geuvadis_full$Gene_Symbol, "\\."),`[`,1)
geuvadis <- as.data.frame(geuvadis_full[,5:ncol(geuvadis_full)])
rownames(geuvadis) <- geuvadis_genes_symbols

enformer_haplo1 <- read.csv2(file.path(dset_dir,"CAGE_lcl_enformer_geuvadis_summed_haplo1_filtered.csv"), header = TRUE, row.names = 1)

enformer_haplo2 <- read.csv2(file.path(dset_dir,"CAGE_lcl_enformer_geuvadis_summed_haplo2_filtered.csv"), header = TRUE, row.names = 1)

enf_genes_symbols <- sapply(str_split(rownames(enformer_haplo1), "\\."),`[`,1)
enformer <- (enformer_haplo1 + enformer_haplo2)
rownames(enformer) <- enf_genes_symbols
ge_inds <- intersect(colnames(geuvadis), colnames(enformer))
enformer <- enformer[,ge_inds]
geuvadis <- geuvadis[,ge_inds]
ERAP2_df <- data.frame(enformer=as.numeric(enformer["ENSG00000164308",]), geuvadis=as.numeric(geuvadis["ENSG00000164308",]))


ggplot(ERAP2_df) + geom_point(aes(x=geuvadis,y=enformer))

Great, the negative correlation clearly still holds.

Now we want to perform two types of mutagenesis for the ERAP2 region. The first is standard reference-based mutation, where each ERAP2 mutant of interest is mutated against a background of reference sequence. The second is individulized mutagenesis in which the mutations are done with different individualized background context.

The generation of the individualized mutagenesis is done in an ipython notebook on grand. Will document the contents next week.

--- title: "erap2-fine-mapping" author: "Saideep Gona" date: "2023-03-28" format: html: code-fold: true code-summary: "Show the code" execute: freeze: true warning: false --- ```{r} #| label: Set up box storage directory suppressMessages(library(tidyverse)) suppressMessages(library(glue)) PRE = "/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai" ## COPY THE DATE AND SLUG fields FROM THE HEADER SLUG="erap2-fine-mapping" ## copy the slug from the header bDATE='2023-03-28' ## copy the date from the blog's header here DATA = glue("{PRE}/{bDATE}-{SLUG}") if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}")) WORK=DATA ``` # Context Am going to refocus on ERAP2 again for a few reasons: 1.) It is interesting 2.) I didn't reevaluate the results after fixing some pipeline bugs 3.) It is a chance to implement personalized mutagenesis and compare with reference mutagenesis ### Replotting of ERAP2 association ```{r} library(tidyverse) library(ggplot2) dset_dir <- file.path("/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/analysis-Sai/enformer_geuvadis/enformer_portability_analysis") geuvadis_full <- read_delim(file.path(dset_dir,"GD462.GeneQuantRPKM.50FN.samplename.resk10.txt")) geuvadis_genes_symbols <- sapply(str_split(geuvadis_full$Gene_Symbol, "\\."),`[`,1) geuvadis <- as.data.frame(geuvadis_full[,5:ncol(geuvadis_full)]) rownames(geuvadis) <- geuvadis_genes_symbols enformer_haplo1 <- read.csv2(file.path(dset_dir,"CAGE_lcl_enformer_geuvadis_summed_haplo1_filtered.csv"), header = TRUE, row.names = 1) enformer_haplo2 <- read.csv2(file.path(dset_dir,"CAGE_lcl_enformer_geuvadis_summed_haplo2_filtered.csv"), header = TRUE, row.names = 1) enf_genes_symbols <- sapply(str_split(rownames(enformer_haplo1), "\\."),`[`,1) enformer <- (enformer_haplo1 + enformer_haplo2) rownames(enformer) <- enf_genes_symbols ge_inds <- intersect(colnames(geuvadis), colnames(enformer)) enformer <- enformer[,ge_inds] geuvadis <- geuvadis[,ge_inds] ERAP2_df <- data.frame(enformer=as.numeric(enformer["ENSG00000164308",]), geuvadis=as.numeric(geuvadis["ENSG00000164308",])) ggplot(ERAP2_df) + geom_point(aes(x=geuvadis,y=enformer)) ``` Great, the negative correlation clearly still holds. Now we want to perform two types of mutagenesis for the ERAP2 region. The first is standard reference-based mutation, where each ERAP2 mutant of interest is mutated against a background of reference sequence. The second is individulized mutagenesis in which the mutations are done with different individualized background context. The generation of the individualized mutagenesis is done in an ipython notebook on grand. Will document the contents next week.