ERAP2-personalized-mutagenesis

Author

Saideep Gona

Published

April 3, 2023

Code

suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="ERAP2-personalized-mutagenesis" ## copy the slug from the header
bDATE='2023-04-03' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
## create the data directory if needed (no shell call; also creates missing parent directories)
if(!dir.exists(DATA)) dir.create(DATA, recursive = TRUE)
WORK=DATA

Context

Last week I was working on implementing personalized mutagenesis. Updates were a bit infrequent because of lots of meetings and because most of the work was being done on the remote cluster.

A bit of formalism can be found in the 2023-03-09 update (when I first started thinking about the personalized mutagenesis issue). The idea is relatively simple. Rather than taking the reference genome and simply comparing the ref/alt at a given site to get an effect size, we are now interested in investigating the impact of a given variant within each individual's own haplotype background.

Implementation is a little tricky within our current framework. The route I’ve taken is to create a modified VCF file with “pseudoindividuals”. Each pseudoindividual represents a single “genotype” pass through Enformer. In addition, for simplicity I am only looking at a few variants for mutagenesis. The total number of runs is:

\(\text{num\_individuals} \times \text{variants\_of\_interest} \times \text{num\_haplotypes} \times \text{REFALT} \times \text{reverse}\)

\(175 \times 3 \times 2 \times 2 \times 2 = 4200\)
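As a sanity check on that count, here is a minimal sketch that enumerates one row per Enformer pass. Only the factor sizes (175, 3, 2, 2, 2) come from this post. The individual and variant labels, the `pseudo_id` naming scheme, and the reading of "reverse" as forward versus reverse-complement sequence are all assumptions for illustration, not the naming used in the actual VCF code.

```r
# Enumerate every combination of the five factors; one row = one Enformer pass.
# All labels below are hypothetical placeholders, only the counts are from the post.
runs <- expand.grid(
  individual = sprintf("ind%03d", 1:175),      # 175 individuals
  variant    = c("var1", "var2", "var3"),      # 3 variants of interest
  haplotype  = c("hap1", "hap2"),              # 2 haplotypes per individual
  allele     = c("REF", "ALT"),                # variant forced to REF or ALT
  strand     = c("forward", "reverse"),        # assumed: forward and reverse-complement input
  stringsAsFactors = FALSE
)

# Assumed naming scheme: one unique pseudoindividual ID per row
runs$pseudo_id <- with(
  runs,
  paste(individual, variant, haplotype, allele, strand, sep = "_")
)

n_runs <- nrow(runs)
stopifnot(n_runs == 175 * 3 * 2 * 2 * 2)
print(n_runs)
```

A table like this could also double as a manifest for mapping each pseudoindividual column in the VCF back to the individual, variant, and haplotype it came from.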

Code for creating custom VCF file

Link to notebook