---
title: "ERAP2-personalized-mutagenesis"
author: "Saideep Gona"
date: "2023-04-03"
format:
  html:
    code-fold: true
    code-summary: "Show the code"
execute:
  freeze: true
  warning: false
---

```{r}
#| label: Set up box storage directory
suppressMessages(library(tidyverse))
suppressMessages(library(glue))

PRE = "/Users/sgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG = "ERAP2-personalized-mutagenesis" ## copy the slug from the header
bDATE = '2023-04-03' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")

## create the data directory if it does not exist yet
## (dir.create avoids shelling out to mkdir and also creates missing parent folders)
if (!dir.exists(DATA)) dir.create(DATA, recursive = TRUE)
WORK = DATA
```

# Context

Last week I was working on implementing personalized mutagenesis. Updates were a bit infrequent due to lots of meetings and most of the work being done on the remote cluster.

A bit of formalism can be found in the 2023-03-09 update (when I first started thinking about the personalized mutagenesis issue). The idea is relatively simple. Rather than taking the reference genome and simply comparing the ref/alt at a given site to get an effect size, we are now interested in investigating the impact of a given variant against each individual's own genomic background.

Implementation is a little tricky within our current framework. The route I've taken is to create a modified VCF file with "pseudoindividuals". Each pseudoindividual represents a single "genotype" pass through Enformer. In addition, for simplicity I am just looking at a few variants for mutagenesis. The total number of runs is:

$$
\mathrm{num\_individuals} \times \mathrm{variants\_of\_interest} \times \mathrm{num\_haplotypes} \times \mathrm{REFALT} \times \mathrm{reverse}
$$

$$
175 \times 3 \times 2 \times 2 \times 2 = 4200
$$

### Code for creating custom VCF file

[Link to notebook](./personalized_mutagenesis_ERAP2_notebook.html)
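
To sanity-check the run count, the pseudoindividual design can be enumerated explicitly. This is only a minimal sketch, not the notebook's code: the sample and variant labels (`ind1`, `var1`, ...) and the `pseudo_id` naming scheme are hypothetical placeholders, and it assumes that `REFALT` means the ref and alt allele at the site and that `reverse` means running both strand orientations.

```r
# Hypothetical labels; the real sample and variant IDs are not given in this post.
individuals <- paste0("ind", seq_len(175))
variants    <- paste0("var", seq_len(3))

# One row per Enformer pass: every combination of the five factors.
runs <- expand.grid(
  individual = individuals,
  variant    = variants,
  haplotype  = c(1, 2),
  allele     = c("ref", "alt"),          # assumed meaning of REFALT
  strand     = c("forward", "reverse"),  # assumed meaning of "reverse"
  stringsAsFactors = FALSE
)

# Assumed naming scheme: one unique pseudoindividual ID per run.
runs$pseudo_id <- with(
  runs,
  paste(individual, variant, paste0("hap", haplotype), allele, strand, sep = "_")
)

n_runs <- nrow(runs)
n_runs
```

Building the table with `expand.grid` keeps the count and the actual list of runs in one place, so changing the number of variants or individuals updates both together.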