Implementing Personalized Mutagenesis

Author

Saideep Gona

Published

March 10, 2023

Code

suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "/c/Users/Saideep/Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="implementing-personalized-mutagenesis" ## copy the slug from the header
bDATE='2023-03-10' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}"))

[1] 1

Code

WORK=DATA

Context

Yesterday I restarted the pipeline after Temi debugged it. After it ran for some time, I checked the results, which now look much more reasonable (values fall close to the expected range). However, the values are not identical to the ones I generated before, although they seem to correlate across individuals. Temi and I will take another look at the input sequences to finalize our consensus.
Yesterday, I also formulated personalized mutagenesis experiments. Today I would like to implement the pipeline for running these experiments, and run a subset to help answer: “How closely do marginal effect sizes match when conditioned on reference background vs. personalized backgrounds?”

Personalized mutagenesis implementation

The first step is to modify the existing merged pipeline so that it also outputs reverse complement sequences. This involves writing our own function to reverse complement a one-hot-encoded sequence. Temi and I worked on this together because it is good practice at this point to make any changes to the core merged pipeline together and to test them extensively.

Testing reverse complement of one-hot encoding

Fortunately, the one-hot encoding is very easy to reverse complement due to its built-in symmetry, requiring just two flips, one on each axis. The code below shows this:
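A minimal sketch of the two-flip idea, not the pipeline's actual function. It assumes the sequence is a NumPy array of shape (length, 4) with channels ordered A, C, G, T, so that complementary bases sit at mirrored channel positions; the helper names `one_hot`, `reverse_complement_one_hot`, and `decode` are hypothetical.

```python
import numpy as np

# Assumed channel order. The two-flip trick only works when complementary
# bases sit at mirrored positions (A<->T, C<->G), as they do for ACGT.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot array (hypothetical helper)."""
    idx = [BASES.index(b) for b in seq]
    return np.eye(4, dtype=np.float32)[idx]

def reverse_complement_one_hot(x):
    """Flip the position axis (reverse) and the channel axis (complement)."""
    return x[::-1, ::-1]

def decode(x):
    """Turn a one-hot array back into a DNA string (hypothetical helper)."""
    return "".join(BASES[i] for i in x.argmax(axis=1))

forward = one_hot("AACGT")
reverse = reverse_complement_one_hot(forward)
print(decode(reverse))  # ACGTT
```

Applying the function twice should return the original array, which makes for a cheap sanity check. With a different channel order (for example A, G, C, T) the channel flip would no longer complement the bases.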

The other thing to consider is that, during quantification, we must use “reversed” TSS bins when quantifying on the reverse complement input. This is also easily accomplished with:

$\text{TSSBin}_n = \text{TotalBins} - 1 - \text{TSSBin}_o$

In other words, we just subtract the original bin index from the total number of bins minus one to get its new index, demo’ed here:

Code

number_of_bins = 8
bins = [0]*number_of_bins  # placeholder track with one slot per bin
original_bin_position = 2  # TSS bin index on the forward-strand input
bins[original_bin_position] = "original bin"
print(bins)

[0, 0, 'original bin', 0, 0, 0, 0, 0]

Code

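# Reuses number_of_bins and original_bin_position from the block above
bins = [0]*number_of_bins
# Mirror the index: bin i on the forward input becomes bin (N - 1 - i) on the reverse complement
new_bin_position = number_of_bins - original_bin_position - 1
bins[new_bin_position] = "new bin"
print(bins)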

[0, 0, 0, 0, 0, 'new bin', 0, 0]

This is now implemented in the quantification steps.
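As a rough illustration of the bin mirroring above, indexing the reverse complement output at the mirrored bin should give the same value as flipping that output back to forward orientation and using the original bin. The function name `mirrored_bin` and the toy track are assumptions for this sketch, not part of the pipeline.

```python
import numpy as np

def mirrored_bin(bin_index, total_bins):
    """Bin index of the same locus after reverse-complementing the input."""
    return total_bins - 1 - bin_index

total_bins = 8
tss_bin = 2

# Stand-in for a per-bin prediction track on the reverse complement input.
rc_track = np.arange(total_bins) * 10.0

# Option 1: index the reverse complement track at the mirrored bin.
by_index = rc_track[mirrored_bin(tss_bin, total_bins)]

# Option 2: flip the track back to forward orientation, then use the original bin.
by_flip = rc_track[::-1][tss_bin]

print(by_index, by_flip)  # 50.0 50.0
```

Either option works; the point is to apply exactly one of them, since doing both would undo the mirroring.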

Verification of Kircher Mutagenesis (again)

Previously I had implemented a script to replicate Kircher Mutagenesis. I can now rerun this with the correct merged pipeline and with the reverse complement sequences.

After running the pipeline, we need to be extra sure that the results are correct. A first pass of quantification showed that the reverse complement quantifications were exceedingly low. I then double-checked this in the following notebook (rc_validation.html), which showed that there is likely an issue with the reverse complement, since the results are not mirrored. This intuition was confirmed again by replication in Google Colab. Those results are positive, shown here:

[Figure: colab (colab_rc_plots.png)]

Temi is a bit busy, so we will fix this issue at a later time. Fortunately, the full GEUVADIS set appears to be running correctly now.
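One way to make the "mirrored" check above concrete is to flip the reverse complement track and correlate it with the forward track. This is only a sketch on synthetic data, not the analysis in the notebook or in Colab; `mirror_agreement` and both example tracks are hypothetical.

```python
import numpy as np

def mirror_agreement(forward_track, rc_track):
    """Pearson correlation between the forward track and the flipped RC track."""
    flipped = rc_track[::-1]
    return float(np.corrcoef(forward_track, flipped)[0, 1])

rng = np.random.default_rng(0)
forward_track = rng.random(16)  # synthetic per-bin predictions, forward input

# Synthetic example of a correctly mirrored reverse complement track (small noise added).
good_rc = forward_track[::-1] + rng.normal(0, 0.01, size=16)
# Synthetic example of a track that was never mirrored.
bad_rc = forward_track.copy()

print(mirror_agreement(forward_track, good_rc))
print(mirror_agreement(forward_track, bad_rc))
```

A correlation near 1 for the flipped track would suggest the reverse complement path is behaving as expected, while a much lower value would point to the kind of issue described above.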
