Updating Existing GEUVADIS Analyses part 3

Author

Saideep Gona

Published

March 20, 2023

Code

suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "C:\\Users\\Saideep\\Box\\imlab-data\\data-Github\\Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="updating-existing-geuvadis-analysis-p3" ## copy the slug from the header
bDATE='2023-03-20' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!dir.exists(DATA)) dir.create(DATA, recursive = TRUE) ## portable; avoids shelling out to mkdir
WORK=DATA

Context

I still needed to update my epistasis analysis with the new data. Now that this is done, the results are roughly as follows (the full analysis is in the analysis repo).

Epistasis Results

Enrichment of epistatic signal

Figures: QQ plots for the additive test (fig-qqplots-1.png) and the interaction test (fig-qqplots-2.png)

As before, there is enrichment in both the additive and interaction tests.

GSEA results

This first plot shows the number of genes in background (significant for the additive test) and the number of genes in the enrichment set (significant for both additive and interaction tests) at varying thresholds.
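Counts like these can be tabulated directly from per-gene p-values. Below is a minimal sketch, assuming a data frame with hypothetical columns `p_additive` and `p_interaction`. The column names, the simulated p-values, the threshold grid, and the use of one shared threshold for both tests are all placeholders, not taken from the analysis repo.

```r
# Simulated per-gene p-values; column names are hypothetical placeholders.
set.seed(1)
pvals <- data.frame(
  gene = paste0("gene", 1:1000),
  p_additive = runif(1000),
  p_interaction = runif(1000)
)

# For each threshold, count:
#   background = genes significant for the additive test
#   enrichment = genes significant for both additive and interaction tests
# Assumption: the same threshold is applied to both tests.
count_gene_sets <- function(df, thresholds) {
  do.call(rbind, lapply(thresholds, function(th) {
    background <- df$p_additive < th
    enriched <- background & df$p_interaction < th
    data.frame(
      threshold = th,
      n_background = sum(background),
      n_enrichment = sum(enriched)
    )
  }))
}

counts <- count_gene_sets(pvals, thresholds = c(0.05, 0.01, 0.001))
print(counts)
```

By construction the enrichment set is a subset of the background, so `n_enrichment` can never exceed `n_background` at any threshold.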

The second plot shows GSEA results using the hallmark gene sets and fgsea for the above gene sets. As before, the results are null. That said, I know Karl Tayeb in Matthew Stephens' lab has been working on a different method for gene set enrichment, which might be interesting. I am in contact with him to try out his method.

In addition, the ribosomal genes I observed previously are no longer significant, so they may have been an artifact of bugs I have since fixed.
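For intuition, comparing an enrichment set against a background can be framed as an over-representation test. The sketch below uses a plain hypergeometric test in base R as a stand-in; it is not fgsea and not necessarily what the analysis repo does. The gene names, set sizes, and the single made-up gene set are hypothetical.

```r
# Hypothetical universe: the background set (additive-significant genes).
background <- paste0("gene", 1:200)
# Hypothetical enrichment set: genes also significant for the interaction test.
enrichment <- background[1:40]
# Made-up gene set standing in for a single hallmark gene set.
gene_set <- background[c(1:10, 101:120)]

# Hypergeometric over-representation p-value: P(overlap >= observed),
# drawing length(foreground) genes from the universe without replacement.
ora_pvalue <- function(foreground, universe, gene_set) {
  gene_set <- intersect(gene_set, universe)
  overlap <- length(intersect(foreground, gene_set))
  phyper(
    overlap - 1,
    m = length(gene_set),                     # genes in the set
    n = length(universe) - length(gene_set),  # genes outside the set
    k = length(foreground),                   # genes drawn
    lower.tail = FALSE
  )
}

p <- ora_pvalue(enrichment, background, gene_set)
print(p)
```

This is equivalent to a one-sided Fisher's exact test on the 2x2 table of set membership versus enrichment-set membership.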

Sasse et al.

Preprint (https://www.biorxiv.org/content/10.1101/2023.03.16.532969v1.full.pdf)

It seems that others were also trying to understand why Enformer produces negative correlations. Interestingly, they ended up going in a similar direction: comparing the mutagenesis profile for certain genes against the known causal variants. They also looked at the gradient effect (which input sites have the strongest impact on the outcome) and found that, for the negative correlations, Enformer is often just misattributing the effect to variants other than the causal ones. It seems especially attracted to variants closer to promoter sites, which is consistent with this preprint: https://www.biorxiv.org/content/10.1101/2022.09.15.508087v1.full.

This approach is very similar to the direction I was planning to test in the post “2023-03-10-implementing-personalized-mutagenesis”. There, I planned to do “personalized mutagenesis” to determine which variants are contributing to the personalized variation. I think we should also store the “contribution scores” from our runs for this purpose. This exploration can still complement the Sasse et al. results by:

Confirming/extending in a different population + cell type + gene
Clarifying and quantifying the difference between In-silico Mutagenesis (ISM) and Personalized In-silico Mutagenesis (PISM)
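To make the ISM vs. PISM distinction concrete, here is a toy sketch. It rests on an assumed reading of PISM: score only the variants an individual actually carries, each reverted to the reference allele on that individual's own sequence background. The model `toy_predict`, its weights, and both sequences are invented for illustration and have nothing to do with Enformer itself.

```r
bases <- c("A", "C", "G", "T")

# Toy stand-in for a sequence-to-expression model (NOT Enformer):
# an additive weight for a "G" at each position, plus one pairwise
# interaction between positions 2 and 4.
toy_predict <- function(seq) {
  w <- c(0.5, -1.0, 0.2, 0.8, -0.3, 0.1)
  is_g <- as.numeric(seq == "G")
  sum(w * is_g) + 2.0 * is_g[2] * is_g[4]
}

reference  <- c("A", "C", "T", "A", "G", "C")
individual <- c("A", "G", "T", "G", "G", "C")  # variants at positions 2 and 4

# ISM: every possible single substitution, scored on the reference background.
ism <- function(ref, predict) {
  base_score <- predict(ref)
  out <- expand.grid(alt = bases, position = seq_along(ref),
                     stringsAsFactors = FALSE)
  out <- out[out$alt != ref[out$position], ]
  out$effect <- mapply(function(pos, alt) {
    mutated <- ref
    mutated[pos] <- alt
    predict(mutated) - base_score
  }, out$position, out$alt)
  out
}

# PISM (as assumed here): only the individual's own variants, each reverted
# to the reference allele on the individual's background.
pism <- function(ind, ref, predict) {
  ind_score <- predict(ind)
  variant_pos <- which(ind != ref)
  effect <- sapply(variant_pos, function(pos) {
    reverted <- ind
    reverted[pos] <- ref[pos]
    ind_score - predict(reverted)
  })
  data.frame(position = variant_pos, alt = ind[variant_pos], effect = effect)
}

ism_res <- ism(reference, toy_predict)
pism_res <- pism(individual, reference, toy_predict)

# Compare the two scores for the variants this individual carries.
print(ism_res[ism_res$position %in% pism_res$position & ism_res$alt == "G", ])
print(pism_res)
```

In this toy setup the interaction term makes the position-2 variant look negative under ISM but positive under PISM, since PISM scores it with the position-4 variant present. Quantifying that kind of background dependence is the point of the comparison.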

Hackathon Prep

I watched the Andrej Karpathy tutorial video today and can start helping Temi prepare for the Hackathon tomorrow.
