modeling_aracena_reference_enformer

Author

Saideep Gona

Published

August 29, 2023

suppressMessages(library(tidyverse))
suppressMessages(library(glue))

# Root of the blog data directory (machine-specific path)
PRE = "/Users/saideepgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG = "modeling_aracena_reference_enformer" ## copy the slug from the header
bDATE = "2023-08-29" ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
# Create the per-post data directory if it does not already exist
if (!dir.exists(DATA)) dir.create(DATA, recursive = TRUE)
WORK = DATA

Context

The goal of creating a training dataset from the Aracena et al. data is to use it for actual modeling. There are a variety of ways to do this, with the longer-term goal being to fine-tune Enformer-style models. Before doing that, however, I will try simpler approaches. The first is to take Enformer target predictions for different genomic regions and use them as input features for a model that predicts the corresponding Aracena targets. For maximum simplicity, I will start with reference predictions as inputs during training.
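
To make the shape of this first approach concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the data are simulated rather than loaded from the real Enformer or Aracena files, the names `enformer_ref`, `aracena_target`, `n_regions`, and `n_features` are hypothetical, and a plain linear model stands in for whatever model is eventually used.

```r
set.seed(1)

n_regions  <- 200  # hypothetical number of genomic regions
n_features <- 10   # hypothetical number of Enformer tracks used as features

# Simulated stand-in for Enformer reference predictions (regions x tracks)
enformer_ref <- matrix(
  rnorm(n_regions * n_features),
  nrow = n_regions,
  dimnames = list(NULL, paste0("track_", seq_len(n_features)))
)

# Simulated stand-in for a single Aracena target measured over the same regions
true_w <- rnorm(n_features)
aracena_target <- as.vector(enformer_ref %*% true_w + rnorm(n_regions, sd = 0.5))

# Hold out 50 regions for evaluation
train_idx <- sample(n_regions, size = 150)
train_df <- data.frame(y = aracena_target[train_idx], enformer_ref[train_idx, ])
test_df  <- data.frame(y = aracena_target[-train_idx], enformer_ref[-train_idx, ])

# Linear model: Aracena target as a function of the Enformer features
fit  <- lm(y ~ ., data = train_df)
pred <- predict(fit, newdata = test_df)

# Correlation between predicted and observed values on held-out regions
held_out_cor <- cor(pred, test_df$y)
```

With real data, the simulated matrix would be replaced by actual Enformer reference predictions and the vector by the matching Aracena measurements. Splitting by region (or by chromosome) keeps the evaluation from reusing training regions.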