---
title: "compare_elastic_net_EnPACT_models"
author: "Saideep Gona"
date: "2024-05-16"
format:
  html:
    code-fold: true
    code-summary: "Show the code"
execute:
  freeze: true
  warning: false
---
# Context
This notebook compares the number of non-zero coefficients in the elastic net models trained by EnPACT across modalities and conditions. The count gives a rough idea of how "complex" each modality is with respect to the Enformer predictions.
```{r setup, include=FALSE}
library(glue)
library(R.utils)
library(data.table)
library(glmnet)
library(parallel)
library(tidyverse)
library(ggplot2)
```
```{r}
# Load one trained elastic net model and count its non-zero coefficients.
en_net_path <- "/beagle3/haky/users/saideep/projects/Con_EnPACT/models/Flu_ATAC_ws8/intermediates/train_enpact/trained_enpact_eln_Flu.linear.rds"
loaded_en_net <- readRDS(en_net_path)
coefs <- predict(loaded_en_net, type = "coefficients")
nonzero_coefs <- coefs != 0
sum(nonzero_coefs)  # the count includes the intercept row if it is non-zero
```
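The count above depends on which penalty strength the coefficients are extracted at, and it includes the intercept row. The following is a minimal sketch on simulated data, assuming the saved object is a `cv.glmnet` fit (the notebook does not confirm this). The `alpha = 0.5` value, the simulated data, and the helper `count_nonzero_features` are illustrative assumptions, not part of EnPACT.

```r
library(glmnet)

set.seed(1)
# Simulated data: 200 samples, 50 features, only the first 5 are informative.
n_obs <- 200
n_feat <- 50
x_sim <- matrix(rnorm(n_obs * n_feat), nrow = n_obs)
y_sim <- as.numeric(x_sim[, 1:5] %*% rep(2, 5) + rnorm(n_obs))

# alpha = 0.5 is an assumed elastic net mixing value, not taken from EnPACT.
cv_fit <- cv.glmnet(x_sim, y_sim, alpha = 0.5)

# For a cv.glmnet object, coef() uses s = "lambda.1se" unless told otherwise.
coefs_1se <- coef(cv_fit, s = "lambda.1se")
coefs_min <- coef(cv_fit, s = "lambda.min")

# Hypothetical helper: drop the "(Intercept)" row so only features are counted.
count_nonzero_features <- function(coef_mat) {
  is_feature <- rownames(coef_mat) != "(Intercept)"
  sum(coef_mat[is_feature, 1] != 0)
}

n_nonzero_1se <- count_nonzero_features(coefs_1se)
n_nonzero_min <- count_nonzero_features(coefs_min)
```

Comparing `n_nonzero_1se` with `n_nonzero_min` shows how sensitive the complexity metric is to the choice of lambda.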
```{r}
proj_path <- "/beagle3/haky/users/saideep/projects/Con_EnPACT/models"
# Window size selected as optimal for each modality.
optimal_window_sizes <- list(
  "H3K27ac" = 8,
  "H3K27me3" = 64,
  "H3K4me1" = 32,
  "H3K4me3" = 8,
  "ATAC" = 8,
  "RNAseq" = 4
)
modalities <- c()
conditions <- c()
window_sizes <- c()
nonzero_count <- c()
optws_modalities <- c()
coefs_list <- list()
# Load every modality x condition x window size model and record its non-zero coefficient count.
for (modality in names(optimal_window_sizes)) {
  for (condition in c("Flu", "NI")) {
    for (ws in c(2, 4, 8, 16, 32, 64, 128)) {
      # RNAseq models are stored under a different directory naming scheme.
      if (modality == "RNAseq") {
        path_to_en_net <- glue("{proj_path}/{condition}_ws{ws}_ns_3_7_lnormFALSE_beforeFALSE_scale1_sourcekallisto/intermediates/train_enpact/trained_enpact_eln_{condition}.linear.rds")
      } else {
        path_to_en_net <- glue("{proj_path}/{condition}_{modality}_ws{ws}/intermediates/train_enpact/trained_enpact_eln_{condition}.linear.rds")
      }
      cur_en_net <- readRDS(path_to_en_net)
      coefs <- predict(cur_en_net, type = "coefficients")
      coefs_list[[glue("{modality}_{condition}_{ws}")]] <- coefs
      nonzero_coefs <- coefs != 0
      modalities <- c(modalities, modality)
      window_sizes <- c(window_sizes, as.character(ws))
      conditions <- c(conditions, condition)
      nonzero_count <- c(nonzero_count, sum(nonzero_coefs))
      # Label of the form "<modality>_<optimal window size>", used on plot axes.
      optws_modalities <- c(optws_modalities, glue("{modality}_{optimal_window_sizes[[modality]]}"))
    }
  }
}
en_summary_df <- data.frame(modalities, conditions, nonzero_count, window_sizes, optws_modalities)
en_summary_df$window_sizes <- factor(en_summary_df$window_sizes, levels = c("2", "4", "8", "16", "32", "64", "128"))
```
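The same set of model paths can also be enumerated up front, which makes it easy to check for missing files before any `readRDS` call. This is a sketch only: `example_root` is a hypothetical stand-in for `proj_path`, and the directory patterns are copied from the loop above.

```r
# Hypothetical root directory; the notebook itself uses proj_path.
example_root <- "models"

# One row per modality x condition x window size combination.
model_grid <- expand.grid(
  window_size = c(2L, 4L, 8L, 16L, 32L, 64L, 128L),
  condition = c("Flu", "NI"),
  modality = c("H3K27ac", "H3K27me3", "H3K4me1", "H3K4me3", "ATAC", "RNAseq"),
  stringsAsFactors = FALSE
)

# RNAseq models follow a different directory naming pattern than the others.
rnaseq_dir <- sprintf(
  "%s_ws%d_ns_3_7_lnormFALSE_beforeFALSE_scale1_sourcekallisto",
  model_grid$condition, model_grid$window_size
)
other_dir <- sprintf(
  "%s_%s_ws%d",
  model_grid$condition, model_grid$modality, model_grid$window_size
)
model_dir <- ifelse(model_grid$modality == "RNAseq", rnaseq_dir, other_dir)

model_grid$path <- file.path(
  example_root, model_dir, "intermediates", "train_enpact",
  sprintf("trained_enpact_eln_%s.linear.rds", model_grid$condition)
)

# Flag models that are absent on disk so they can be skipped or reported.
model_grid$exists <- file.exists(model_grid$path)
```

Iterating over the rows of `model_grid` where `exists` is `TRUE` avoids an error partway through the loop if one model was never trained.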
```{r}
en_summary_df
```
```{r}
ggplot(en_summary_df) +
  geom_bar(aes(x = modalities, y = nonzero_count, fill = conditions), stat = "identity", position = "dodge") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
Interestingly, RNAseq appears to be the "least" complex modality by this metric: its models retain the fewest non-zero coefficients.
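Each modality and condition pair appears once per window size in `en_summary_df`, so a modality-level comparison may be easier to read when restricted to each modality's optimal window size. The sketch below shows one way to do that on a toy table. The counts are made up for illustration, and the names `toy_summary`, `toy_optimal_ws`, and `toy_optimal` are hypothetical.

```r
# Toy summary table with made-up counts, using the same column names as en_summary_df.
# Window sizes are stored as numbers here for simplicity.
toy_summary <- data.frame(
  modalities = rep(c("ATAC", "RNAseq"), each = 4),
  conditions = rep(rep(c("Flu", "NI"), each = 2), times = 2),
  window_sizes = rep(c(4, 8), times = 4),
  nonzero_count = c(120, 150, 110, 140, 60, 40, 55, 45),
  stringsAsFactors = FALSE
)

# Optimal window sizes for these two modalities, as listed earlier in the notebook.
toy_optimal_ws <- c(ATAC = 8, RNAseq = 4)

# Keep only the rows whose window size matches the modality's optimum.
row_optimum <- toy_optimal_ws[as.character(toy_summary$modalities)]
toy_optimal <- toy_summary[toy_summary$window_sizes == row_optimum, ]
```

The filtered table has one row per modality and condition, so a dodged bar plot of it shows exactly one bar per group.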
```{r}
# Non-zero counts by window size, one plot per condition: Flu first, then NI.
ggplot(en_summary_df[en_summary_df$conditions == "Flu", ]) +
  geom_bar(aes(x = optws_modalities, y = nonzero_count, fill = window_sizes), stat = "identity", position = "dodge") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(en_summary_df[en_summary_df$conditions == "NI", ]) +
  geom_bar(aes(x = optws_modalities, y = nonzero_count, fill = window_sizes), stat = "identity", position = "dodge") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
### What do the coefficients tend to look like?
```{r}
```
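One possible starting point for this question is to summarise the sign and magnitude of the non-zero feature coefficients of a single model. The sketch below uses a toy sparse column in place of an entry of `coefs_list`. The values and the `track_*` row names are made up, and the choice of summary statistics is an assumption rather than the author's analysis.

```r
library(Matrix)

# Toy coefficient column standing in for one entry of coefs_list; values are made up.
toy_coefs <- Matrix(
  c(0.5, 0, -0.2, 0, 0, 1.3, 0, -0.05),
  ncol = 1,
  sparse = TRUE,
  dimnames = list(c("(Intercept)", paste0("track_", 1:7)), "s1")
)

# Drop the intercept, then keep only the non-zero feature coefficients.
feature_coefs <- toy_coefs[rownames(toy_coefs) != "(Intercept)", 1]
nonzero_vals <- feature_coefs[feature_coefs != 0]

# Simple descriptive statistics of the retained coefficients.
coef_summary <- c(
  n_nonzero = length(nonzero_vals),
  frac_positive = mean(nonzero_vals > 0),
  median_abs = median(abs(nonzero_vals))
)
```

Applying the same steps to every element of `coefs_list` would give one summary row per model, which could then be compared across modalities.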
### Let us examine the personalized prediction accuracy of the tracks the EnPACT models use