Author

Saideep Gona

Published

May 13, 2023

Code
suppressMessages(library(tidyverse))
suppressMessages(library(glue))
PRE = "/Users/saideepgona/Library/CloudStorage/Box-Box/imlab-data/data-Github/Daily-Blog-Sai"

## COPY THE DATE AND SLUG fields FROM THE HEADER
SLUG="post-BoG-notes" ## copy the slug from the header
bDATE='2023-05-13' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!dir.exists(DATA)) dir.create(DATA)
WORK=DATA

Notes from Biology of Genomes 2023

General thoughts on the conference

Overall, the conference felt very productive. There are of course certain aspects which felt more valuable than others.

The following list summarizes benefits of the conference in rough order of significance (for me):

1.) Giving perspective on the state of the field as a whole
2.) Meeting others who are heavily involved in similar deep learning topics
3.) Getting direct feedback on my poster presentation work
4.) Learning about different sub-fields
5.) Learning about different conferences
6.) Meeting new friends
7.) Eating yummy food

Perspective on the state of the field

Likely the most impactful aspect of the conference was simply gaining a better perspective on the state of the field. Just by following and googling journal articles, it is not really possible to get a grasp of all the actors involved and their underlying reasoning/motivations; for the most part, their progress is easy to miss. The way in which different people/groups/regions are connected is something of a guessing game. It's much clearer now, for example, what applied researchers are actually doing and where the work we've been doing can fit in. This section is vague - apologies. Things just make more sense now.

Meeting other deep learning researchers

Since this is a newer field, there really aren't that many experts in applied deep learning for biology. There are many who understand deep learning, but not many have actually had time to test the methods thoroughly on real datasets in the space. It's very helpful and time-saving to talk to these experts and get their perspectives. It's especially efficient because they have already been immersed in the ecosystem and know the thoughts of other experts as well. Of course, each has their own perspective and interests at heart, which is worth taking into account. For example, Jacob Schreiber (Stanford) was at the conference; he is a postdoc in Anshul's lab. He's experienced in the compbio/deep learning space and presented a method for single-cell deep learning. He also has pretty pessimistic views on Enformer in particular. I am currently sending follow-up correspondence since I wasn't able to talk to him in full during the conference. So far, it seems that he believes Enformer does not live up to its purported advantages, for example in incorporating distal regulatory information.

Other established DL researchers in attendance include Julien Gagneur, Peter Koo, James Zou, etc. I will summarize those I observed/interacted with:

Julien Gagneur (Technical University of Munich) has been working on DNA language models, but has a lot of experience with expression prediction and regulatory modeling as well. He is also an author on one of the preprints extensively evaluating Enformer, which is now published: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02899-9. This is the paper in which Kircher mutagenesis data was evaluated with Enformer.

Peter Koo (Cold Spring Harbor Laboratory) has recently published work on improved training methods for DL models in genomics, and has been operating in this space for almost a decade. Those papers are a goldmine of information and recommendations we can consider.

James Zou (Stanford) mostly works on topics outside of our interests, but did present a graph deep learning paper. Graph neural nets are very flexible, and GraphReg is another example of one. It is worth keeping an eye on this class of models. Not to be confused with Jian Zhou (ExPecto and Sei).

I did not get a chance to meet with Anil Raj (Calico), but it's worth knowing his perspective.

Direct poster feedback

I had a good amount of interest and productive discussion from several individuals at my poster presentation:

Riley J. Mangan (Duke): Riley is a PhD student in Craig Lowe's lab at Duke, primarily interested in ancient DNA/evolution type work. His recent work applies Enformer to try to predict functional effects in Neanderthal sequence. The advantage here is that ancient DNA sequence is all you really have to work with, so there is no other way to get functional readouts. This could be a very interesting application area for our retrained models.

Dr. Meng Lin (University of Colorado): Meng Lin is from the University of Colorado's new medical campus. She presented work using data from the relatively new Colorado Center for Personalized Medicine Biobank. This biobank is not very diverse, so it also suffered from issues with PRS generalizing across populations. It was interesting to see how she quantified portability across populations, although we also have people directly at UChicago doing the same things.

Dr. David McCandlish (CSHL): David was very interested in the poster because he works on epistasis testing generally, although his current work focuses more on cis-interactions. He recommended a paper (https://elifesciences.org/articles/28629), which discusses rank-based methods for testing epistasis rather than my current nested modeling framework.
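For context, here is a minimal sketch of what I mean by the nested modeling framework, using simulated genotype dosages and hypothetical variable names (not my actual analysis code): fit an additive model and a model with a pairwise interaction term, then compare them with an F-test.

```r
# Hypothetical illustration: nested-model test for epistasis on simulated data.
set.seed(1)
n <- 500
snp1 <- rbinom(n, 2, 0.3)  # genotype dosages (0/1/2)
snp2 <- rbinom(n, 2, 0.4)
# Simulated phenotype with a modest interaction (epistatic) effect
y <- 0.5 * snp1 + 0.3 * snp2 + 0.4 * snp1 * snp2 + rnorm(n)

additive    <- lm(y ~ snp1 + snp2)  # null model: additive effects only
interaction <- lm(y ~ snp1 * snp2)  # alternative: adds the snp1:snp2 term
anova(additive, interaction)        # F-test on the interaction term
```

The rank-based approach David pointed to would instead test for epistasis without relying on this kind of parametric model comparison.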

Dr. Avantika Lal (Genentech): I didn't catch what she was working on at the time, but Avantika was clearly very engaged with the poster, asking a lot of good questions. I have since realized that she is an AI scientist at Genentech and is pretty involved in academic circles.

Different Sub-fields

I did not realize how many people are working on MPRA from a variety of different perspectives. We definitely need to summarize the results of these datasets coming out, especially as useful evaluation metrics for model training performance.

Other conferences

ASHG is a big one that pops out; I did not realize how large a conference it is. I was already aware of ProbGen, but learned through some discussion that its culture is much more methodologically technical.