14 July 2025, 11:00
Deciphering the cis-regulatory sequence code of the human genome
Speaker(s): Anshul Kundaje, Stanford University, USA
Place: Conf Room/Building 14
EMBL Distinguished Visitor Lecture
EMBL Rome
Additional information
Abstract
My lab has developed lightweight, robust and interpretable deep learning models that can predict diverse biochemical profiles spanning transcription factor binding, chromatin accessibility, nascent transcription, steady-state transcription and reporter assays. Our models (1) detect, learn and correct cryptic experimental biases, (2) reveal underlying causal sequence syntax and its pleiotropy across biochemical and cellular contexts, (3) encode biophysical parameters and (4) predict effects of regulatory genetic variants. Our models match or surpass the performance of massive, multi-task supervised models and self-supervised DNA language models across a battery of carefully curated benchmarks, including molecular QTLs from diverse cell contexts and ancestries, reporter assays, CRISPR-based genome editing experiments, and fine-mapped GWAS variants. By systematically interpreting a foundational resource of ~5000 regulatory DNA sequence models trained on bulk and single-cell datasets from diverse adult and fetal cellular contexts, we can begin unraveling the incredible complexity and context-specificity of regulatory sequence lexicons, syntax and genetic variation encoded in the human genome.