Program in Quantitative Genomics
The Program in Quantitative Genomics (PQG) develops and applies quantitative methods to help handle massive genetic, genomic, and health data. Based in the Harvard Chan School and Longwood Medical Area, its goal is to improve health through the interdisciplinary study of genetics, behavior, environment, and health.
255 Huntington Ave
Building 2, 4th floor
Boston, MA 02115
PQG Seminar
The goal of the PQG Seminar Series is to promote interaction, collaboration, and research in quantitative genomics. The series seeks to further the development and application of quantitative methods, especially for high dimensional data, as well as focus on the training of quantitative genomic scientists.
2025/2026 Seminar Organizers: Rong Ma and Junwei Lu
Please direct any logistical questions to Amanda King
Note: Harvard Chan School seeks to bring in speakers with a wide range of experiences and perspectives. They’re here to share their own insights; they do not speak for the school or the university.
All PQG seminar meetings for the semester will be held in person unless otherwise noted.
Upcoming Seminar
Tuesday, March 10, 2026
1:00-2:00 PM
Biostats Conference Room 2-426
Omar Abudayyeh
Assistant Professor, Harvard Medical School
Investigator, Brigham and Women’s Hospital and Mass General Brigham’s Gene and Cell Therapy Institute
Jonathan Gootenberg
Assistant Professor, Harvard Medical School
Investigator at Beth Israel Deaconess Medical Center
Programmable Biology: From Molecular Tools to Virtual AI Models
The convergence of artificial intelligence and biotechnology promises to revolutionize our understanding of life itself, enabling us to decode biological complexity at unprecedented scales and accelerate the discovery of therapies for aging and disease. Recent advances in molecular biology, particularly in programmable tools, have enabled unprecedented control over biological systems, yet significant challenges remain in predictably engineering cellular and molecular components. Our work develops a transformative framework that operates across biological scales—from protein engineering to whole-cell modeling and rejuvenation. We created EVOLVEpro, a few-shot active learning platform that intelligently optimizes protein function by combining language models with targeted experimentation. At the cellular scale, we are developing virtual cell foundation models that predict cellular responses to genetic and chemical perturbations. We apply these models to systematically map aging mechanisms through a single-cell perturbation atlas, identifying novel factors that can restore youthful cell states.
Beyond individual proteins and cells, we are pioneering reinforcement learning environments that span the full hierarchy of biological organization—from molecular genetics to single cell RNA sequencing to clinical outcomes. Our central hypothesis is that structured RL frameworks can transform how AI systems learn biology, enabling them to reason across scales and develop genuine biological intuition. Rather than forcing language models to interpret raw, heterogeneous biological data directly, these environments create structured learning contexts where models can master cause-and-effect relationships, predict emergent properties, and propose novel therapeutic strategies, especially for anti-aging interventions. In parallel, we’re deploying Google’s AI co-scientist platform to systematically nominate and validate anti-aging interventions, with a particular focus on compounds that induce partial cellular reprogramming. This dual approach—combining virtual exploration with experimental validation—accelerates our ability to identify rejuvenation therapies that can restore youthful cellular function.
Our work demonstrates how machine learning can model biological complexity, generate mechanistic insights, and accelerate comprehensive platforms for rational biological engineering—ultimately advancing our understanding of biology and aging to identify new rejuvenation therapies. By combining molecular precision with systems-level understanding and AI-driven exploration, we are creating a new paradigm where biological age becomes as programmable as the genetic code itself.
2025-2026 Dates
Bo Xia
Assistant Professor, Harvard Medical School
Gene Regulation Observatory Fellow, Broad Institute
Predictive Genomics for Gene Regulation and Cell Fate Determination
How does the human genome encode gene activities to determine the thousands of cell types and functions? Genome regulation and cell fate determination are intrinsically multimodal, integrating DNA sequence, protein complexes, and their intricate interactions. Traditional experimental approaches for studying gene regulation, particularly in in vivo contexts, are frequently limited by sample availability, assay feasibility, scalability, and efficiency. To accelerate the investigation of cell fate determination, we have built foundational multimodal genomics AI models, including C.Origami and Chromnitron, that understand the key principles of genome regulation and enable high-throughput in silico screens to accelerate discoveries. First, I will talk about C.Origami, a multimodal deep neural network that learned the rules of genome organization and thus enabled accurate prediction of chromatin interaction maps in unseen cell types. Applying an in silico screening strategy, we discovered two uncharacterized proteins that contribute to the core mechanism of chromatin domain formation. Second, I will talk about our recent development of a foundation model for studying global chromatin-associated proteins (CAPs). This multimodal deep neural network, Chromnitron, learned the key rules of how a protein binds to chromatin, such as base-resolution DNA sequence features, protein-DNA interaction features, and the chromatin’s biophysical background. The Chromnitron model enables many new types of in silico investigation of CAPs in various conditions, such as characterizing unstudied CAPs, predicting the impact of non-coding variants, and discovering putative regulators of cell fate transition in unseen cell types.
Justin Tubbs
Instructor in Psychology in the Department of Psychiatry
Massachusetts General Hospital
Real-time Dynamic Updating of Polygenic Scores Improves Clinical Prediction and Utility
Polygenic risk scores (PRSs) are promising tools for advancing precision medicine. However, existing PRS construction methods rely on static summary statistics derived from genome-wide association studies (GWASs), which are often updated at lengthy intervals. With genetic data and health outcomes continuously being generated, the current PRS training and deployment paradigm is suboptimal in maximizing prediction accuracy for incoming patients in healthcare settings. We introduce real-time PRS-CS (rtPRS-CS), which enables online, dynamic refinement and standardization of PRS as each new sample is collected. Extensive simulation studies evaluate the performance of rtPRS-CS across various genetic architectures and training sample sizes. Leveraging quantitative traits from two large biobanks, we show that rtPRS-CS can integrate massive streaming data to enhance PRS prediction over time. We apply rtPRS-CS to schizophrenia cohorts across 7 Asian regions, demonstrating the clinical utility of rtPRS-CS in dynamically capturing health status changes and predicting disease risk across diverse genetic ancestries.
Assistant Professor of Computing and Data Sciences, Biomedical Engineering, Biology, and Bioinformatics
Boston University
Fundamental errors in RNA velocity arising from the omission of cell growth
The ultimate promise of single-cell “RNA velocity” methods is compelling: in principle, one can piece together observed short-term changes in each cell and map long-term expression trajectories that were never directly observed. While there has been robust and ongoing articulation of limitations of existing methods and mitigation strategies, consensus RNA velocity frameworks continue to overlook a fundamental aspect of cellular dynamics: cell growth. In a growing population, biomass (including RNA and other macromolecules of the cell) is constantly accumulating. This is also true at the single-cell level: biomass accumulates from the beginning of the cell cycle to the end, before division brings daughter cells roughly back to the same size and state. This implies that, to keep up with cell growth, we expect a homeostatic velocity (defined in terms of production and degradation) that is positive, which is at odds with the conventional estimation, interpretation, and uses of velocity. Here, we investigate the consequences of omitting cell growth from the RNA velocity framework. We demonstrate systematic errors in interpretation and estimation that arise from ignoring cell growth, and show evidence for these artifacts in existing data. Finally, we point the way forward for correcting some of these issues and highlight that explicitly accounting for cell growth in the RNA velocity framework can lead to new biological insights. In particular, this view shows that cell growth rate can be a global regulator of gene inducibility, in the sense that inducing large changes in abundance is “easy” in slow-growing and “hard” in fast-growing cells.
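The growth argument in the abstract above can be illustrated with a short worked equation. This is a minimal sketch under assumptions that the abstract does not state, and it is not the speaker's model: $s$ and $u$ are spliced and unspliced transcript counts per cell, $\beta$ and $\gamma$ are splicing and degradation rates as in the usual velocity model, $V$ is cell volume growing exponentially at a hypothetical rate $\mu$, and $c = s/V$ is the transcript concentration.

```latex
\begin{align}
\frac{ds}{dt} &= \beta u - \gamma s, \qquad \frac{dV}{dt} = \mu V, \qquad c = \frac{s}{V} \\
\frac{dc}{dt} &= \frac{1}{V}\frac{ds}{dt} - \frac{s}{V^{2}}\frac{dV}{dt}
              = \frac{\beta u - \gamma s}{V} - \mu c \\
\frac{dc}{dt} = 0 \;&\Longrightarrow\; \beta u - \gamma s = \mu s > 0
\end{align}
```

Under these assumptions, a cell that holds its concentration constant has a positive velocity equal to $\mu s$, so a velocity of zero would correspond to dilution by growth rather than to a steady state.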
Paul Blainey
Professor of Biological Engineering
Broad Institute
Teaching computers about life so they can explain it to us
Omar Abudayyeh
Assistant Professor, Harvard Medical School
Investigator, Brigham and Women’s Hospital and Mass General Brigham’s Gene and Cell Therapy Institute
Jonathan Gootenberg
Assistant Professor, Harvard Medical School
Investigator at Beth Israel Deaconess Medical Center
Programmable Biology: From Molecular Tools to Virtual AI Models