Program in Quantitative Genomics

The Program in Quantitative Genomics (PQG) develops and applies quantitative methods to help handle massive genetic, genomic, and health data. Based in the Harvard Chan School and Longwood Medical Area, its goal is to improve health through the interdisciplinary study of genetics, behavior, environment, and health. 

Location

255 Huntington Ave
Building 2, 4th floor
Boston, MA 02115

PQG Working Group

Each year, the PQG organizes a less formal PQG Working Group for all local students, postdocs, and faculty. The goal is to provide the opportunity to present and participate in the discussion of works-in-progress, and to focus on the methods and analysis of high-dimensional data in genetics and genomics.

2024/2025 Student and Postdoc Seminar organizers: Tony Chen, Kerner Gaspard, Xinan Wang

Please direct any logistical questions to Amanda King

Note: Harvard Chan School seeks to bring in speakers with a wide range of experiences and perspectives. They’re here to share their own insights; they do not speak for the school or the university.

Upcoming Seminar

Tuesday, April 29, 2025
1:00-2:00 PM

Ryan Collins, PhD
Instructor in Medical Oncology
Dana-Farber Cancer Institute

Germline genomes as a lens for understanding tumorigenesis in early-onset cancers

Many cancers are caused in part by inherited (i.e., germline) genetics. Germline genomes are therefore a crucial aspect in understanding cancer onset and clinical outcomes in individual patients and populations alike, yet germline testing in medical oncology is limited to known familial cancer genes (e.g., BRCA1/2) despite each genome harboring ~5M germline variants spanning a vast range of sizes (SNVs, indels, SVs), frequencies (rare vs. common), and contexts (protein-coding vs. noncoding). In my seminar, I will present results from recently completed studies of germline SVs in pediatric extracranial solid tumors (total N=9,373 individuals) and early-onset lung cancer (total N=2,358 individuals), which have underscored that subsets of rare germline SVs can contribute profound risk for cancer and that these SVs can act through both gene-disruptive and strictly noncoding molecular mechanisms. I will also discuss insights from these two studies that suggest a possible role for rare germline SVs in somatic genome instability, particularly in the context of early-onset cancers. Finally, I will share preliminary results from ongoing work mapping the full spectrum of germline variants via genome sequencing in unexplained familial cancer cases and cancer-free controls (total N=4,769 individuals) and from a very large data aggregation effort to produce a gnomAD-like resource tailored to medical oncology and cancer research (total N=55k individuals). Collectively, we hope that these efforts will expand our understanding of genetic risk for cancer and lay the groundwork for translational advances into better risk stratification, screening, and early cancer detection.

2024-2025 Dates

Phillip Nicol

PhD Candidate
Harvard T.H. Chan School of Public Health

Identifying spatially variable genes by projecting to morphologically relevant directions

Spatial transcriptomics allows for high-resolution sequencing while retaining two-dimensional sample coordinates. A common goal is to identify spatially variable genes within a predefined cell type or domain. However, in many cases this region is implicitly one-dimensional, and consequently standard two-dimensional coordinate-based methods may lack statistical power and precision as they ignore tissue organization. In this talk, we introduce a spectral approach to find the optimal one-dimensional curve approximating the spatial transcriptomics sample coordinates. We then leverage this curve to define a new coordinate system that better represents the tissue morphology. A generalized additive model (GAM) is developed to pinpoint genes exhibiting variable expression in this new coordinate system. Our method directly models gene counts, eliminating the need for normalization or preprocessing steps. Our results indicate superior performance compared to existing hypothesis tests for identifying spatially variable genes, while also accurately pinpointing the precise location and mode of relevant expression patterns. We validate our approach through comprehensive simulations and real data analysis, encompassing diverse platforms such as Visium, Slide-seq, and MERFISH.

Aoxing Liu

Postdoctoral Fellow
MGH and Broad Institute

Genetic drivers and cellular selection of female mosaic X chromosome loss

Mosaic loss of the X chromosome (mLOX) is the most commonly occurring clonal somatic alteration detected in the leukocytes of women, yet little is known about its genetic determinants or phenotypic consequences. To address this, we estimated mLOX in > 880,000 women across eight biobanks, identifying 12% of women with detectable X loss in approximately 2% of their leukocytes. Out of 1,253 diseases examined, women with mLOX had an elevated risk of myeloid and lymphoid leukemias. Genetic analyses identified 56 common variants influencing mLOX, implicating genes with established roles in chromosomal missegregation, cancer predisposition, and autoimmune diseases. A small fraction of these associations were shared with mosaic Y chromosome loss in men, suggesting different biological processes drive the formation and clonal expansion of sex chromosome missegregation events. Allelic shift analyses identified alleles on the X chromosome which are preferentially retained, demonstrating that variation at many loci across the X chromosome is under cellular selection. A novel polygenic score including 44 independent X chromosome allelic shift loci correctly inferred the retained X chromosomes in 80.7% of mLOX cases in the top decile. Collectively our results support a model where germline variants predispose women to acquiring mLOX, with the allelic content of the X chromosome possibly shaping the magnitude of subsequent clonal expansion.

Saori Sakaue

Instructor, MD/PhD
Divisions of Genetics and Rheumatology, Brigham and Women’s Hospital

Genetic determinants of RNA expression are critical for understanding disease mechanisms. However, conventional expression quantitative trait locus (eQTL) analysis using bulk RNA-seq lacks detail on the mRNA lifecycle. While eQTL can affect transcription through promoters or enhancers, they may also affect posttranscriptional modifications that influence RNA stability. To address this, we compared eQTL from mature cell RNA with eQTL from nascent nucleus RNA. We used (i) bulk RNA-seq and single-nucleus (sn)RNA-seq from brain and (ii) single-cell (sc)RNA-seq and snRNA-seq from kidney. Using fine-mapped causal probability, cell RNA eQTL variants in the brain were significantly enriched in transcribed regions (P=4.0×10⁻¹⁴⁵) and RBP binding sites (P=1.2×10⁻⁵¹), indicating regulation at the posttranscriptional modification level. This enrichment was replicated in an independent kidney dataset. Conversely, nucleus eQTL were enriched in distant cCREs, suggesting regulation at the level of transcription whose effect may be diluted once RNAs are exported outside of the nuclei.

We identified eQTL driven by stop-gain variants causing nonsense-mediated decay only in cell RNA, and causal variants located in distant enhancers in nucleus RNA but in transcribed regions in cell RNA. Interestingly, there were examples of multiple (as many as 18) eQTL causal variants in linkage disequilibrium (LD), all in the transcribed regions and RBP binding sites, potentially affecting stability of mature RNA molecules synergistically. These examples potentially suggest a novel multiple-causal-variant hypothesis for eQTL, in contrast to the conventional hypothesis where conditionally independent variants act on distinct molecular mechanisms (e.g., promoter and enhancer effects). Indeed, we found that eQTL variants in the transcribed regions have more variants tagged by LD than those in the promoters or enhancers. Overall, cellular and nucleus RNA eQTL revealed distinct genetic determinants of expression, even within the same cell type and tissue.

Yosuke Tanigawa
Research Scientist, PhD
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

Enhancing polygenic prediction with flexible models on individual-level data

Accurate prediction of disease liability and medically relevant traits using genetic, demographic, and environmental factors is critical for advancing precision medicine. The polygenic score (PGS), a statistical approach to aggregate the genetic effects across multiple genetic variants, attracts substantial research interest given its improved predictive accuracy and potential medical relevance. However, the limited transferability of PGS across diverse genetic ancestry groups remains a key challenge. To address this, I hypothesize that flexible predictive models applied directly to individual-level data offer unique opportunities. I will present a series of recently developed models: (1) promoting ancestry diversity and inclusion of admixed individuals, (2) leveraging biological knowledge in variable selection, (3) modeling nonlinear genetic dominance effects, and (4) exploring gene-by-sex interactions. The proposed methods leverage supervised learning applied directly to individual-level data, enabling the modeling of nonlinear genetic effects, which is not feasible with conventional PGS approaches that rely on univariate linear association summary statistics. Finally, I will discuss the challenges and opportunities in leveraging large-scale cohort data to construct flexible and equitable polygenic predictors. Overall, these methods pave the way toward equitable genomic predictions, advancing precision medicine for diverse global populations.

Kun-Hsing Yu
Assistant Professor of Biomedical Informatics, Harvard Medical School
Assistant Professor of Pathology, Brigham and Women’s Hospital
Instructor in Epidemiology, Harvard T.H. Chan School of Public Health

Enhancing Quantitative Pathology with Generalizable Foundation AI Models

Artificial intelligence (AI) is transforming the landscape of cancer research and clinical diagnosis. Recent advances in microscopic image digitization, multi-modal machine learning algorithms, and scalable computing infrastructure have paved the way for AI-enhanced pathology assessments. In this talk, I will highlight recent breakthroughs in pathology foundation models and their effectiveness in analyzing high-resolution digital pathology images. In addition, I will present examples of AI-empowered real-time pathology evaluations during cancer surgery and demonstrate their adaptability to evolving diagnostic classifications. Furthermore, I will discuss recent studies that employed AI to reveal intriguing links between cell morphology and molecular profiles. Finally, I will outline ongoing challenges in developing robust medical AI systems and identify research directions to address these critical issues.

Jia-Ren (Jerry) Lin, PhD
Technical Director of Tissue Imaging,
Laboratory of Systems Pharmacology, Harvard Medical School

3D Multiplexed and Multimodal Imaging of Human Tumors: Unlocking Prime-Time Opportunities for Digital Pathology and Precision Medicine

Advanced solid tumors display complex interactions between tumor, immune, and stromal cells, leading to high spatial heterogeneity. To dissect these dynamics in colorectal cancer, we use multiplexed tissue imaging, 3D reconstruction, spatial statistics, and machine learning to identify key cellular states and transitions, molecular gradients, and features predictive of clinical outcomes. Notably, at the invasive margin, diverse cell types contribute to T cell suppression, and 3D imaging reveals structured tertiary lymphoid regions with graded molecular profiles. To enhance diagnostic and prognostic precision, we apply the Orion platform to collect co-registered H&E and high-plex immunofluorescence images. In a study of 74 colorectal cancer resections, we demonstrate that combining spatially resolved immunofluorescence with conventional histology improves predictive modeling of progression-free survival. This multimodal approach yields interpretable, high-performance biomarkers and illustrates the promise of integrated imaging for precision oncology.