Harvard Chan Microbiome in Public Health Center: Poster Session 2025

Kernel Regression with Tree-Exploring Aggregations

Presented By: Sithija Manage

The proliferation of high-throughput sequencing technologies has generated vast quantities of gut microbiome data, creating an acute need for statistically sound analytical tools. We introduce Kernel Regression with Tree-Exploring Aggregations (KR TEXAS), a novel multivariate nonparametric kernel regression estimator designed to address key challenges in microbiome data analysis: compositionality, zero-inflation, and appropriate taxonomic aggregation. Unlike traditional approaches that require uniform taxonomic aggregation levels (e.g., genus or species), KR TEXAS autonomously learns optimal aggregation levels across different branches of the phylogenetic tree based on each feature’s predictive importance. The estimator employs L1-penalized multivariate Nadaraya-Watson regression with a specifically parameterized distance metric to assign importance coefficients to aggregated features, effectively handling the high sparsity (more than 70% zeros) common in microbiome datasets. We demonstrate KR TEXAS’s performance through theoretical guarantees, numerical experiments, and an application analyzing the relationship between gut microbiome composition and child iron levels, using data from a randomized controlled trial of biofortified pearl millet in Mumbai. Our approach provides researchers with a flexible, data-driven method for feature aggregation that respects the hierarchical structure of microbial communities while identifying functionally significant taxa at various taxonomic levels.
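The ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`aggregate`, `weighted_sq_dist`, `nw_predict`, `penalized_loo_loss`), the Gaussian kernel, the weighted squared-Euclidean distance, and the leave-one-out objective are all assumptions chosen for illustration; the abstract does not specify the actual distance parameterization or how aggregation levels are searched over the tree.

```python
import numpy as np

def aggregate(X, groups):
    """Sum leaf-level columns of X into one column per group of leaf indices.

    `groups` is a hypothetical encoding of one candidate aggregation:
    each entry lists the leaf columns merged into a single tree node.
    """
    return np.column_stack([X[:, idx].sum(axis=1) for idx in groups])

def weighted_sq_dist(A, B, w):
    """Pairwise d(a, b) = sum_j w_j * (a_j - b_j)**2 with w_j >= 0.

    Assumed form: the importance coefficients w double as per-feature
    inverse bandwidths, so a feature with w_j = 0 is ignored entirely.
    """
    diff = A[:, None, :] - B[None, :, :]
    return (diff ** 2 * w).sum(axis=2)

def nw_predict(X_train, y_train, X_query, w):
    """Nadaraya-Watson estimate: kernel-weighted average of y_train."""
    K = np.exp(-weighted_sq_dist(X_query, X_train, w))  # Gaussian kernel
    return K @ y_train / K.sum(axis=1)

def penalized_loo_loss(w, X, y, lam):
    """Leave-one-out squared error plus an L1 penalty lam * sum |w_j|."""
    K = np.exp(-weighted_sq_dist(X, X, w))
    np.fill_diagonal(K, 0.0)  # each sample is excluded from its own fit
    pred = K @ y / K.sum(axis=1)
    return np.mean((y - pred) ** 2) + lam * np.abs(w).sum()

# Toy relative abundances: 3 samples, 4 leaf taxa, rows sum to 1.
X = np.array([[0.5, 0.0, 0.50, 0.0],
              [0.0, 0.25, 0.25, 0.5],
              [0.1, 0.1, 0.00, 0.8]])
y = np.array([1.0, 2.0, 3.0])

# Merge leaves 0 and 1 into one parent node; keep leaves 2 and 3 separate.
Z = aggregate(X, [[0, 1], [2], [3]])
w = np.array([5.0, 0.0, 5.0])  # illustrative weights; feature 1 is dropped
print(nw_predict(Z, y, Z, w))
print(penalized_loo_loss(w, Z, y, lam=0.1))
```

In this sketch, minimizing `penalized_loo_loss` over nonnegative `w` would drive some coefficients to exactly zero, which is the sense in which an L1 penalty performs feature selection. Comparing the loss across different choices of `groups` is one conceivable way to compare aggregation levels, though the method described above learns these levels jointly rather than by enumeration.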
