Poster Session 2025

Kernel Regression with Tree-Exploring Aggregations

Presented By: Sithija Manage

The proliferation of high-throughput sequencing technologies has generated vast quantities of gut microbiome data, creating an acute need for statistically sound analytical tools. We introduce Kernel Regression with Tree-Exploring Aggregations (KR TEXAS), a novel multivariate nonparametric kernel regression estimator designed to address three key challenges in microbiome data analysis: compositionality, zero-inflation, and the choice of an appropriate taxonomic aggregation level. Unlike traditional approaches that require a uniform taxonomic aggregation level (e.g., genus or species), KR TEXAS autonomously learns optimal aggregation levels across different branches of the phylogenetic tree based on each feature’s predictive importance. The estimator employs L1-penalized multivariate Nadaraya-Watson regression with a specifically parameterized distance metric to assign importance coefficients to aggregated features, effectively handling the high sparsity (more than 70% zeros) common in microbiome datasets. We demonstrate KR TEXAS’s performance through theoretical guarantees, numerical experiments, and an application analyzing the relationship between gut microbiome composition and child iron levels, using data from a randomized controlled trial of biofortified pearl millet in Mumbai. Our approach provides researchers with a flexible, data-driven method for feature aggregation that respects the hierarchical structure of microbial communities while identifying functionally significant taxa at various taxonomic levels.
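
To make the ingredients named in the abstract concrete, the following is a minimal sketch of a Nadaraya-Watson estimator whose distance metric carries one nonnegative importance coefficient per feature, together with an L1-penalized leave-one-out loss and a simple sum-to-parent aggregation. It is an illustration under stated assumptions, not the KR TEXAS implementation: the abstract does not specify the kernel, the exact parameterization of the metric, or how the tree is encoded, so the Gaussian kernel, the diagonal weighting `theta`, the `parent_of` encoding, and all function names here are hypothetical.

```python
import numpy as np

def aggregate_to_parents(X, parent_of):
    """Sum leaf-level abundances into their parent nodes.

    parent_of[j] is the parent index of leaf j (a hypothetical
    encoding of one level of the taxonomic tree).
    """
    n_parents = max(parent_of) + 1
    agg = np.zeros((X.shape[0], n_parents))
    for j, p in enumerate(parent_of):
        agg[:, p] += X[:, j]
    return agg

def weighted_sq_dist(A, B, theta):
    """Pairwise distance sum_j theta_j * (a_j - b_j)^2 (assumed form)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.einsum("qnj,j->qn", diff ** 2, theta)

def nw_predict(X_train, y_train, X_query, theta, bandwidth=1.0):
    """Nadaraya-Watson prediction with a Gaussian kernel (assumed)."""
    d2 = weighted_sq_dist(X_query, X_train, theta)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)

def penalized_loo_loss(X, y, theta, lam, bandwidth=1.0):
    """Leave-one-out squared error plus an L1 penalty on theta."""
    w = np.exp(-weighted_sq_dist(X, X, theta) / (2.0 * bandwidth ** 2))
    np.fill_diagonal(w, 0.0)  # each point is excluded from its own fit
    pred = (w @ y) / np.maximum(w.sum(axis=1), 1e-12)
    return np.mean((y - pred) ** 2) + lam * np.sum(np.abs(theta))
```

In this sketch, a feature whose coefficient in `theta` is driven to zero by the penalty no longer affects the distance, so it drops out of the fit. Candidate features could include both leaf-level columns and aggregated columns from `aggregate_to_parents`, letting the penalty select among aggregation levels; whether KR TEXAS proceeds this way is not stated in the abstract.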