Quantitative Issues in Cancer Research Working Group Seminar
Christian Covington, PhD Student, Department of Biostatistics, Harvard T.H. Chan School of Public Health
Statistical theory and the practice of data analysis: A brief and biased history
Abstract: This talk gives an account of the replication crisis and of how different disciplines, namely the applied sciences, statistics, and theoretical computer science (TCS), have developed their own research agendas to address it. I distinguish between two tracks in the history of methodological development: one concerning adaptivity in data analysis, the other concerning “methodological uncertainty” in model selection, data processing choices, etc.
I give an overview of several methodological approaches, developed in the statistics and TCS communities, for achieving valid inference under adaptivity. I then describe two increasingly popular frameworks, developed primarily by psychologists, for incorporating methodological uncertainty into a data analysis pipeline: multiverse analysis and specification curve analysis. Through examples, I explore confusion and disagreement about how these ideas ought to be used. Finally, I argue that more work is needed to understand what these methods can and can’t provide, both philosophically and statistically, and offer some preliminary ideas toward this end.