In recent years, high-throughput methodologies and databases of personal health histories have dramatically increased the number of potential discoveries scanned in a typical scientific experiment.
At the heart of our concern is that no such study can avoid some selection at the stage of analysis, discussion or summary. No one can digest the results of such an experiment without directly or indirectly selecting the inferences to be highlighted. Statistical inference following selection – selective inference – is thus unavoidable. Gone is the classic experiment designed specifically to answer a single, sharply phrased question. And yet, most statistical methods taught and used by scientists still cater to a situation no longer encountered in practice.
When the usual measures of uncertainty are applied only to the selected few, their interpretation fails: a 95% confidence interval (CI) is constructed to cover its true value in 19 out of 20 trials. But select the 20 most promising aspects of quality of life for a new drug out of the 100 measured, and the average coverage of the true values is bound to be much lower than 19/20 (Benjamini and Yekutieli, 2005).
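To see how coverage can degrade, here is a minimal simulation sketch. Its setup (100 measured aspects, 10 of them carrying a genuine effect, unit normal noise, and selection of the 20 largest estimates) is an illustrative assumption, not the construction of Benjamini and Yekutieli (2005).

```python
# Illustrative simulation: ordinary 95% CIs keep their coverage on average over
# all 100 aspects, but lose it once we look only at the 20 selected as most promising.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_aspects, n_selected = 2000, 100, 20
z = 1.96  # two-sided 95% normal quantile; noise sd assumed known and equal to 1

cov_all, cov_selected = [], []
for _ in range(n_trials):
    theta = np.zeros(n_aspects)
    theta[:10] = 2.0                           # assumed: 10 genuine effects, 90 nulls
    est = theta + rng.normal(size=n_aspects)   # noisy estimates of each aspect
    covered = np.abs(est - theta) <= z         # does the ordinary 95% CI cover the truth?
    top = np.argsort(est)[-n_selected:]        # the 20 most promising aspects
    cov_all.append(covered.mean())
    cov_selected.append(covered[top].mean())

print(f"average coverage over all 100 aspects : {np.mean(cov_all):.3f}")      # close to 0.95
print(f"average coverage over the selected 20 : {np.mean(cov_selected):.3f}") # well below 0.95
```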
The problem of dwindling effects in psychology studies, described so vividly by Schooler, may also stem from selective inference. The phenomenon that selection based on high estimated values tends to yield estimates higher than their true values is well documented, and is known as the 'Winner's Curse.' Add to this the above problem of regular confidence intervals failing to cover the true values more often than expected after selection, and one is bound to produce a discovery that shrinks in further studies, sometimes even surprisingly so, falling below the lower limit of its original confidence interval.
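A small companion sketch, under the same illustrative assumptions as above, shows the Winner's Curse directly: among the aspects selected for their high estimates, the estimates systematically overshoot the corresponding true values.

```python
# Illustrative simulation of the Winner's Curse: the average gap between estimate
# and truth is positive once we condition on having been selected for a high estimate.
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(100)
theta[:10] = 2.0                               # assumed true effects, as in the sketch above
bias_selected = []
for _ in range(2000):
    est = theta + rng.normal(size=100)         # noisy estimates
    top = np.argsort(est)[-20:]                # select on high estimated values
    bias_selected.append(np.mean(est[top] - theta[top]))

print(f"average (estimate - truth) among the selected: {np.mean(bias_selected):.3f}")
# positive: the selected estimates overstate their true values
```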