Recently I interviewed Clare Gollnick, CTO of Terbium Labs, on the reproducibility crisis in science and its implications for data scientists. The podcast seemed to really resonate with listeners (judging by the number of comments we’ve received via the show notes page and Twitter), for several reasons.

To sum up the issue: Many researchers in the natural and social sciences report not being able to reproduce each other’s findings. A 2016 Nature survey indicated that more than 70 percent of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.

This concerning finding has far-reaching implications for the way researchers perform scientific studies. One contributing factor to reproducibility failure, Gollnick suggests, is the practice of “p-hacking” — that is, searching your experimental data for patterns that cross the threshold for statistical significance before you settle on a specific hypothesis about the underlying causal relationship.

P-hacking is known as “data fishing” for a reason: You’re working backward from your data to a pattern, which breaks the assumptions by which statistical significance is determined in the first place. Gollnick points out that data fishing is exactly what machine learning algorithms do, though — they work backward from data to patterns or relationships.
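To see why working backward from data breaks statistical significance, consider a minimal simulation (my own illustrative sketch, not from the interview): run many experiments on pure noise and report only the ones that happen to look significant. With fair coins, roughly 5 percent of experiments will cross the p < 0.05 threshold by chance alone.

```python
import random

random.seed(42)

def fair_coin_experiment(n_flips=100):
    """Flip a fair coin n_flips times; return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

# "P-hack": run many experiments on pure noise and keep only those
# that cross a significance threshold. For 100 fair-coin flips,
# |heads - 50| >= 10 is roughly a two-sided p < 0.05 result.
n_experiments = 100
significant = [
    heads
    for heads in (fair_coin_experiment() for _ in range(n_experiments))
    if abs(heads - 50) >= 10
]

# Every coin is fair, yet chance alone produces a few "significant"
# results. Reporting only these, as if they were hypothesized in
# advance, is p-hacking.
print(f"{len(significant)} of {n_experiments} experiments look significant")
```

The fix scientists use is to fix the hypothesis (and the analysis) before looking at the data; the machine-learning analogue is to hold out data the search never touches.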

Data scientists can thus fall victim to the same errors as natural scientists. P-hacking in the sciences, in particular, is closely analogous to building an overfitted machine learning model: in both cases, a flexible search turns up patterns that hold in the sample at hand but fail to generalize.