The choice of a statistical hypothesis test is a challenging open problem for interpreting machine learning results. In his widely cited 1998 paper, Thomas Dietterich recommended McNemar’s test in those cases where it is expensive or impractical to train multiple copies of classifier models.

This describes the current situation with deep learning models that are both very large and trained and evaluated on large datasets, often requiring days or weeks to train a single model. In this tutorial, you will discover how to use McNemar’s statistical hypothesis test to compare machine learning classifier models on a single test dataset.

This tutorial is divided into five parts.

In his important and widely cited 1998 paper on the use of statistical hypothesis tests to compare classifiers, titled “Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms”, Thomas Dietterich recommends the use of McNemar’s test.

Specifically, the test is recommended in those cases where the algorithms that are being compared can only be evaluated once, e.g. on one test set, as opposed to repeated evaluations via a resampling technique, such as k-fold cross-validation.
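To make the single-test-set setting concrete, here is a minimal sketch of how the test could be computed from the predictions of two classifiers on the same test set. The function name `mcnemar_test` and the toy labels are hypothetical, chosen only for illustration. The sketch assumes the standard chi-squared form of the statistic with a continuity correction, which is generally considered reasonable only when the number of disagreements is not too small.

```python
from math import erfc, sqrt


def mcnemar_test(y_true, pred_a, pred_b):
    """Return (statistic, p_value) for McNemar's test with continuity correction.

    Hypothetical helper for illustration; not taken from any library.
    """
    # b: test cases classifier A got right and classifier B got wrong.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: test cases classifier A got wrong and classifier B got right.
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    # Chi-squared statistic with 1 degree of freedom.
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Toy data (made up): ten test cases, all with true label 1.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
print("statistic=%.3f, p-value=%.3f" % (statistic, p_value))
```

Only the cases where the two classifiers disagree enter the statistic, which is why the test can be applied after a single train and evaluate run of each model. When the disagreement counts are small, an exact binomial version of the test is usually preferred over this approximation.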
