CS478 Paired Permutation Test Overview

We would like to compare the performance of models M1 and M2. One way to do this is to form a (null) hypothesis such as "there is no difference in the accuracies of M1 and M2", or more precisely, "on a random set of data, (M1, M2) are as likely to have generalization accuracies (a, b) as accuracies (b, a), for any a and b." To test such a hypothesis, we can select a representative statistic, such as the mean difference in accuracies, and try to estimate it at some level of confidence. This tells us something about whether or not we should reject our hypothesis. For example, if we have high confidence that the mean difference is not 0, then we can reject the hypothesis and say that the accuracy of M1 is significantly higher (or lower) than that of M2. One way to do this is to use a paired permutation test, as follows:

  1. Obtain a set of k pairs of accuracy estimates {(a1, b1), (a2, b2), ..., (ak, bk)} for (M1, M2). Make sure that estimates ai and bi are obtained from the same data and that estimates ai and aj are obtained from different data for i ≠ j (this is typically done using N-fold [stratified] cross-validation).
  2. Calculate the average difference in accuracies, μdiff = Σi(ai-bi)/k
  3. Let n = 0
  4. For each possible permutation of swapping ai and bi (perhaps most easily computed by permuting the signs of the k differences):
    1. Calculate the average difference in accuracies under that permutation, μnew = Σi si(ai-bi)/k, where si = -1 if ai and bi were swapped and si = +1 otherwise
    2. If |μnew| ≥ |μdiff|
          n=n+1
  5. Report p = n/2^k (a smaller p means we are more likely to reject the null hypothesis, or, put another way, that we are more confident that our observed value, μdiff, is statistically significantly different from 0).
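
For concreteness, here is a minimal Python sketch of steps 1-5 above. The function name paired_permutation_test and the use of NumPy and itertools are my own choices for illustration; a and b hold the per-fold accuracy estimates for M1 and M2.

    import itertools
    import numpy as np

    def paired_permutation_test(a, b):
        """Exhaustive paired permutation test on k paired accuracy estimates.

        a, b -- per-fold accuracies for M1 and M2 (same fold order).
        Returns p, the fraction of the 2^k sign assignments whose mean
        difference is at least as extreme as the observed mean difference.
        """
        diffs = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
        k = len(diffs)
        mu_diff = diffs.mean()          # step 2: observed mean difference
        n = 0                           # step 3
        # Step 4: swapping ai and bi is equivalent to flipping the sign of
        # the i-th difference, so enumerate all 2^k sign vectors.
        for signs in itertools.product((1.0, -1.0), repeat=k):
            mu_new = (diffs * signs).mean()
            if abs(mu_new) >= abs(mu_diff):
                n += 1
        return n / 2 ** k               # step 5

    # Example call with hypothetical 5-fold accuracies:
    # p = paired_permutation_test([0.91, 0.88, 0.93, 0.90, 0.87],
    #                             [0.89, 0.86, 0.94, 0.88, 0.85])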

Note: in the general case, k can be too large to admit an exhaustive test of all 2^k permutations. In these situations, the most common approach is simply to sample the permutations at random, as sketched below.
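
The following is a minimal sketch of that sampling variant, again in Python with a hypothetical function name; num_samples controls how many random sign assignments are drawn, and the resulting p is only an estimate of the exhaustive value.

    import numpy as np

    def sampled_permutation_test(a, b, num_samples=10000, seed=None):
        """Monte Carlo approximation of the paired permutation test.

        Draws num_samples random sign assignments instead of enumerating
        all 2^k, and reports the fraction that is at least as extreme as
        the observed mean difference.
        """
        rng = np.random.default_rng(seed)
        diffs = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
        mu_diff = diffs.mean()
        # One row of random +/-1 signs per sampled permutation.
        signs = rng.choice((1.0, -1.0), size=(num_samples, len(diffs)))
        mu_new = (signs * diffs).mean(axis=1)
        return float(np.mean(np.abs(mu_new) >= abs(mu_diff)))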

Thanks to Jason Eisner of the Johns Hopkins Computer Science department for many helpful suggestions.