To compare the performance of models *M*_{1} and *M*_{2}:

- Obtain a set of *k* estimates of accuracy *A* = {*a*_{1}, ..., *a*_{k}} for *M*_{1} and *B* = {*b*_{1}, ..., *b*_{k}} for *M*_{2}
- Calculate the average accuracies, *μ*_{A} = (*a*_{1} + ... + *a*_{k})/*k* and *μ*_{B} = (*b*_{1} + ... + *b*_{k})/*k*
- Calculate *d*_{AB} = |*μ*_{A} - *μ*_{B}|
- let *p* = 0
- Repeat *n* times
  - let *S* = {*a*_{1}, ..., *a*_{k}, *b*_{1}, ..., *b*_{k}} (statistically best if partitions not repeated)
  - randomly partition *S* into two equal sized sets, *R* and *T*
  - Calculate the average accuracies, *μ*_{R} and *μ*_{T}
  - Calculate *d*_{RT} = |*μ*_{R} - *μ*_{T}|
  - if *d*_{RT} ≥ *d*_{AB} then *p* = *p* + 1
- *p*-value = *p*/*n* (Report *p*, *n*, and *p*-value)
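
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `permutation_test`, its parameters, and the `seed` argument are assumptions made for this example, and the accuracy values in the usage lines are made up. The sketch draws each partition independently at random, so it does not prevent repeated partitions, which the procedure notes would be statistically preferable.

```python
import random
from statistics import mean


def permutation_test(a, b, n=10000, seed=None):
    """Approximate permutation test on the difference of mean accuracies.

    a, b : the k accuracy estimates for M1 and M2 (hypothetical names).
    n    : number of random partitions to draw.
    Returns (p, n, p_value), matching what the procedure asks to report.
    """
    if len(a) != len(b):
        raise ValueError("a and b must contain the same number of estimates")

    rng = random.Random(seed)  # local generator so runs can be reproduced
    k = len(a)

    # Observed difference d_AB = |mu_A - mu_B|
    d_ab = abs(mean(a) - mean(b))

    # Pooled set S = {a_1, ..., a_k, b_1, ..., b_k}
    s = list(a) + list(b)

    p = 0
    for _ in range(n):
        # Shuffling and splitting in half gives a random equal-sized
        # partition of S into R and T (repeats are possible here).
        rng.shuffle(s)
        r, t = s[:k], s[k:]
        d_rt = abs(mean(r) - mean(t))
        if d_rt >= d_ab:
            p += 1

    return p, n, p / n


# Illustrative usage with made-up accuracy estimates.
acc_m1 = [0.81, 0.79, 0.84, 0.80, 0.83]
acc_m2 = [0.78, 0.77, 0.80, 0.76, 0.79]
count, trials, p_value = permutation_test(acc_m1, acc_m2, n=2000, seed=0)
print(count, trials, p_value)
```

For small *k*, the number of distinct equal-sized partitions is limited (252 for *k* = 5), so enumerating all of them once instead of sampling would be one way to avoid repeats.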