Assignment Specifications

WEKA Lab 2

This assignment is worth 17% of your overall Homework and Programming Assignments grade.

  1. In this lab you will investigate the difference in model performance using statistical significance testing. We will compare three models (decision tree, multi-layer perceptron and SVM) on two different data sets (ionosphere and vowel), and perform a pairwise comparison of the models on each data set (so, a total of six experiments). For the vowel data, preprocess the data by removing the first three attributes as we did in the previous assignment.
  2. For both data sets, compare the performance of a decision tree vs. a multi-layer perceptron, a decision tree vs. a support vector machine, and a multi-layer perceptron vs. a support vector machine using a paired permutation test. For each model, describe parameter settings/design decisions you make in acquiring your data (so that your experiments are replicable).
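  3. For the permutation tests, obtain 10 paired accuracy estimates (the best and easiest way to do this is 10-fold cross-validation) and perform an exhaustive permutation test. With 10 pairs, exhaustive means enumerating all 2^10 = 1,024 ways of swapping the two models' estimates within each pair.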
  4. You can collect accuracy estimates using the Experimenter in WEKA, dumping the results to a file and using the appropriate column of the file. You will need to implement the paired permutation test yourself.
  5. Report the p-value for each experiment and draw conclusions. Can you make any assertions about the accuracy of one model over another on the ionosphere data (that is, can you say that differences in model accuracy are statistically significant)? On the vowel data? Can you say anything about how these models might compare on other data sets? What can you say about differences in the performance of the models in general?
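
As a rough illustration of the exhaustive paired permutation test described in steps 3 and 4, here is a minimal Python sketch. It assumes a two-sided test on the mean of the paired accuracy differences; the assignment does not say whether the test should be one-sided or two-sided, so that choice is an assumption. The function name paired_permutation_test and its arguments acc_a and acc_b are hypothetical. Treat this as one possible way to structure the computation, not as the required implementation.

```python
from itertools import product


def paired_permutation_test(acc_a, acc_b):
    """Exhaustive two-sided paired permutation (sign-flip) test.

    acc_a, acc_b: equal-length sequences of paired accuracy estimates,
    e.g. the per-fold accuracies of two models on the same 10 folds.
    Returns the p-value: the fraction of all 2^n within-pair swaps whose
    absolute mean difference is at least as large as the observed one.
    """
    if len(acc_a) != len(acc_b):
        raise ValueError("accuracy lists must have the same length")
    if not acc_a:
        raise ValueError("need at least one pair")

    n = len(acc_a)
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    observed = abs(sum(diffs)) / n

    # Swapping the two models within a pair flips the sign of that pair's
    # difference, so enumerating every sign pattern covers every permutation.
    eps = 1e-12  # tolerance so floating-point ties count as "at least as large"
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - eps:
            extreme += 1
        total += 1

    return extreme / total
```

To use it, pass the two columns of per-fold accuracies collected for a pair of models on one data set, making sure the i-th entries of both lists come from the same fold. With 10 pairs, the smallest p-value a two-sided test can produce is 2/1024, which occurs when one model wins on every fold by the same margin.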

Turn in a thoughtful, well-written report (see the guidelines above) that details your experiments and addresses the questions posed above. Check each question carefully to make sure you have covered every part of it.