Assignment Specifications

WEKA Lab 1

This assignment is worth 17% of your overall Homework and Programming Assignments grade.

  1. Become familiar with the ionosphere data set and use it to perform the following experiments:
    1. Use IB1 and IBk (for k up to 9) to classify the data using cross-validation. Report your results. Which value of k is best?
    2. Use PCA to try different numbers of attributes (1 to 8). For each number of attributes, perform the same classification experiments that you did with the original data. Report your results as an 8x9 matrix (number of attributes by value of k). Which value of k is best? How does performance change with different numbers of attributes? Which number of attributes is best? Can you explain which attributes these are?
    3. Given the best number of (PCA) attributes, choose various subsets (of the same size) of the original attributes. Perform the IBk experiments on these subsets and report and compare results with those obtained using the PCA attributes. What do you see?
    4. Show a visual representation of the decision surface with 2 attributes (derived from PCA) and k=1.
  2. Become familiar with the vowel data set and use it to perform the following experiments:
    1. Remove the first three attributes as well as the class attribute.
    2. Cluster the data using the simple k-Means algorithm, with values of k from 1 to 12. What do you see?
    3. Now add the class attribute back in and repeat the clusterings, comparing the clusterings with the class. How well does the clustering appear to correlate with class? What might this mean?
    4. Choose several different classifiers and use them to classify the data. How does their performance compare with the clustering's "performance"? Is this something you might expect?
    5. Does adding any of the original first three attributes back in affect either the clustering or the classification performance?
  3. Design your own experiment, perform it and report on it:
    1. Find an interesting data set and briefly describe it. You might look here or here. Or you can find one some other way. A brief description of ARFF (a file format that WEKA knows) may be helpful.
    2. Come up with one or more interesting experiments to run on your data set and briefly describe them.
    3. Perform your experiment(s), report your results and draw conclusions (What did you learn? What does it mean? Does it make sense? What else could be done? etc.).
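
As a rough illustration of the kind of sweep that item 1.1 asks for, the sketch below runs a k-nearest-neighbor classifier under 10-fold cross-validation for k = 1 to 9 and picks the value of k with the highest accuracy. It is plain Python, not WEKA's IB1/IBk implementation, and it uses synthetic two-attribute data rather than the ionosphere data set. The function names (make_toy_data, knn_predict, cross_val_accuracy) and the "g"/"b" labels are hypothetical choices made for this example only.

```python
import random
from collections import Counter


def make_toy_data(n=100, seed=1):
    """Synthetic 2-attribute, 2-class data (an assumption; not the ionosphere set)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = "g" if i % 2 == 0 else "b"
        center = 0.0 if label == "g" else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    # Rank training rows by squared Euclidean distance to the query point.
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=10, seed=0):
    """Fraction of instances classified correctly, pooled over all folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    correct = 0
    for f in range(folds):
        # Every `folds`-th row (starting at f) is held out as the test fold.
        test = rows[f::folds]
        train = [r for i, r in enumerate(rows) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


data = make_toy_data()
results = {k: cross_val_accuracy(data, k) for k in range(1, 10)}
best_k = max(results, key=results.get)

for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.3f}")
print("best k:", best_k)
```

Repeating the same sweep once per number of PCA attributes would fill in one row of the 8x9 matrix from item 1.2 each time. In WEKA itself the equivalent runs are done with the IBk classifier and its built-in cross-validation option.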

Turn in a thoughtful, well-written report (see the guidelines above) that details your experiments and addresses the questions posed above (look carefully at everything to make sure you've covered all the parts of each). What you say is important. Why you say it is even more important.