Assignment Specifications

WEKA Exercise

This assignment is worth 10% of your overall Homework and Programming Assignments grade.

Perform the following activities:

  1. Go to http://www.cs.waikato.ac.nz/ml/weka/. Download and install the latest version of Weka on your computer.
  2. In the Explorer application, open the CardiologyCategorical.csv data file and do the following:
    1. Build a neural network (using the MLP algorithm) that predicts whether a patient has a heart condition. Record the 10-fold cross-validation accuracy of your model as A1.
    2. Create a new attribute coarseBloodPressure with the values Low if blood pressure is less than or equal to 120, Normal if blood pressure is greater than 120 but less than or equal to 150, and High if blood pressure is greater than 150. (You may need to be a little creative with the (unsupervised) filters here, for example by first creating a copy of blood pressure under the new name and then transforming that copy as described above; the AddExpression and MathExpression filters may prove particularly useful.)
    3. Build a neural network (using the MLP algorithm) that predicts whether a patient has a heart condition, using the new attribute coarseBloodPressure instead of the original blood pressure. Record the 10-fold cross-validation accuracy of your model as A2.
    4. Compare A1 and A2, and comment on any difference you observe.
    5. Remove all records whose value of the attribute resting ecg is Abnormal.
    6. Construct a decision tree (using the J48 Algorithm) that predicts whether a patient has a heart condition, given the attributes age, sex, chest pain type, coarseBloodPressure, angina, peak, and slope. Save the result buffer and insert the confusion matrix obtained with 10-fold cross-validation in your report.
  3. In the Explorer application, open the cpu.arff data file and do the following:
    1. Cluster the data (using the simple k-Means algorithm, with k=3) and report on the nature and composition of the extracted clusters.
    2. Discretize the attributes MMIN, MMAX, CACH, CHMIN and CHMAX into 3 buckets in one step. Find associations among these attributes only (i.e., remove the other ones), using the Apriori algorithm with support 0.2 and confidence 0.95, displaying only the top 3 rules. Save the result buffer and insert it in your report.
    3. Using the original data, list the eigenvalues associated with the attributes selected by the Principal Components Analysis method, when the amount of variance you wish covered by the subset of attributes is 75%.
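
For step 3.3, it may help to recall what a variance-coverage cut-off means. The sketch below is a minimal illustration in Python, not Weka's implementation: it assumes the eigenvalues are already known, and the function name components_for_coverage and the sample values are hypothetical (they are not results from cpu.arff). It keeps the largest eigenvalues until their cumulative share of the total variance first reaches the requested coverage.

```python
def components_for_coverage(eigenvalues, coverage=0.75):
    """Return the leading eigenvalues whose cumulative share of the
    total variance first reaches `coverage` (a fraction in (0, 1])."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    kept = []
    running = 0.0
    for value in ordered:
        kept.append(value)
        running += value
        if running / total >= coverage:
            break
    return kept


# Hypothetical eigenvalues for illustration only (NOT taken from cpu.arff).
example = [5, 3, 1, 1]
selected = components_for_coverage(example, coverage=0.75)
print(selected)  # [5, 3]: 5/10 = 50% is not enough, (5 + 3)/10 = 80% is
```

The eigenvalues to report are the ones Weka itself lists in its attribute selection output; the sketch only shows how a 75% threshold determines how many components are retained.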

Turn in a thoughtful, well-written report (see the guidelines above) that details your experiments and addresses the questions posed above (check each part carefully to make sure you have covered everything).
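
As a sanity check on step 2.2, the threshold rule for coarseBloodPressure can be written out explicitly. The following is a minimal sketch in Python rather than a Weka filter configuration; the function name coarse_blood_pressure is hypothetical, and the sample readings are made up to exercise the boundaries. Comparing a few rows of your transformed data against it may help confirm that the boundary values 120 and 150 fall into the intended categories.

```python
def coarse_blood_pressure(blood_pressure):
    """Map a numeric blood pressure reading to Low, Normal, or High
    using the thresholds stated in step 2.2."""
    if blood_pressure <= 120:
        return "Low"
    if blood_pressure <= 150:
        return "Normal"
    return "High"


# Made-up readings chosen to sit on and around the two boundaries.
for reading in (110, 120, 121, 150, 151):
    print(reading, coarse_blood_pressure(reading))
```

Note that 120 maps to Low and 150 maps to Normal, since both upper bounds are inclusive.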