ID3 Assignment


Overview:

You will implement the ID3 algorithm for inducing decision trees. We have provided source code that reads in a Machine Learning Database (MLDB) file and creates a set of examples. You must divide the example set into a training set and a test set, build the decision tree from the training set, and determine the accuracy of the tree on both sets.
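The train/test division described above can be sketched in a few lines. This is only an illustration, not the provided source code: the function name `split_examples`, the 70/30 default, and the fixed seed are assumptions chosen for the example, and the examples are treated as an ordinary Python list.

```python
import random


def split_examples(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples and split it into (train, test).

    train_fraction and seed are illustrative defaults, not values
    prescribed by the assignment.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so results are repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed makes the reported accuracies reproducible between runs. Varying it is one way to check how sensitive the results are to the particular split.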

Programming Steps:

Here are the steps necessary to complete the program:
  • Download the source code and a few MLDB files.
  • Go over the code to make sure you understand how the attributes and examples are stored. Note: for this assignment, all of the attributes will be discrete-valued.
  • Divide the example set into a training set and a test set.
  • Use the training set to build the decision tree. This involves writing a function to compute Information Gain, and then writing a simple recursive function to build the decision tree. Stop splitting when the maximum gain falls below some user-defined threshold.
  • Use the decision tree to determine the accuracy by predicting the outputs for the test set examples.
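The tree-building and evaluation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the provided source code: each example is assumed to be a `(features, label)` pair, where `features` is a dict mapping attribute names to discrete values and `label` is a string. All function names and the tuple-based node layout are hypothetical.

```python
import math
from collections import Counter


def entropy(examples):
    """Shannon entropy (in bits) of the class labels in `examples`."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def partition(examples, attribute):
    """Group examples by their value for `attribute`."""
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attribute], []).append((features, label))
    return groups


def information_gain(examples, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(examples)
    remainder = sum(len(group) / total * entropy(group)
                    for group in partition(examples, attribute).values())
    return entropy(examples) - remainder


def majority_label(examples):
    """Most common class label among `examples`."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


def build_tree(examples, attributes, threshold=0.0):
    """Return a leaf (a label string) or a node (attribute, children, default)."""
    default = majority_label(examples)
    # Stop when no attributes remain or every example has the same label.
    if not attributes or entropy(examples) == 0.0:
        return default
    gains = {a: information_gain(examples, a) for a in attributes}
    best = max(attributes, key=gains.get)
    # Stop splitting when the maximum gain falls below the threshold.
    if gains[best] < threshold:
        return default
    remaining = [a for a in attributes if a != best]
    children = {value: build_tree(subset, remaining, threshold)
                for value, subset in partition(examples, best).items()}
    return (best, children, default)


def classify(tree, features):
    """Walk the tree; use the node's majority label for unseen values."""
    while isinstance(tree, tuple):  # assumes labels themselves are not tuples
        attribute, children, default = tree
        if features[attribute] not in children:
            return default
        tree = children[features[attribute]]
    return tree


def accuracy(tree, examples):
    """Fraction of `examples` whose predicted label matches the true label."""
    correct = sum(classify(tree, features) == label
                  for features, label in examples)
    return correct / len(examples)
```

Under these assumptions, `build_tree(train, attributes, threshold)` followed by `accuracy(tree, train)` and `accuracy(tree, test)` gives the two accuracy figures. Storing a majority label at each internal node is a design choice made here so that test examples with attribute values never seen in training can still be classified.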

Files:

Assignment:

You will hand in your code and a 1-2 page writeup covering:
  • Training set and test set results for three of the databases.
  • A short discussion of the strengths and weaknesses of how ID3 generalized on these three cases.
  • To avoid learning noise, have your ID3 learning algorithm stop adding nodes whenever the maximum gain is less than some threshold. Report on the effects of using several different threshold values for two of your test databases.
