You will be implementing the ID3 algorithm for inducing decision trees.
We have provided source code that reads in a Machine Learning Database
File and creates a set of examples. You must divide the example set into
a training set and testing set, build the decision tree, and determine
the accuracy of the tree on both sets.
Here are the steps necessary to complete the program:
Download the source code and a few MLDB files.
Read through the code to make sure you understand how we store the attributes
and examples. Note: for this assignment, all of the attributes will be
Divide the example set into a training set and testing set.
Use the training set to build the decision tree. This involves writing
a function to compute Information Gain, and then writing a simple recursive
function to build the decision tree. Stop splitting a node when the maximum
gain over the remaining attributes falls below some user-defined threshold.
Use the decision tree to predict the outputs for the training set and test set
examples, and compute the accuracy on each set.
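
To make the steps above concrete, here is a minimal Python sketch of the split,
the Information Gain computation, the recursive tree builder, and the accuracy
check. It does not use the provided source code. It assumes a hypothetical
representation in which each example is a (features, label) pair, features is a
dict mapping attribute names to discrete values, and labels are strings. All
function names, the tuple layout of a tree node, and the default parameter
values are illustrative choices, not part of the assignment.

```python
import math
import random
from collections import Counter

# Assumed (hypothetical) data layout: an example is (features, label), where
# features is a dict {attribute_name: discrete_value} and label is a string.

def split_examples(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples and cut it into (training, testing)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def entropy(examples):
    """Entropy, in bits, of the class labels in a non-empty example list."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def partition(examples, attr):
    """Group the examples by the value they take for attr."""
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attr], []).append((features, label))
    return groups

def information_gain(examples, attr):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(examples)
    remainder = sum(len(group) / total * entropy(group)
                    for group in partition(examples, attr).values())
    return entropy(examples) - remainder

def majority_label(examples):
    """Most common class label among the examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def build_tree(examples, attrs, threshold=0.0):
    """Return a leaf (a label) or a node (attr, default_label, children)."""
    default = majority_label(examples)
    labels = {label for _, label in examples}
    if len(labels) == 1 or not attrs:
        return default
    best = max(attrs, key=lambda a: information_gain(examples, a))
    # Stop splitting when the maximum gain falls below the threshold.
    if information_gain(examples, best) < threshold:
        return default
    remaining = [a for a in attrs if a != best]
    children = {value: build_tree(group, remaining, threshold)
                for value, group in partition(examples, best).items()}
    return (best, default, children)

def classify(tree, features):
    """Walk from the root to a leaf; unseen values fall back to the default."""
    while isinstance(tree, tuple):
        attr, default, children = tree
        if features[attr] not in children:
            return default
        tree = children[features[attr]]
    return tree

def accuracy(tree, examples):
    """Fraction of the examples whose predicted label matches the true label."""
    correct = sum(classify(tree, features) == label
                  for features, label in examples)
    return correct / len(examples)
```

With these definitions, a run would call split_examples on the full example
set, pass the training half and the list of attribute names to build_tree, and
then call accuracy once on each half. Storing a default label at every node is
one way to handle attribute values that appear in the test set but not in the
training set; other choices are possible.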
You will hand in your code and a 1-2 page writeup regarding:
Training set and test set results for three of the databases.
A short discussion of the strengths and weaknesses of how ID3 generalized on
these three cases.
To avoid learning noise, have your ID3 learning algorithm stop
adding nodes whenever the maximum gain is less than some threshold. Report
on the effects of using some different threshold values for 2 of your test