ID3/C4.5 Simulation Assignment - CS 572

Using the C4.5 simulator, do the following experiments. For each experiment, give a short (roughly one paragraph) discussion of your findings and observations. Remember: this is a powerful learning model, and you're testing and experimenting with the real thing. Have fun with it.

Note: For all assignments in this course that require your observations (which is the typical case), the most important thing is not just to report what you observe, but to explain why it occurs. If you can't figure it out, give your best attempt at an explanation. This is where learning best occurs: in trying to figure out why the models do what they do.

1. Experiment with ID3 on different training/test sets (MLDB and/or synthetic). Start with simple applications and then try more complicated ones. After studying a few applications, discuss the following questions.

a) How accurate is the model?

b) How large a decision tree is created?

c) What is the effect of noise?

d) What is the effect of missing attribute values?

e) What kinds of applications are easy/hard for ID3?

f) Other observations
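
For questions (c) and (d), one way to build a synthetic set with controlled noise and missing values is sketched below. This is only an illustration, not part of the simulator: the target concept (a AND (b OR c)), the attribute names, and the helper names make_dataset and write_c45_files are all made up for the example. It assumes the usual C4.5 file layout, with a .names file listing the classes and attributes and a .data file of comma-separated cases in which "?" marks an unknown value.

```python
import random


def make_dataset(n, noise=0.0, missing=0.0, seed=0):
    """Return n rows [a, b, c, label] for the toy concept a AND (b OR c).

    noise   -- probability that a row's class label is flipped
    missing -- probability that each attribute value is replaced by "?"
    (Hypothetical helper; the concept and names are assumptions.)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, b, c = (rng.choice("tf") for _ in range(3))
        label = "yes" if a == "t" and (b == "t" or c == "t") else "no"
        if rng.random() < noise:
            # Class noise: flip the label.
            label = "no" if label == "yes" else "yes"
        # Missing values: hide each attribute independently.
        attrs = ["?" if rng.random() < missing else v for v in (a, b, c)]
        rows.append(attrs + [label])
    return rows


def write_c45_files(stem, rows):
    """Write stem.names and stem.data in the assumed C4.5 layout."""
    with open(stem + ".names", "w") as f:
        f.write("yes, no.\n\n")              # class values
        for name in ("a", "b", "c"):
            f.write(f"{name}: t, f.\n")      # discrete attributes
    with open(stem + ".data", "w") as f:
        for row in rows:
            f.write(",".join(row) + "\n")    # class is the last field


if __name__ == "__main__":
    # Example: 200 cases, 10% label noise, 5% missing attribute values.
    write_c45_files("toy", make_dataset(200, noise=0.10, missing=0.05))
```

Varying noise and missing over a range of values, with a fixed seed, gives a series of data sets whose tree size and accuracy can be compared. In the standard C4.5 release the file stem is normally passed with the -f option (for example, c4.5 -f toy), but check the lab instructions for the exact invocation on the open lab machines.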

2. Create your own "real world" application, test it, and discuss how well ID3 handles it. You will use this same application with future simulations as a comparison point.

3. Be creative. Try some of the following and some of your own experiments:

• Experiment with C4.5 rules

• Experiment with the use of value sets

• Experiment with other C4.5 parameter settings

• Experiment with binarization of attributes

• Give graphical results of modifying noise, missing values, types of attributes, etc.

• Come up with some of your own research results!

Total pages: ~5-8

Here are some instructions on how to use C4.5 on the open lab machines.