On-line Appendix

Introduction

This directory contains the source code and experimental results reported in the accompanying Machine Learning paper. This file explains what files are in this directory, what they are for, and gives additional information on some of them.

Contents

This directory contains the following files and subdirectories:

    results.txt - Summary tables of the experimental results
    res/        - A separate results file for each database
    mldb/       - The machine learning databases used in the experiments
    drop.c      - Source code for DROP and other pruning heuristics
    stats/      - Original results files and source code for gathering statistics

Details

results.txt

This file contains two tables summarizing the experimental results. The first table lists each database, and for each database it gives the average accuracy and storage requirements for each pruning heuristic. The averages are taken from 10 trials using 10-fold cross-validation. After each value, +'s and -'s indicate whether DROP3 was significantly better or worse than the given pruning heuristic on that particular database, according to a one-tailed paired t-test. The +'s and -'s indicate the confidence level as follows:

    --        DROP3 was significantly worse at a 95% or higher confidence level.
    -         DROP3 was significantly worse at a 90% or higher confidence level.
    (nothing) The difference was not significant.
    +         DROP3 was significantly better at a 90% or higher confidence level.
    ++        DROP3 was significantly better at a 95% or higher confidence level.

Note that "better" means "higher" for accuracy, and "lower" for storage.

Average & SD: Below the results for the individual databases are some summaries of performance over all of the databases. The average accuracy over all of the databases is shown for each pruning heuristic, along with the standard deviation.

#times better/worse: Next is shown a count of the number of times (i.e., databases) on which DROP3 was better (i.e., higher accuracy or lower storage requirements, in the respective columns) and worse than each other pruning heuristic. For example, under kNN, "14-17, 31-0" means that DROP3 had higher average accuracy on 14 of the databases and lower accuracy on 17, and that DROP3 had lower storage requirements on all 31 of the databases.

#times sig. better/worse: Next is shown how often DROP3 was significantly better and worse (according to a one-tailed paired t-test with a confidence level of 90%) than each of the other pruning heuristics.

T Prob: Next is shown the confidence level with which a one-tailed paired t-test on the results of the 31 databases indicates that DROP3 is better (if positive) or worse (if negative) than the given pruning heuristic. For example, for IB3, the t-test indicates that DROP3 is more accurate than IB3 with 98.85% confidence, and that it has lower storage requirements with 91.16% confidence (on these datasets). Negative probabilities indicate the confidence that DROP3 is worse. Values of "100.00" are of course not exactly 100%, but very close.

Wilcoxon: Finally, another measure of confidence is given in the same format as the T probability, except that this probability comes from a Wilcoxon signed ranks test on the results for DROP3 and each other pruning heuristic on the 31 databases, instead of a t-test. Due to the lookup table used for this computation, no confidences above 99.50% are reported for this test.

The second table in the file contains the same information as above, but for each database it also includes the standard deviation of the accuracy and storage requirements over the 10 cross-validation trials for each pruning heuristic. It also gives the one-tailed paired t-test confidence and Wilcoxon confidence that DROP3 is better (if positive) or worse (if negative) on that particular dataset, using the paired results from each of the 10 trials.

Note that the paired t-test confidence is used for displaying the ++ and -- symbols and for counting the number of significant wins and losses, while the Wilcoxon test is used to determine whether DROP3 was significantly better or worse than each other pruning heuristic on the whole set of 31 databases.

The "ELH" pruning heuristic is not described in the Machine Learning paper. It wasn't especially successful and was not much different from some of the other schemes, so it was not included in the paper, but its results are available here. See drop.c for what it does.

The "MCS" column can be ignored. The MCS pruning scheme was not implemented, so this column is identical to the kNN column.


res/ - Directory containing a separate results file for each database

The "res" directory contains a ".res" results file for each of the 31 databases used in these experiments. These file contain information on each database, including name, number of instances, number of inputs, types of inputs, etc.

They also specify that the "HVDM" (Heterogeneous Value Difference Metric) was used as the distance function (see JAIR, Vol. 6, no. 1, pp. 1-33 for details). They give the old name that was originally used in my source code, which you are free to ignore. They then give the name of each pruning heuristic, and then accuracy and size results for each of 10 trials.

The training instances were randomly ordered and then divided into 10 partitions. Each trial used 9 partitions as a training set, pruned it down to a subset (except for kNN), and then used this subset to classify the remaining partition, i.e., the test set.
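
As a rough illustration of this procedure (this is not code from drop.c, and the details here are assumptions), the following self-contained C sketch shows how the instances might be randomly ordered, dealt into 10 partitions, and split into training and test sets for each trial:

    /* Illustrative sketch of the 10-fold cross-validation partitioning
       described above; the actual code is in drop.c. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_FOLDS 10

    int main(void)
    {
        int num_instances = 150;            /* example size */
        int order[150], fold_of[150];
        int i, j, tmp, test_fold;

        /* Randomly order the instances. */
        for (i = 0; i < num_instances; i++)
            order[i] = i;
        for (i = num_instances - 1; i > 0; i--) {
            j = rand() % (i + 1);
            tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }

        /* Deal the shuffled instances into 10 partitions. */
        for (i = 0; i < num_instances; i++)
            fold_of[order[i]] = i % NUM_FOLDS;

        /* Each trial: the 9 folds other than test_fold form the training
           set (which is then pruned to a subset, except for kNN), and the
           remaining fold is the test set. */
        for (test_fold = 0; test_fold < NUM_FOLDS; test_fold++) {
            int train = 0, test = 0;
            for (i = 0; i < num_instances; i++) {
                if (fold_of[i] == test_fold)
                    test++;
                else
                    train++;
            }
            printf("Trial %d: %d training instances, %d test instances\n",
                   test_fold + 1, train, test);
        }
        return 0;
    }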

The reported results are the accuracy obtained on the test set and the percentage of the original training set remaining in the subset.

The average accuracy and size over the 10 trials are given at the bottom; these are the values included in the results.txt summary. Also reported are the standard deviation over the 10 trials and the T value used in determining the T probability, which is the confidence that DROP3 is better (i.e., higher accuracy or lower storage requirements) than the given pruning heuristic according to a one-tailed paired t-test over the results of the 10 trials. As explained above under results.txt, positive values indicate the confidence that DROP3 is better, and negative values indicate the confidence that DROP3 is worse.

Similar results were also obtained using a one-tailed Wilcoxon Signed Ranks test.


mldb/ - Directory containing machine learning databases used in the experiments

The mldb/ directory contains the versions of the UCI Machine Learning Databases that were used in our experiments. These files have been converted into a numeric (".num") format that is easy to parse, and they are provided here to make it easy for other researchers to run our code on the same data to verify the results. The format of the files is as follows.

The first line of a .num file contains three integers, specifying (1) the number of instances in the database, (2) the number of input variables in each instance, and (3) the number of output variables (usually just 1).

The second line contains, for each input and output variable, a number representing the number of different values that the variable can have. A "0" indicates that it is a continuous attribute. A positive value indicates that it is a variable with discrete, unordered (i.e., nominal or symbolic) values. For example, a value of 3 would indicate that the variable has values 0, 1, and 2, but that these values represent unordered things such as color, so that "1" isn't any "closer" to "2" than "0" is.

A negative value indicates a discrete linear value. For example, a value of -3 would indicate that the variable has values 0, 1 and 2, and "1" is "closer" to "2" than "0" is.

The rest of a .num file contains one line per training instance, with a value for each input and output variable.
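
As an illustration of this format (this is not the actual loader from drop.c, and it assumes all values are whitespace-separated numbers), a minimal C reader might look like this:

    /* Minimal sketch of a .num file reader, following the format
       described above.  Illustrative only; not the loader in drop.c. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        FILE  *fp;
        int    num_instances, num_inputs, num_outputs, num_vars, i, j;
        int   *value_counts;     /* 0 = continuous, >0 = nominal, <0 = linear */
        double value;

        if (argc != 2 || (fp = fopen(argv[1], "r")) == NULL) {
            fprintf(stderr, "usage: readnum file.num\n");
            return 1;
        }

        /* First line: number of instances, inputs, and outputs. */
        fscanf(fp, "%d %d %d", &num_instances, &num_inputs, &num_outputs);
        num_vars = num_inputs + num_outputs;

        /* Second line: number of values for each variable (see codes above). */
        value_counts = malloc(num_vars * sizeof(int));
        for (j = 0; j < num_vars; j++)
            fscanf(fp, "%d", &value_counts[j]);

        /* Remaining lines: one instance per line, one value per variable. */
        for (i = 0; i < num_instances; i++)
            for (j = 0; j < num_vars; j++)
                fscanf(fp, "%lf", &value);    /* store or process as needed */

        free(value_counts);
        fclose(fp);
        return 0;
    }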


drop.c - Source code for DROP and other pruning heuristics

The drop.c source code is not the exact source code used to obtain the results reported in the paper, but it should produce the same results. The code has evolved somewhat since these experiments were first run, and the current version of the code includes options for automatically deciding which distance function to use, what value of k to use, whether and what kind of distance-weighted voting to do, etc.

However, the default settings of all parameters have been set so that the program should operate the same as in the Machine Learning paper experiments. (If you note any significant differences, let me know and I'll see what has changed).

The program will need to be run once for each database. Running it without any arguments prints a (lengthy) description of the various parameters you can use. You can, for example, choose a different distance metric, set the value of k to something other than 3, choose which pruning heuristic(s) to use instead of doing all of them, do only one of the 10 trials instead of all 10, add noise to the data, etc.

For basic operation, you just need to specify the input data file. The output goes to stdout, so if you want to save the results to a file, you'll need to redirect the output, e.g.,
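
    drop mydata.num > mydata.res

(The file names in this example are placeholders, and the exact argument form is an assumption; run the program with no arguments to see its actual usage.)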

To compile the program, you can use cc or gcc, and you need to link in the math library, e.g.,
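
    cc -o drop drop.c -lm

(The name of the output executable here is just a suggestion; gcc works the same way.)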


stats/ - Directory containing original results files and source code for gathering statistics

The drop.c program outputs results for each database in a text format. The program "getstats.c" in the "stats/" directory was used to parse the individual ".res" results files, gather averages and statistics, and print the results to new ".res" files and to "results.txt".

This code has been included so that others can see how the t-test and Wilcoxon test were computed. The former was taken from the "Numerical Recipes in C" book, and the latter was coded from scratch according to a description of the algorithm in Conover (1971).
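
As a rough illustration of the first of these (this is not the actual getstats.c code), the paired t statistic over n paired results can be computed as follows; turning t into a one-tailed confidence additionally requires the Student's t distribution function, which is what the "Numerical Recipes" routines provide:

    /* Illustrative paired t statistic over n paired results (e.g., the 10
       cross-validation trials, or the 31 per-database averages).  A sketch
       only, not the code from getstats.c. */
    #include <math.h>

    double paired_t(const double *a, const double *b, int n)
    {
        double mean = 0.0, var = 0.0;
        int i;

        for (i = 0; i < n; i++)
            mean += a[i] - b[i];
        mean /= n;                        /* mean of the paired differences */

        for (i = 0; i < n; i++) {
            double d = (a[i] - b[i]) - mean;
            var += d * d;
        }
        var /= (n - 1);                   /* sample variance of the differences */

        return mean / sqrt(var / n);      /* t statistic with n-1 degrees of freedom */
    }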


Contact Information

Feel free to contact me if you have questions, comments or suggestions.

Randy Wilson
E-mail: randy@axon.cs.byu.edu
WWW: http://axon.cs.byu.edu/~randy