Instructions for running cn2:


Here is a complete user's manual for cn2 (in PostScript format).

You will find the executable for cn2 in /u2/admin/cs572ta/bin. If you want, you may download the C source code or view the papers on cn2 written by Peter Clark. To create data for cn2, run xlate on your data sets with the "-a cn2" option.

I've put a utility called fredtodis in /pub/cs572 (anonymous ftp to axon) which converts continuous-valued attributes to discrete intervals. The format of the command is

    fredtodis data_file num_intervals

For example, to convert all of the continuous-valued attributes in the echocardiogram data set to 3-valued discrete variables, type

    fredtodis echoc 3

The output file will be named data_file.dis (in this case, echoc.dis). The conversion uses the simple equal-interval technique to convert the continuous attributes to discrete values. The ugly source code is also in /users/data/bin for those who want to modify it or look for bugs.

UPDATE: I've fixed the bugs and tested the output using xlate, cn2, and c4.5 on the echo data set. The bugs fixed were as follows:

Bug 1: The attribute count on the first line of the file was always one too high.

Bug 2: In the boundary case where an attribute had a value equal to the upper limit of the range for that input variable, the program printed an interval which was not defined. This was fixed by including the upper limit in the final interval.

Something that was not a bug: When you ask for x intervals on a continuous-valued variable, the file header will define the possible values for that variable as "0, 1, ..., x-1". The "0" is for all real values which fall into the first interval, the "1" is for all real values which fall into the second interval, and the "x-1" is for all real values which fall into the xth interval. Although these labels are numeric, we could just as well have used "a, b, ...": as long as we have x labels, we have x intervals.
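For readers who want to see the equal-interval idea in isolation, here is a minimal Python sketch. It is not the fredtodis source. The function name equal_interval_labels is made up for illustration, and the sketch assumes that the range of an attribute is taken from the smallest and largest values observed in the data and that there are no missing values. It does show the two points described above: labels run from 0 to x-1, and a value equal to the upper limit is placed in the final interval.

```python
def equal_interval_labels(values, num_intervals):
    """Map each continuous value to an interval label in 0 .. num_intervals-1.

    Hypothetical helper for illustration only; assumes the attribute's range
    is [min(values), max(values)] and that no values are missing.
    """
    if num_intervals < 1:
        raise ValueError("num_intervals must be at least 1")

    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so everything falls in the first interval.
        return [0 for _ in values]

    width = (hi - lo) / num_intervals
    labels = []
    for v in values:
        idx = int((v - lo) / width)
        # A value equal to the upper limit would compute to num_intervals,
        # which is not a defined interval, so clamp it into the final one.
        labels.append(min(idx, num_intervals - 1))
    return labels


if __name__ == "__main__":
    # Three intervals over the range 0.0 .. 9.0, each of width 3.0.
    print(equal_interval_labels([0.0, 4.5, 9.0], 3))
```

With three intervals over the range 0.0 to 9.0, the value 9.0 sits exactly on the upper limit and receives the label 2 rather than an undefined label 3.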