Instructions for running C4.5


Everything you need is available on the open lab machines. Go to /u2/admin/cs572ta. In this directory you will see the following subdirectories.

  • DB -- This is the database of applications.
  • bin -- This contains all the executables you need.
  • man1 -- This contains the man-pages for the algorithms.
  • Don't worry about the others for now.

    In the DB directory you will find many subdirectories. Each one is a separate application. These applications are taken from the UC Irvine (UCI) Machine Learning Repository. You will need to copy some of them to your working directory and then use 'gunzip' to uncompress them. You can browse the UCI Repository to find the applications most interesting to you. Within an application, the file without an extension is the data in 'Zarndt' format, and the file with the .names extension holds information about the application. You should read the .names file to better understand the problem you are trying to solve.
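    A minimal shell sketch of the copy-and-uncompress step described above. It assumes each application is a subdirectory of /u2/admin/cs572ta/DB and that the compressed files carry a .gz suffix. The function name fetch_app, the DB_DIR variable, and the default working directory ~/c45work are made up for illustration.

```shell
# Location of the application database (from the instructions above).
# DB_DIR is a hypothetical variable so the path can be overridden.
DB_DIR="${DB_DIR:-/u2/admin/cs572ta/DB}"

# fetch_app APP_NAME [WORK_DIR]
# Copy one application subdirectory into a working directory and
# uncompress every .gz file inside the copy (assumed suffix).
fetch_app() {
    app_name="$1"
    work_dir="${2:-$HOME/c45work}"   # assumed default, pick your own

    mkdir -p "$work_dir" || return 1
    cp -r "$DB_DIR/$app_name" "$work_dir/" || return 1

    # gunzip replaces each name.gz with name; find runs nothing
    # if the copy contains no .gz files.
    find "$work_dir/$app_name" -type f -name '*.gz' -exec gunzip {} +
}
```

    For example, 'fetch_app some_application' would leave the uncompressed files in ~/c45work/some_application, where some_application stands for whichever subdirectory of DB you picked.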

    The two executables needed to run c4.5 are in the /u2/admin/cs572ta/bin directory. They are 'c4.5' (surprise!) and 'xlate'. 'xlate' is a program that puts the data in a form that c4.5 can handle. If you run xlate without any arguments (like this: xlate |more) you get man-page style information on what xlate can do for you.
    To xlate a file you type: xlate -a c45 -f file_name
    This will produce two files: file_name.data and file_name.test.
    The first is your training set and the second is a test set.
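    The xlate step can be wrapped in a small shell function that also checks that both output files appeared. This is only a sketch: the function name translate_for_c45 and the BIN_DIR variable are invented here, and the only xlate options used are the ones given above.

```shell
# Directory holding the executables (from the instructions above).
# BIN_DIR is a hypothetical variable so the path can be overridden.
BIN_DIR="${BIN_DIR:-/u2/admin/cs572ta/bin}"

# translate_for_c45 FILE_NAME
# Run xlate on FILE_NAME (no extension) in the current directory and
# confirm that FILE_NAME.data and FILE_NAME.test were produced.
translate_for_c45() {
    stem="$1"

    "$BIN_DIR/xlate" -a c45 -f "$stem" || return 1

    for ext in data test; do
        if [ ! -s "$stem.$ext" ]; then
            echo "missing or empty: $stem.$ext" >&2
            return 1
        fi
    done
    echo "training set: $stem.data"
    echo "test set:     $stem.test"
}
```

    Run it from the directory that holds the uncompressed data file, for example 'translate_for_c45 file_name'.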

    Now to run c4.5 you type: c4.5 -f file_name -u
    where file_name has no extension (c4.5 adds the extensions itself). The -u option tells it to run on the 'unseen data' (test set) as well as the training set. I suggest you redirect the output to an output file so you can analyze it at leisure.
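    One way to save the c4.5 output for later reading, sketched as a shell function. The function name run_c45, the BIN_DIR variable, and the default output name file_name.out are assumptions made for this example; the c4.5 options are the ones given above.

```shell
# Directory holding the executables (from the instructions above).
# BIN_DIR is a hypothetical variable so the path can be overridden.
BIN_DIR="${BIN_DIR:-/u2/admin/cs572ta/bin}"

# run_c45 FILE_NAME [OUTPUT_FILE]
# Run c4.5 on the training set and the unseen (test) set, writing
# everything it prints, including error messages, to OUTPUT_FILE.
run_c45() {
    stem="$1"
    out="${2:-$stem.out}"   # assumed default name for the saved output

    "$BIN_DIR/c4.5" -f "$stem" -u > "$out" 2>&1
}
```

    After 'run_c45 file_name' you can page through the result with 'more file_name.out'.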

    To get the manpage for c4.5 you need the c4.5.1 file from the man1 directory (under /u2/admin/cs572ta). To display it properly, type:
    nroff -man c4.5.1 |more
    Soon I will make this information accessible from this page.

    The following links may help to understand the output that c4.5 throws at you.
    commented output file
    example output file