Date: Tue, 16 Mar 93 11:38:43 -0700
From: kevin@axon.cs.byu.edu (Kevin Vanhorn)
To: rick@axon, george@axon, cory@axon, martinez@bunsen.cs.byu.edu,
    vanhorn@bert.cs.byu.edu, cgc@axon, tim@axon, randy@axon, dan@axon
Subject: Random ML problem generator

In the directory .../mldb/rand I have installed the following programs
for generating random machine learning problems: genex1_d, genex1_h,
genex2_d, genex2_h.  The "_d" programs run on the DECstations, and the
"_h" programs run on the HP's.

The argument lists are as follows:

genex1
WHERE the arguments give
    the number of rules in the randomly generated ordered set of
        production rules that is the target hypothesis
    the number of possible different classes
    the number of possible binary inputs
    the fraction of noisy examples
    the number of examples to generate for the training set
    the number of examples to generate for the testing set
    the name of the file to which the training examples are written
    the name of the file to which the test examples are written.

The training and test files are written in the "num" HUT format.

Genex1 uses a uniform distribution over the input space, and chooses
the number of literals in each rule's test so as to ensure that each
rule (including the default) covers an approximately equal fraction of
the input space.

genex2
WHERE one argument is the total number of literals used in the target
hypothesis; the other arguments are the same as for genex1.

Genex2 does *not* use a uniform distribution over the input space.
Instead, an input is equally likely to be covered by any one of the
rules or by the default, and a uniform distribution is used within the
set of examples covered by any rule (including default).

(Note: When I say that a rule "covers" an input vector I mean that it
is the first rule that matches the input vector.)

Please report to me any problems in using these programs.

-----------------------------------------------------------------------------
Kevin S. Van Horn       | It is the means that determine the ends.
vanhorn@bert.cs.byu.edu |
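
For readers who want to see the two sampling schemes from the message above in concrete form, here is a rough Python sketch. It is not the genex1/genex2 source, which is not shown here. All function and parameter names are hypothetical. The rule for choosing literal counts (about log2 of the number of rules still to come, counting the default), the meaning of a "noisy" example (the class is replaced by a random one), and the use of rejection sampling for the genex2-style scheme are assumptions made only for illustration. Writing the "num" HUT file format is left out.

```python
import math
import random


def make_rules(n_rules, n_classes, n_inputs, rng):
    """Build an ordered rule list plus a default class.

    Each rule is (literals, class), where literals maps an input index to
    the bit value it requires. Assumption: rule i gets about
    log2(n_rules + 1 - i) literals, so that it matches roughly
    1 / (n_rules + 1 - i) of the inputs not taken by earlier rules.
    """
    rules = []
    for i in range(n_rules):
        k = round(math.log2(n_rules + 1 - i))
        k = min(n_inputs, max(1, k))
        chosen = rng.sample(range(n_inputs), k)
        literals = {j: rng.randint(0, 1) for j in chosen}
        rules.append((literals, rng.randrange(n_classes)))
    return rules, rng.randrange(n_classes)


def covering_rule(rules, x):
    """Index of the first rule that matches x; len(rules) means default."""
    for i, (literals, _) in enumerate(rules):
        if all(x[j] == v for j, v in literals.items()):
            return i
    return len(rules)


def classify(rules, default_class, x):
    """Class assigned by the covering rule, or the default class."""
    i = covering_rule(rules, x)
    return rules[i][1] if i < len(rules) else default_class


def random_input(n_inputs, rng):
    """Draw one input vector uniformly from the binary input space."""
    return [rng.randint(0, 1) for _ in range(n_inputs)]


def gen_uniform(rules, default_class, n_classes, n_inputs, noise,
                n_examples, rng):
    """genex1-style sketch: uniform inputs, then optional label noise."""
    examples = []
    for _ in range(n_examples):
        x = random_input(n_inputs, rng)
        y = classify(rules, default_class, x)
        if rng.random() < noise:
            # Assumption: a noisy example gets a uniformly random class.
            y = rng.randrange(n_classes)
        examples.append((x, y))
    return examples


def gen_equal_cover(rules, default_class, n_inputs, n_examples, rng,
                    max_tries=100000):
    """genex2-style sketch: pick a rule (or default) with equal
    probability, then rejection-sample a uniform input it covers."""
    examples = []
    for _ in range(n_examples):
        target = rng.randrange(len(rules) + 1)
        for _ in range(max_tries):
            x = random_input(n_inputs, rng)
            if covering_rule(rules, x) == target:
                break
        else:
            # A rule fully shadowed by earlier rules covers no inputs.
            raise RuntimeError("no input found for rule %d" % target)
        examples.append((x, classify(rules, default_class, x)))
    return examples
```

Rejection sampling keeps the draw uniform within each rule's covered set, at the cost of wasted draws when a rule covers only a small part of the input space. A generator meant for many inputs or many literals per rule would more likely build the covered input directly.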