Hints

Hints for Delta Rule Simulation

It's possible to code the basic Delta Rule algorithm (not including reading in the data file or outputting information for analysis) in fewer than 50 lines. If your design is really involved, you may be missing some simplifying assumptions you can make.

For instance, we know that there is only one layer of weights and that the network is fully connected, so we can represent the network as a matrix of weights. Here's an example of doing this for a 3 x 2 network.
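One way to set up that matrix in Python is sketched below. It assumes the bias (threshold) is handled as an extra input that is always 1, which is why there is one more row than there are inputs. The function name `make_weights` is hypothetical.

```python
def make_weights(n_inputs, n_outputs, init=0.0):
    """Return an (n_inputs + 1) x n_outputs weight matrix as nested lists.

    weights[i][j] connects input i to output node j. The last row
    (i == n_inputs) holds each output node's bias weight, assuming the
    bias is treated as an extra input fixed at 1.
    """
    return [[init for _ in range(n_outputs)] for _ in range(n_inputs + 1)]

# A 3-input, 2-output network: 4 rows (3 inputs + bias) by 2 columns.
w = make_weights(3, 2)
```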



You could also make a matrix of the same size for the changes in the weights. The algorithm is then fairly simple: multiply each input value of a pattern by the corresponding weight in one column, sum the products to get the net value for that output node, pass the net value through your activation function (the threshold function) to get the output, and then calculate the change in the weights for that column. Repeat this for each column, and then add the change-in-weights matrix to the weight matrix. Voilà! It's just like your homework. And if it's not obvious, your 2D matrices could be implemented as (number of inputs + 1) x (number of outputs) arrays of floats; the extra row is for the bias (threshold) weights.
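The column-by-column procedure can be sketched in Python as one training pass over the patterns. This is a minimal sketch, not the required implementation; it assumes the bias is an extra input fixed at 1, that the threshold function outputs 1 when the net value is greater than 0, and that `eta` is a learning rate you choose. The names `threshold` and `train_epoch` are hypothetical.

```python
def threshold(net):
    """Threshold activation (assumed cutoff): 1 if net > 0, else 0."""
    return 1 if net > 0 else 0

def train_epoch(weights, patterns, targets, eta=0.1):
    """One pass of Delta Rule training; updates weights in place.

    weights:  (n_inputs + 1) x n_outputs nested list, last row = bias
    patterns: list of input vectors, each of length n_inputs
    targets:  list of 0/1 target vectors, each of length n_outputs
    Returns the sum of squared errors seen during this pass, each
    measured before that pattern's weight update.
    """
    n_rows = len(weights)
    n_outputs = len(weights[0])
    total_error = 0
    for x, t in zip(patterns, targets):
        x = list(x) + [1]  # append the constant bias input
        delta = [[0.0] * n_outputs for _ in range(n_rows)]
        for j in range(n_outputs):  # one column per output node
            net = sum(x[i] * weights[i][j] for i in range(n_rows))
            err = t[j] - threshold(net)
            total_error += err * err
            for i in range(n_rows):
                delta[i][j] = eta * err * x[i]
        # After every column is done, add the change-in-weights matrix.
        for i in range(n_rows):
            for j in range(n_outputs):
                weights[i][j] += delta[i][j]
    return total_error
```

Typically you would call `train_epoch` repeatedly until it returns 0 or you hit a maximum number of passes; on a linearly separable problem such as OR, starting from all-zero weights, it should reach 0 after a few passes.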

Creating Test Sets

Also, you'll probably want to divide your data sets into a training set and a test set. Here's a Windows .exe that can do this for you for .pat files:
Download it here.

Call it with no arguments to print its usage. Here's an example:

split lungcancer.pat -s 0.8 lungcancer_train.pat 0.2 lungcancer_test.pat

This command creates 2 files: lungcancer_train.pat, containing 80% of lungcancer.pat's patterns, and lungcancer_test.pat, containing the other 20%. The -s option shuffles the patterns before they are split.

It's important to use the same training and test sets in experiments that you are comparing, because the way the data is split can affect your results.
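If you would rather split inside your own program, the idea behind `split -s` can be sketched in Python on an in-memory list of patterns. This is not the provided .exe, and it does not read .pat files, since their layout isn't described here; `split_patterns` and its parameters are hypothetical. Fixing the random seed makes the shuffle reproducible, so experiments you compare can reuse exactly the same split.

```python
import random

def split_patterns(patterns, train_fraction=0.8, seed=42, shuffle=True):
    """Split a list of patterns into (train, test) lists.

    A fixed seed gives the same shuffle on every run, so repeated
    experiments see the same training and test sets.
    """
    patterns = list(patterns)  # copy so the caller's list is untouched
    if shuffle:
        random.Random(seed).shuffle(patterns)
    cut = round(len(patterns) * train_fraction)
    return patterns[:cut], patterns[cut:]

train, test = split_patterns(range(10), 0.8)
```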


Dealing with Multiple Outputs

Many of the training files have a single output that encodes multiple classes. For example, iris.pat has 1 output, which is 1, 2, or 3. Since each Delta Rule output node can only output 0 or 1, the network should instead have one output node for each possible class. For iris, there would be 3 output nodes, and the possible correct outputs would be 100, 010, and 001 (corresponding to class 1, class 2, and class 3, respectively, reading the nodes left to right).
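Converting a file's class label into a target vector for the output nodes might look like the Python sketch below. It assumes the labels are the integers 1 through the number of classes, as in iris.pat, and that class k is assigned to output node k; `encode_class` is a hypothetical name.

```python
def encode_class(label, n_classes):
    """Turn a class label 1..n_classes into a 0/1 target vector.

    Only the node for class `label` (list position label - 1) is 1.
    """
    if not 1 <= label <= n_classes:
        raise ValueError(f"label {label} outside 1..{n_classes}")
    target = [0] * n_classes
    target[label - 1] = 1
    return target
```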

What if more than one node outputs 1? This means that two or more classes were close and the network could not decide which one to output. If we were expecting 001 and got 011, the PSSE would be 1 even though the correct node output a 1. In this case, the network couldn't decide between class 2 and class 3.
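When reading a class back out of the output nodes, it helps to detect the undecided case explicitly. The Python sketch below assumes class k is assigned to output node k and treats both "no node fired" and "several nodes fired" as no decision; `decode_output` is a hypothetical name.

```python
def decode_output(outputs):
    """Return the 1-based class whose node fired, or None when zero
    nodes or more than one node output 1 (no single decision)."""
    fired = [k for k, o in enumerate(outputs, start=1) if o == 1]
    return fired[0] if len(fired) == 1 else None
```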

Pattern Sum Squared Error:

$$PSSE = \sum_{i=1}^{n} (T_i - O_i)^2$$

where n is the number of outputs, T_i is the target value for the ith output node, and O_i is the actual output from the ith output node.
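The PSSE formula translates directly into a short Python function, shown here as a sketch (`pattern_sse` is a hypothetical name). The call uses the case of expecting 001 and getting 011.

```python
def pattern_sse(targets, outputs):
    """PSSE for one pattern: sum over output nodes of (T_i - O_i)^2."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

print(pattern_sse([0, 0, 1], [0, 1, 1]))  # prints 1
```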


Total Sum Squared Error:

$$TSSE = \sum_{i=1}^{n} PSSE_i$$

where n is the number of patterns in the training set, and PSSE_i is the PSSE for the ith pattern in the training set.
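Summing the per-pattern errors gives the TSSE. The Python sketch below is self-contained (`total_sse` is a hypothetical name), and the two target/output pairs in the call are made-up values for illustration only.

```python
def total_sse(all_targets, all_outputs):
    """TSSE: sum of the PSSE of every pattern in the training set."""
    return sum(
        sum((t - o) ** 2 for t, o in zip(targets, outputs))
        for targets, outputs in zip(all_targets, all_outputs)
    )

# PSSEs of 1 and 2, so the TSSE is 3.
print(total_sse([[0, 0, 1], [1, 0, 0]], [[0, 1, 1], [0, 1, 0]]))  # prints 3
```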