You will implement the following four approaches for estimating a model's
future accuracy (details below):
Training Set Method
· Use the full data set to train a model
· Compute accuracy on the same data set
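The two steps of the Training Set Method can be sketched as follows. This is only a minimal illustration, not a required design: the `(features, label)` tuple representation, the function names, and the majority-class learner `induce_majority` are all assumptions standing in for whatever learner and data structures you actually use.

```python
from collections import Counter

def induce_majority(instances):
    """Hypothetical stand-in learner: always predicts the most common label."""
    majority = Counter(label for _, label in instances).most_common(1)[0][0]
    return lambda features: majority

def training_set_accuracy(instances, induce):
    """Train on the full data set, then score the model on that same data."""
    model = induce(instances)
    correct = sum(1 for features, label in instances if model(features) == label)
    return correct / len(instances)

# Toy data: three instances labelled "a" and one labelled "b".
data = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
accuracy = training_set_accuracy(data, induce_majority)
print(accuracy)
```

Passing the learner in as a function keeps the evaluation code independent of the learning algorithm, so the same routine can be reused with any model.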
Static Split Test Set Method
Two distinct datasets (ARFF files) are made available to the machine learner: a
training set and a test set.
· The training set is used for learning/training (i.e., inducing a model), and
· The test set is used exclusively for testing
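A minimal sketch of the Static Split Test Set Method is shown here, assuming the two ARFF files have already been parsed into in-memory lists of `(feature, label)` pairs (the parsing step is omitted). The function names and the one-nearest-neighbour learner `induce_1nn` are hypothetical placeholders, not part of the assignment.

```python
def induce_1nn(instances):
    """Hypothetical stand-in learner: 1-nearest neighbour on one numeric feature."""
    def predict(x):
        nearest = min(instances, key=lambda inst: abs(inst[0] - x))
        return nearest[1]
    return predict

def static_split_accuracy(train_set, test_set, induce):
    """Induce a model from the training set only; score it on the test set only."""
    model = induce(train_set)
    correct = sum(1 for x, label in test_set if model(x) == label)
    return correct / len(test_set)

# Toy stand-ins for the contents of the two provided files.
train_set = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
test_set = [(1.5, "low"), (7.0, "high"), (4.0, "high")]
accuracy = static_split_accuracy(train_set, test_set, induce_1nn)
print(accuracy)
```

Note that the test instances never reach `induce`, which is what keeps the test set "exclusively for testing".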
Random Split Test Set Method
· A single data set is made available to the machine learner
· The data is split (by the learner) into a training and a test set, such that:
  o Instances are randomly assigned to either set. Do this by randomizing the
    data set before the split. Stratification (where the distribution of
    instances with respect to the target class is the same in both sets) is
    optional
  o x% of instances are used for training and the remainder for testing (x is
    input by the user)
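The randomize-then-split step might look like the sketch below. The function name `random_split`, the `seed` parameter (added only to make runs repeatable), and the use of `round` to turn x% into a whole number of instances are assumptions; the optional stratification is not implemented here.

```python
import random

def random_split(instances, train_percent, seed=None):
    """Shuffle a copy of the data, then cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(instances)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)           # randomize before the split
    # Assumption: round x% of the data to the nearest whole instance count.
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]

# Toy data: ten instances with alternating labels.
data = [((i,), "pos" if i % 2 else "neg") for i in range(10)]
train_set, test_set = random_split(data, train_percent=70, seed=0)
print(len(train_set), len(test_set))
```

The resulting pair can then be evaluated exactly as in the static split case: induce a model from `train_set` and measure accuracy on `test_set`. Values of x that leave the test set empty (such as 100) should be rejected before computing accuracy.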
N-fold Cross-validation Method
· Partition the dataset (call it D) into N (input by the user) equally-sized
  subsets S_1, ..., S_N
· For k = 1 to N
  o Let M_k be the model induced from D - S_k
  o Let n_k be the number of instances of S_k correctly classified by M_k
· Return (n_1 + n_2 + ... + n_N) / |D|
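The cross-validation loop can be sketched as follows. The names, the `(features, label)` representation, and the majority-class learner are assumptions used only for illustration. The folds are built by taking every N-th instance, so when N does not divide |D| the subset sizes differ by at most one; this is one possible way to partition, not the only one.

```python
from collections import Counter

def induce_majority(instances):
    """Hypothetical stand-in learner: always predicts the most common label."""
    majority = Counter(label for _, label in instances).most_common(1)[0][0]
    return lambda features: majority

def cross_validation_accuracy(dataset, n_folds, induce):
    """Return (n_1 + ... + n_N) / |D| for an N-fold cross-validation."""
    # Partition D into subsets S_1..S_N by taking every N-th instance.
    folds = [dataset[k::n_folds] for k in range(n_folds)]
    total_correct = 0
    for k in range(n_folds):
        held_out = folds[k]
        # D - S_k: every instance that is not in the held-out fold.
        training = [inst for j, fold in enumerate(folds) if j != k for inst in fold]
        model = induce(training)                                          # M_k
        total_correct += sum(1 for x, y in held_out if model(x) == y)     # n_k
    return total_correct / len(dataset)

# Toy data: five instances labelled "a" and one labelled "b".
data = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a"), ((4,), "a"), ((5,), "a")]
accuracy = cross_validation_accuracy(data, n_folds=3, induce=induce_majority)
print(accuracy)
```

Because every instance is held out exactly once, the correct counts sum over all of D, which is why the total is divided by |D| rather than averaged per fold. If the data file is ordered by class, shuffling it before partitioning is usually advisable.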