Testing Your Decision Tree
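Here's a fairly simple dataset, in ARFF format, that you can use to test your decision tree. The first two data rows are commented out, so only the remaining 23 patterns are used.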
%
% This is a simple dataset for testing your decision tree
% (The opinions expressed in this file are not necessarily
% endorsed by anyone with good taste in pizza.)
%
@RELATION pizza
@ATTRIBUTE cheese { none, some, lots }
@ATTRIBUTE meat { vegetarian, meaty }
@ATTRIBUTE sauce { sweet, salty }
@ATTRIBUTE flour { low_gluten, high_gluten }
@ATTRIBUTE class { awful, edible, stellar }
@DATA
%none, vegetarian, sweet, low_gluten, awful
%some, vegetarian, sweet, low_gluten, awful
lots, vegetarian, sweet, low_gluten, awful
lots, vegetarian, sweet, low_gluten, edible
none, meaty, sweet, low_gluten, awful
some, meaty, sweet, low_gluten, edible
lots, meaty, sweet, low_gluten, awful
none, vegetarian, salty, low_gluten, awful
some, vegetarian, salty, low_gluten, awful
lots, vegetarian, salty, low_gluten, edible
none, meaty, salty, low_gluten, awful
some, meaty, salty, low_gluten, awful
lots, meaty, salty, low_gluten, awful
none, vegetarian, sweet, high_gluten, awful
some, vegetarian, sweet, high_gluten, awful
lots, vegetarian, sweet, high_gluten, edible
none, meaty, sweet, high_gluten, awful
some, meaty, sweet, high_gluten, edible
lots, meaty, sweet, high_gluten, stellar
none, vegetarian, salty, high_gluten, awful
some, vegetarian, salty, high_gluten, edible
lots, vegetarian, salty, high_gluten, stellar
none, meaty, salty, high_gluten, edible
some, meaty, salty, high_gluten, stellar
lots, meaty, salty, high_gluten, stellar
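If you need a quick way to load files like this one, here is a minimal C++ sketch of a reader for nominal-only ARFF data. It is not a general ARFF parser. It assumes every `@attribute` declaration fits on one line with its values in braces, that values contain no quotes or embedded commas, and that a line starting with `%` is a comment. Keywords are matched case-insensitively because this file uses `@RELATION` while the contact-lenses file below uses `@relation`. `NominalDataset` and `parseNominalArff` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: attrValues[a] lists the nominal values of attribute a,
// and each row stores one value index per attribute (the last column is the class).
struct NominalDataset {
    std::vector<std::string> attrNames;
    std::vector<std::vector<std::string>> attrValues;
    std::vector<std::vector<int>> rows;
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Lower-cases ASCII letters so "@RELATION" and "@relation" compare equal.
static std::string lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Splits on commas and trims each piece: " a, b ,c" -> {"a", "b", "c"}.
static std::vector<std::string> splitCommas(const std::string& s) {
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string piece;
    while (std::getline(ss, piece, ',')) out.push_back(trim(piece));
    return out;
}

NominalDataset parseNominalArff(const std::string& text) {
    NominalDataset ds;
    bool inData = false;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '%') continue;  // blank line or comment
        const std::string key = lower(line);
        if (key.rfind("@relation", 0) == 0) continue;
        if (key.rfind("@data", 0) == 0) { inData = true; continue; }
        if (key.rfind("@attribute", 0) == 0) {
            size_t open = line.find('{'), close = line.find('}');
            if (open == std::string::npos || close == std::string::npos || close < open)
                throw std::runtime_error("only one-line nominal attributes are supported: " + line);
            ds.attrNames.push_back(trim(line.substr(10, open - 10)));  // 10 == strlen("@attribute")
            ds.attrValues.push_back(splitCommas(line.substr(open + 1, close - open - 1)));
            continue;
        }
        if (!inData) continue;
        std::vector<std::string> fields = splitCommas(line);
        if (fields.size() != ds.attrValues.size())
            throw std::runtime_error("wrong number of fields: " + line);
        std::vector<int> row;
        for (size_t a = 0; a < fields.size(); ++a) {
            const std::vector<std::string>& vals = ds.attrValues[a];
            auto it = std::find(vals.begin(), vals.end(), fields[a]);
            if (it == vals.end()) throw std::runtime_error("unknown value: " + fields[a]);
            row.push_back(static_cast<int>(it - vals.begin()));
        }
        ds.rows.push_back(row);
    }
    assert(ds.attrNames.size() == ds.attrValues.size());
    return ds;
}
```

Run on the pizza file above, it should return five attributes and 23 rows, because the two commented-out data lines are skipped like any other comment.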
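Perhaps the easiest way to debug a decision tree is to spew lots of info to a log file and then compare results. The log is easier to read if you indent each node's lines according to its depth in the tree. Here's the log spew that I get from this data. If you make your log spew follow the same format, you can simply diff your results against these to find errors. (To format doubles, I used printf("%.14lg", value).)

A few conventions are visible in the log. Entropy is measured in bits (log base 2). The info-gain of an attribute is the node's entropy minus the entropy of each child partition, weighted by that partition's share of the patterns. When two attributes tie, the one declared first is chosen. A node becomes a leaf when its output is homogeneous or no attributes remain to split on. The order in which patterns are listed inside a child node depends on how your code partitions them, so those lines may differ even when your tree is correct.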
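To make your output diff cleanly against the logs, it can help to build each line with small helpers. The following is a minimal C++ sketch. `fmt14`, `indented`, `ChildStat`, and `gainLine` are hypothetical names, one tab per level is an assumed indentation convention, and the wording of the info-gain line is copied from the log. Doubles are formatted with the same `%.14lg` conversion mentioned above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Formats a double exactly as printf("%.14lg", value) would.
std::string fmt14(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.14lg", value);
    return buf;
}

// Prefixes a log line with one tab per level of tree depth (assumed convention).
std::string indented(int depth, const std::string& text) {
    return std::string(static_cast<size_t>(depth), '\t') + text;
}

// Size and class entropy (in bits) of one child partition.
struct ChildStat {
    int count;
    double entropy;
};

// Builds "If we were to split on <attr>, the info-gain would be H - n1 / n * H1 - ... = G",
// where G = H - sum_i (n_i / n) * H_i.
std::string gainLine(const std::string& attr, double parentEntropy,
                     const std::vector<ChildStat>& children) {
    int total = 0;
    for (const ChildStat& c : children) total += c.count;
    assert(total > 0);
    std::string line = "If we were to split on " + attr + ", the info-gain would be " +
                       fmt14(parentEntropy);
    double gain = parentEntropy;
    for (const ChildStat& c : children) {
        line += " - " + std::to_string(c.count) + " / " + std::to_string(total) +
                " * " + fmt14(c.entropy);
        gain -= static_cast<double>(c.count) / total * c.entropy;
    }
    return line + " = " + fmt14(gain);
}
```

For example, `gainLine("meat", 1.5, {{2, 1.0}, {2, 1.0}})` produces the same text as the pizza log line "If we were to split on meat, the info-gain would be 1.5 - 2 / 4 * 1 - 2 / 4 * 1 = 0.5". Computing the gain in the same function that prints its terms keeps the printed arithmetic and the result from drifting apart.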
Applicable rules:
Applicable patterns: 23
{lots, vegetarian, sweet, low_gluten, awful}
{lots, vegetarian, sweet, low_gluten, edible}
{none, meaty, sweet, low_gluten, awful}
{some, meaty, sweet, low_gluten, edible}
{lots, meaty, sweet, low_gluten, awful}
{none, vegetarian, salty, low_gluten, awful}
{some, vegetarian, salty, low_gluten, awful}
{lots, vegetarian, salty, low_gluten, edible}
{none, meaty, salty, low_gluten, awful}
{some, meaty, salty, low_gluten, awful}
{lots, meaty, salty, low_gluten, awful}
{none, vegetarian, sweet, high_gluten, awful}
{some, vegetarian, sweet, high_gluten, awful}
{lots, vegetarian, sweet, high_gluten, edible}
{none, meaty, sweet, high_gluten, awful}
{some, meaty, sweet, high_gluten, edible}
{lots, meaty, sweet, high_gluten, stellar}
{none, vegetarian, salty, high_gluten, awful}
{some, vegetarian, salty, high_gluten, edible}
{lots, vegetarian, salty, high_gluten, stellar}
{none, meaty, salty, high_gluten, edible}
{some, meaty, salty, high_gluten, stellar}
{lots, meaty, salty, high_gluten, stellar}
Are there still attributes on which we can split? Yes.
Is the output homogenous? No.
If we were to split on cheese, the info-gain would be 1.4509082837502 - 7 / 23 * 0.59167277858233 - 7 / 23 * 1.4488156357252 - 9 / 23 * 1.5849625007212 = 0.20968735302657
If we were to split on meat, the info-gain would be 1.4509082837502 - 11 / 23 * 1.3221793455167 - 12 / 23 * 1.5 = 0.035952944590037
If we were to split on sauce, the info-gain would be 1.4509082837502 - 11 / 23 * 1.3221793455167 - 12 / 23 * 1.5 = 0.035952944590037
If we were to split on flour, the info-gain would be 1.4509082837502 - 11 / 23 * 0.84535093662244 - 12 / 23 * 1.5849625007212 = 0.21967305281537
So we will split on flour
	Applicable rules: flour=low_gluten,
	Applicable patterns: 11
	{some, vegetarian, salty, low_gluten, awful}
	{none, vegetarian, salty, low_gluten, awful}
	{some, meaty, salty, low_gluten, awful}
	{lots, meaty, salty, low_gluten, awful}
	{some, meaty, sweet, low_gluten, edible}
	{lots, vegetarian, sweet, low_gluten, edible}
	{lots, vegetarian, sweet, low_gluten, awful}
	{lots, meaty, sweet, low_gluten, awful}
	{none, meaty, sweet, low_gluten, awful}
	{lots, vegetarian, salty, low_gluten, edible}
	{none, meaty, salty, low_gluten, awful}
	Are there still attributes on which we can split? Yes.
	Is the output homogenous? No.
	If we were to split on cheese, the info-gain would be 0.84535093662244 - 3 / 11 * 0 - 3 / 11 * 0.91829583405449 - 5 / 11 * 0.97095059445467 = 0.15356543894636
	If we were to split on meat, the info-gain would be 0.84535093662244 - 5 / 11 * 0.97095059445467 - 6 / 11 * 0.65002242164835 = 0.049452072789394
	If we were to split on sauce, the info-gain would be 0.84535093662244 - 5 / 11 * 0.97095059445467 - 6 / 11 * 0.65002242164835 = 0.049452072789394
	So we will split on cheese
		Applicable rules: cheese=none, flour=low_gluten,
		Applicable patterns: 3
		{none, vegetarian, salty, low_gluten, awful}
		{none, meaty, salty, low_gluten, awful}
		{none, meaty, sweet, low_gluten, awful}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? Yes.
		So we make a leaf node
		Applicable rules: cheese=some, flour=low_gluten,
		Applicable patterns: 3
		{some, vegetarian, salty, low_gluten, awful}
		{some, meaty, salty, low_gluten, awful}
		{some, meaty, sweet, low_gluten, edible}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on meat, the info-gain would be 0.91829583405449 - 1 / 3 * 0 - 2 / 3 * 1 = 0.25162916738782
		If we were to split on sauce, the info-gain would be 0.91829583405449 - 1 / 3 * 0 - 2 / 3 * 0 = 0.91829583405449
		So we will split on sauce
			Applicable rules: cheese=some, sauce=sweet, flour=low_gluten,
			Applicable patterns: 1
			{some, meaty, sweet, low_gluten, edible}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
			Applicable rules: cheese=some, sauce=salty, flour=low_gluten,
			Applicable patterns: 2
			{some, meaty, salty, low_gluten, awful}
			{some, vegetarian, salty, low_gluten, awful}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
		Applicable rules: cheese=lots, flour=low_gluten,
		Applicable patterns: 5
		{lots, meaty, sweet, low_gluten, awful}
		{lots, vegetarian, sweet, low_gluten, edible}
		{lots, vegetarian, salty, low_gluten, edible}
		{lots, meaty, salty, low_gluten, awful}
		{lots, vegetarian, sweet, low_gluten, awful}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on meat, the info-gain would be 0.97095059445467 - 3 / 5 * 0.91829583405449 - 2 / 5 * 0 = 0.41997309402197
		If we were to split on sauce, the info-gain would be 0.97095059445467 - 3 / 5 * 0.91829583405449 - 2 / 5 * 1 = 0.019973094021975
		So we will split on meat
			Applicable rules: cheese=lots, meat=vegetarian, flour=low_gluten,
			Applicable patterns: 3
			{lots, vegetarian, salty, low_gluten, edible}
			{lots, vegetarian, sweet, low_gluten, awful}
			{lots, vegetarian, sweet, low_gluten, edible}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on sauce, the info-gain would be 0.91829583405449 - 2 / 3 * 1 - 1 / 3 * 0 = 0.25162916738782
			So we will split on sauce
				Applicable rules: cheese=lots, meat=vegetarian, sauce=sweet, flour=low_gluten,
				Applicable patterns: 2
				{lots, vegetarian, sweet, low_gluten, edible}
				{lots, vegetarian, sweet, low_gluten, awful}
				Are there still attributes on which we can split? No.
				Is the output homogenous? No.
				So we make a leaf node
				Applicable rules: cheese=lots, meat=vegetarian, sauce=salty, flour=low_gluten,
				Applicable patterns: 1
				{lots, vegetarian, salty, low_gluten, edible}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
			Applicable rules: cheese=lots, meat=meaty, flour=low_gluten,
			Applicable patterns: 2
			{lots, meaty, sweet, low_gluten, awful}
			{lots, meaty, salty, low_gluten, awful}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
	Applicable rules: flour=high_gluten,
	Applicable patterns: 12
	{some, meaty, salty, high_gluten, stellar}
	{some, vegetarian, salty, high_gluten, edible}
	{some, meaty, sweet, high_gluten, edible}
	{none, meaty, sweet, high_gluten, awful}
	{lots, meaty, sweet, high_gluten, stellar}
	{none, vegetarian, salty, high_gluten, awful}
	{none, meaty, salty, high_gluten, edible}
	{none, vegetarian, sweet, high_gluten, awful}
	{some, vegetarian, sweet, high_gluten, awful}
	{lots, vegetarian, sweet, high_gluten, edible}
	{lots, meaty, salty, high_gluten, stellar}
	{lots, vegetarian, salty, high_gluten, stellar}
	Are there still attributes on which we can split? Yes.
	Is the output homogenous? No.
	If we were to split on cheese, the info-gain would be 1.5849625007212 - 4 / 12 * 0.81127812445913 - 4 / 12 * 1.5 - 4 / 12 * 0.81127812445913 = 0.5441104177484
	If we were to split on meat, the info-gain would be 1.5849625007212 - 6 / 12 * 1.4591479170272 - 6 / 12 * 1.4591479170272 = 0.12581458369391
	If we were to split on sauce, the info-gain would be 1.5849625007212 - 6 / 12 * 1.4591479170272 - 6 / 12 * 1.4591479170272 = 0.12581458369391
	So we will split on cheese
		Applicable rules: cheese=none, flour=high_gluten,
		Applicable patterns: 4
		{none, meaty, salty, high_gluten, edible}
		{none, vegetarian, salty, high_gluten, awful}
		{none, meaty, sweet, high_gluten, awful}
		{none, vegetarian, sweet, high_gluten, awful}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on meat, the info-gain would be 0.81127812445913 - 2 / 4 * 0 - 2 / 4 * 1 = 0.31127812445913
		If we were to split on sauce, the info-gain would be 0.81127812445913 - 2 / 4 * 0 - 2 / 4 * 1 = 0.31127812445913
		So we will split on meat
			Applicable rules: cheese=none, meat=vegetarian, flour=high_gluten,
			Applicable patterns: 2
			{none, vegetarian, salty, high_gluten, awful}
			{none, vegetarian, sweet, high_gluten, awful}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
			Applicable rules: cheese=none, meat=meaty, flour=high_gluten,
			Applicable patterns: 2
			{none, meaty, salty, high_gluten, edible}
			{none, meaty, sweet, high_gluten, awful}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on sauce, the info-gain would be 1 - 1 / 2 * 0 - 1 / 2 * 0 = 1
			So we will split on sauce
				Applicable rules: cheese=none, meat=meaty, sauce=sweet, flour=high_gluten,
				Applicable patterns: 1
				{none, meaty, sweet, high_gluten, awful}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: cheese=none, meat=meaty, sauce=salty, flour=high_gluten,
				Applicable patterns: 1
				{none, meaty, salty, high_gluten, edible}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
		Applicable rules: cheese=some, flour=high_gluten,
		Applicable patterns: 4
		{some, vegetarian, salty, high_gluten, edible}
		{some, meaty, salty, high_gluten, stellar}
		{some, vegetarian, sweet, high_gluten, awful}
		{some, meaty, sweet, high_gluten, edible}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on meat, the info-gain would be 1.5 - 2 / 4 * 1 - 2 / 4 * 1 = 0.5
		If we were to split on sauce, the info-gain would be 1.5 - 2 / 4 * 1 - 2 / 4 * 1 = 0.5
		So we will split on meat
			Applicable rules: cheese=some, meat=vegetarian, flour=high_gluten,
			Applicable patterns: 2
			{some, vegetarian, salty, high_gluten, edible}
			{some, vegetarian, sweet, high_gluten, awful}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on sauce, the info-gain would be 1 - 1 / 2 * 0 - 1 / 2 * 0 = 1
			So we will split on sauce
				Applicable rules: cheese=some, meat=vegetarian, sauce=sweet, flour=high_gluten,
				Applicable patterns: 1
				{some, vegetarian, sweet, high_gluten, awful}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: cheese=some, meat=vegetarian, sauce=salty, flour=high_gluten,
				Applicable patterns: 1
				{some, vegetarian, salty, high_gluten, edible}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
			Applicable rules: cheese=some, meat=meaty, flour=high_gluten,
			Applicable patterns: 2
			{some, meaty, salty, high_gluten, stellar}
			{some, meaty, sweet, high_gluten, edible}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on sauce, the info-gain would be 1 - 1 / 2 * 0 - 1 / 2 * 0 = 1
			So we will split on sauce
				Applicable rules: cheese=some, meat=meaty, sauce=sweet, flour=high_gluten,
				Applicable patterns: 1
				{some, meaty, sweet, high_gluten, edible}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: cheese=some, meat=meaty, sauce=salty, flour=high_gluten,
				Applicable patterns: 1
				{some, meaty, salty, high_gluten, stellar}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
		Applicable rules: cheese=lots, flour=high_gluten,
		Applicable patterns: 4
		{lots, meaty, salty, high_gluten, stellar}
		{lots, vegetarian, salty, high_gluten, stellar}
		{lots, meaty, sweet, high_gluten, stellar}
		{lots, vegetarian, sweet, high_gluten, edible}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on meat, the info-gain would be 0.81127812445913 - 2 / 4 * 1 - 2 / 4 * 0 = 0.31127812445913
		If we were to split on sauce, the info-gain would be 0.81127812445913 - 2 / 4 * 1 - 2 / 4 * 0 = 0.31127812445913
		So we will split on meat
			Applicable rules: cheese=lots, meat=vegetarian, flour=high_gluten,
			Applicable patterns: 2
			{lots, vegetarian, salty, high_gluten, stellar}
			{lots, vegetarian, sweet, high_gluten, edible}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on sauce, the info-gain would be 1 - 1 / 2 * 0 - 1 / 2 * 0 = 1
			So we will split on sauce
				Applicable rules: cheese=lots, meat=vegetarian, sauce=sweet, flour=high_gluten,
				Applicable patterns: 1
				{lots, vegetarian, sweet, high_gluten, edible}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: cheese=lots, meat=vegetarian, sauce=salty, flour=high_gluten,
				Applicable patterns: 1
				{lots, vegetarian, salty, high_gluten, stellar}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
			Applicable rules: cheese=lots, meat=meaty, flour=high_gluten,
			Applicable patterns: 2
			{lots, meaty, salty, high_gluten, stellar}
			{lots, meaty, sweet, high_gluten, stellar}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
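The decisions in this log can be reproduced with a compact recursive builder. The sketch below is an ID3-style tree in C++ under assumptions read off the log rather than taken from the author's code. Entropy is in bits. The attribute with the largest info-gain wins, and ties go to the attribute declared first. A node becomes a leaf when its output is homogeneous or every attribute has been used. `Row`, `Node`, `splitRows`, `build`, and `countLeaves` are hypothetical names. Rows hold value indices, with the class in the last column. An empty branch, which never occurs with this data, becomes a leaf that copies its parent's majority class; that choice is also an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

using Row = std::vector<int>;  // one value index per attribute; the class is last

struct Node {
    int splitAttr = -1;                           // -1 means this node is a leaf
    int label = -1;                               // majority class of the node's rows
    std::vector<std::unique_ptr<Node>> children;  // one child per value of splitAttr
};

// Histogram of the class column.
std::vector<int> classCounts(const std::vector<Row>& rows, int numClasses) {
    std::vector<int> counts(numClasses, 0);
    for (const Row& r : rows) ++counts[r.back()];
    return counts;
}

// Entropy in bits of the class column (0 for an empty set).
double entropy(const std::vector<Row>& rows, int numClasses) {
    double h = 0.0;
    for (int c : classCounts(rows, numClasses)) {
        if (c == 0) continue;
        double p = static_cast<double>(c) / rows.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Groups rows by their value of attribute attr, keeping their original order.
std::vector<std::vector<Row>> splitRows(const std::vector<Row>& rows, int attr, int numValues) {
    std::vector<std::vector<Row>> parts(numValues);
    for (const Row& r : rows) parts[r[attr]].push_back(r);
    return parts;
}

// Info-gain = H(rows) - sum over values v of |rows_v| / |rows| * H(rows_v).
double infoGain(const std::vector<Row>& rows, int attr, int numValues, int numClasses) {
    double gain = entropy(rows, numClasses);
    for (const std::vector<Row>& part : splitRows(rows, attr, numValues))
        if (!part.empty())
            gain -= static_cast<double>(part.size()) / rows.size() * entropy(part, numClasses);
    return gain;
}

// numValues[a] is the number of values of attribute a; numValues.back() is the class count.
std::unique_ptr<Node> build(const std::vector<Row>& rows, std::vector<bool> used,
                            const std::vector<int>& numValues, int parentLabel) {
    assert(used.size() == numValues.size());
    const int numClasses = numValues.back();
    auto node = std::make_unique<Node>();
    if (rows.empty()) { node->label = parentLabel; return node; }  // empty branch
    std::vector<int> counts = classCounts(rows, numClasses);
    node->label = 0;
    for (int c = 1; c < numClasses; ++c)
        if (counts[c] > counts[node->label]) node->label = c;
    if (counts[node->label] == static_cast<int>(rows.size())) return node;  // homogeneous
    int best = -1;
    double bestGain = -1.0;
    for (int a = 0; a + 1 < static_cast<int>(numValues.size()); ++a) {
        if (used[a]) continue;
        double g = infoGain(rows, a, numValues[a], numClasses);
        if (g > bestGain) { bestGain = g; best = a; }  // strict '>' keeps the earlier attribute on a tie
    }
    if (best < 0) return node;  // no attributes left to split on
    node->splitAttr = best;
    used[best] = true;
    for (const std::vector<Row>& part : splitRows(rows, best, numValues[best]))
        node->children.push_back(build(part, used, numValues, node->label));
    return node;
}

// Counts leaves, i.e. the "So we make a leaf node" lines a matching log would print.
int countLeaves(const Node& n) {
    if (n.splitAttr < 0) return 1;
    int total = 0;
    for (const auto& child : n.children) total += countLeaves(*child);
    return total;
}
```

Encoding the 23 pizza rows as indices in declaration order (cheese none/some/lots as 0/1/2, the class awful/edible/stellar as 0/1/2, and so on) and calling `build(rows, std::vector<bool>(5, false), {3, 2, 2, 2, 3}, 0)` splits the root on flour (attribute 3) and yields 16 leaves, one for each "So we make a leaf node" line in the log above.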
Here's another dataset to try:
% 1. Title: Database for fitting contact lenses
%
% 2. Sources:
% (a) Cendrowska, J. "PRISM: An algorithm for inducing modular rules",
% International Journal of Man-Machine Studies, 1987, 27, 349-370
% (b) Donor: Benoit Julien (Julien@ce.cmu.edu)
% (c) Date: 1 August 1990
%
% 3. Past Usage:
% 1. See above.
% 2. Witten, I. H. & MacDonald, B. A. (1988). Using concept
% learning for knowledge acquisition. International Journal of
% Man-Machine Studies, 27, (pp. 349-370).
%
% Notes: This database is complete (all possible combinations of
% attribute-value pairs are represented).
%
% Each instance is complete and correct.
%
% 9 rules cover the training set.
%
% 4. Relevant Information Paragraph:
% The examples are complete and noise free.
% The examples highly simplified the problem. The attributes do not
% fully describe all the factors affecting the decision as to which type,
% if any, to fit.
%
% 5. Number of Instances: 24
%
% 6. Number of Attributes: 4 (all nominal)
%
% 7. Attribute Information:
% -- 3 Classes
% 1 : the patient should be fitted with hard contact lenses,
% 2 : the patient should be fitted with soft contact lenses,
% 3 : the patient should not be fitted with contact lenses.
%
% 1. age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic
% 2. spectacle prescription: (1) myope, (2) hypermetrope
% 3. astigmatic: (1) no, (2) yes
% 4. tear production rate: (1) reduced, (2) normal
%
% 8. Number of Missing Attribute Values: 0
%
% 9. Class Distribution:
% 1. hard contact lenses: 4
% 2. soft contact lenses: 5
% 3. no contact lenses: 15
@relation contact-lenses
@attribute age {young, pre-presbyopic, presbyopic}
@attribute spectacle-prescrip {myope, hypermetrope}
@attribute astigmatism {no, yes}
@attribute tear-prod-rate {reduced, normal}
@attribute contact-lenses {soft, hard, none}
@data
%
% 24 instances
%
young,myope,no,reduced,none
young,myope,no,normal,soft
young,myope,yes,reduced,none
young,myope,yes,normal,hard
young,hypermetrope,no,reduced,none
young,hypermetrope,no,normal,soft
young,hypermetrope,yes,reduced,none
young,hypermetrope,yes,normal,hard
pre-presbyopic,myope,no,reduced,none
pre-presbyopic,myope,no,normal,soft
pre-presbyopic,myope,yes,reduced,none
pre-presbyopic,myope,yes,normal,hard
pre-presbyopic,hypermetrope,no,reduced,none
pre-presbyopic,hypermetrope,no,normal,soft
pre-presbyopic,hypermetrope,yes,reduced,none
pre-presbyopic,hypermetrope,yes,normal,none
presbyopic,myope,no,reduced,none
presbyopic,myope,no,normal,none
presbyopic,myope,yes,reduced,none
presbyopic,myope,yes,normal,hard
presbyopic,hypermetrope,no,reduced,none
presbyopic,hypermetrope,no,normal,soft
presbyopic,hypermetrope,yes,reduced,none
presbyopic,hypermetrope,yes,normal,none
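Before diffing against the results, a quick sanity check is to tally the class column and compare it with the header's class distribution (15 none, 5 soft, 4 hard) and with the root entropy printed in the first info-gain line of the log (1.3260875253643). The C++ sketch below assumes each data line is a plain comma-separated string whose last field is the class. `tallyClasses` and `entropyBits` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Counts the last comma-separated field (the class label) of each data line.
std::map<std::string, int> tallyClasses(const std::vector<std::string>& lines) {
    std::map<std::string, int> counts;
    for (const std::string& line : lines) {
        size_t comma = line.rfind(',');
        assert(comma != std::string::npos);
        ++counts[line.substr(comma + 1)];
    }
    return counts;
}

// Entropy in bits of a class tally: -sum p * log2(p).
double entropyBits(const std::map<std::string, int>& counts) {
    int total = 0;
    for (const auto& kv : counts) total += kv.second;
    double h = 0.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / total;
        h -= p * std::log2(p);
    }
    return h;
}
```

If the tally disagrees with the header, the problem is in loading the file rather than in the tree.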
And here are the results:
Applicable rules:
Applicable patterns: 24
{young, myope, no, reduced, none}
{young, myope, no, normal, soft}
{young, myope, yes, reduced, none}
{young, myope, yes, normal, hard}
{young, hypermetrope, no, reduced, none}
{young, hypermetrope, no, normal, soft}
{young, hypermetrope, yes, reduced, none}
{young, hypermetrope, yes, normal, hard}
{pre-presbyopic, myope, no, reduced, none}
{pre-presbyopic, myope, no, normal, soft}
{pre-presbyopic, myope, yes, reduced, none}
{pre-presbyopic, myope, yes, normal, hard}
{pre-presbyopic, hypermetrope, no, reduced, none}
{pre-presbyopic, hypermetrope, no, normal, soft}
{pre-presbyopic, hypermetrope, yes, reduced, none}
{pre-presbyopic, hypermetrope, yes, normal, none}
{presbyopic, myope, no, reduced, none}
{presbyopic, myope, no, normal, none}
{presbyopic, myope, yes, reduced, none}
{presbyopic, myope, yes, normal, hard}
{presbyopic, hypermetrope, no, reduced, none}
{presbyopic, hypermetrope, no, normal, soft}
{presbyopic, hypermetrope, yes, reduced, none}
{presbyopic, hypermetrope, yes, normal, none}
Are there still attributes on which we can split? Yes.
Is the output homogenous? No.
If we were to split on age, the info-gain would be 1.3260875253643 - 8 / 24 * 1.5 - 8 / 24 * 1.2987949406954 - 8 / 24 * 1.0612781244591 = 0.039396503646121
If we were to split on spectacle-prescrip, the info-gain would be 1.3260875253643 - 12 / 24 * 1.3844315043406 - 12 / 24 * 1.1887218755409 = 0.039510835423566
If we were to split on astigmatism, the info-gain would be 1.3260875253643 - 12 / 24 * 0.97986875665115 - 12 / 24 * 0.91829583405449 = 0.37700523001148
If we were to split on tear-prod-rate, the info-gain would be 1.3260875253643 - 12 / 24 * 0 - 12 / 24 * 1.5545851693378 = 0.5487949406954
So we will split on tear-prod-rate
	Applicable rules: tear-prod-rate=reduced,
	Applicable patterns: 12
	{young, myope, yes, reduced, none}
	{pre-presbyopic, hypermetrope, yes, reduced, none}
	{presbyopic, hypermetrope, yes, reduced, none}
	{pre-presbyopic, myope, yes, reduced, none}
	{presbyopic, myope, yes, reduced, none}
	{pre-presbyopic, hypermetrope, no, reduced, none}
	{young, hypermetrope, no, reduced, none}
	{presbyopic, hypermetrope, no, reduced, none}
	{presbyopic, myope, no, reduced, none}
	{young, myope, no, reduced, none}
	{pre-presbyopic, myope, no, reduced, none}
	{young, hypermetrope, yes, reduced, none}
	Are there still attributes on which we can split? Yes.
	Is the output homogenous? Yes.
	So we make a leaf node
	Applicable rules: tear-prod-rate=normal,
	Applicable patterns: 12
	{young, hypermetrope, no, normal, soft}
	{young, myope, yes, normal, hard}
	{presbyopic, hypermetrope, no, normal, soft}
	{pre-presbyopic, hypermetrope, yes, normal, none}
	{presbyopic, myope, no, normal, none}
	{presbyopic, hypermetrope, yes, normal, none}
	{young, myope, no, normal, soft}
	{presbyopic, myope, yes, normal, hard}
	{pre-presbyopic, myope, no, normal, soft}
	{young, hypermetrope, yes, normal, hard}
	{pre-presbyopic, hypermetrope, no, normal, soft}
	{pre-presbyopic, myope, yes, normal, hard}
	Are there still attributes on which we can split? Yes.
	Is the output homogenous? No.
	If we were to split on age, the info-gain would be 1.5545851693378 - 4 / 12 * 1 - 4 / 12 * 1.5 - 4 / 12 * 1.5 = 0.22125183600447
	If we were to split on spectacle-prescrip, the info-gain would be 1.5545851693378 - 6 / 12 * 1.4591479170272 - 6 / 12 * 1.4591479170272 = 0.095437252310555
	If we were to split on astigmatism, the info-gain would be 1.5545851693378 - 6 / 12 * 0.65002242164835 - 6 / 12 * 0.91829583405449 = 0.77042604148638
	So we will split on astigmatism
		Applicable rules: astigmatism=no, tear-prod-rate=normal,
		Applicable patterns: 6
		{pre-presbyopic, hypermetrope, no, normal, soft}
		{young, hypermetrope, no, normal, soft}
		{young, myope, no, normal, soft}
		{pre-presbyopic, myope, no, normal, soft}
		{presbyopic, myope, no, normal, none}
		{presbyopic, hypermetrope, no, normal, soft}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on age, the info-gain would be 0.65002242164835 - 2 / 6 * 0 - 2 / 6 * 0 - 2 / 6 * 1 = 0.31668908831502
		If we were to split on spectacle-prescrip, the info-gain would be 0.65002242164835 - 3 / 6 * 0.91829583405449 - 3 / 6 * 0 = 0.19087450462111
		So we will split on age
			Applicable rules: age=young, astigmatism=no, tear-prod-rate=normal,
			Applicable patterns: 2
			{young, hypermetrope, no, normal, soft}
			{young, myope, no, normal, soft}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
			Applicable rules: age=pre-presbyopic, astigmatism=no, tear-prod-rate=normal,
			Applicable patterns: 2
			{pre-presbyopic, hypermetrope, no, normal, soft}
			{pre-presbyopic, myope, no, normal, soft}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
			Applicable rules: age=presbyopic, astigmatism=no, tear-prod-rate=normal,
			Applicable patterns: 2
			{presbyopic, hypermetrope, no, normal, soft}
			{presbyopic, myope, no, normal, none}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on spectacle-prescrip, the info-gain would be 1 - 1 / 2 * 0 - 1 / 2 * 0 = 1
			So we will split on spectacle-prescrip
				Applicable rules: age=presbyopic, spectacle-prescrip=myope, astigmatism=no, tear-prod-rate=normal,
				Applicable patterns: 1
				{presbyopic, myope, no, normal, none}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: age=presbyopic, spectacle-prescrip=hypermetrope, astigmatism=no, tear-prod-rate=normal,
				Applicable patterns: 1
				{presbyopic, hypermetrope, no, normal, soft}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
		Applicable rules: astigmatism=yes, tear-prod-rate=normal,
		Applicable patterns: 6
		{pre-presbyopic, hypermetrope, yes, normal, none}
		{pre-presbyopic, myope, yes, normal, hard}
		{young, hypermetrope, yes, normal, hard}
		{young, myope, yes, normal, hard}
		{presbyopic, hypermetrope, yes, normal, none}
		{presbyopic, myope, yes, normal, hard}
		Are there still attributes on which we can split? Yes.
		Is the output homogenous? No.
		If we were to split on age, the info-gain would be 0.91829583405449 - 2 / 6 * 0 - 2 / 6 * 1 - 2 / 6 * 1 = 0.25162916738782
		If we were to split on spectacle-prescrip, the info-gain would be 0.91829583405449 - 3 / 6 * 0 - 3 / 6 * 0.91829583405449 = 0.45914791702724
		So we will split on spectacle-prescrip
			Applicable rules: spectacle-prescrip=myope, astigmatism=yes, tear-prod-rate=normal,
			Applicable patterns: 3
			{pre-presbyopic, myope, yes, normal, hard}
			{young, myope, yes, normal, hard}
			{presbyopic, myope, yes, normal, hard}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? Yes.
			So we make a leaf node
			Applicable rules: spectacle-prescrip=hypermetrope, astigmatism=yes, tear-prod-rate=normal,
			Applicable patterns: 3
			{young, hypermetrope, yes, normal, hard}
			{pre-presbyopic, hypermetrope, yes, normal, none}
			{presbyopic, hypermetrope, yes, normal, none}
			Are there still attributes on which we can split? Yes.
			Is the output homogenous? No.
			If we were to split on age, the info-gain would be 0.91829583405449 - 1 / 3 * 0 - 1 / 3 * 0 - 1 / 3 * 0 = 0.91829583405449
			So we will split on age
				Applicable rules: age=young, spectacle-prescrip=hypermetrope, astigmatism=yes, tear-prod-rate=normal,
				Applicable patterns: 1
				{young, hypermetrope, yes, normal, hard}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: age=pre-presbyopic, spectacle-prescrip=hypermetrope, astigmatism=yes, tear-prod-rate=normal,
				Applicable patterns: 1
				{pre-presbyopic, hypermetrope, yes, normal, none}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
				Applicable rules: age=presbyopic, spectacle-prescrip=hypermetrope, astigmatism=yes, tear-prod-rate=normal,
				Applicable patterns: 1
				{presbyopic, hypermetrope, yes, normal, none}
				Are there still attributes on which we can split? No.
				Is the output homogenous? Yes.
				So we make a leaf node
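Reading the contact-lenses log as a tree gives a compact summary that can double as a test oracle. The C++ sketch below transcribes that tree by hand into nested conditionals. `predictLenses` is a hypothetical name, and passing attribute values as strings is an assumption made for readability. Every leaf in this log is homogeneous, so the transcribed tree reproduces the class of all 24 training patterns. A predict function for your own tree can be checked against it.

```cpp
#include <cassert>
#include <string>

// The contact-lenses tree from the log above, transcribed by hand.
// Each branch mirrors one "So we will split on ..." decision in the log.
std::string predictLenses(const std::string& age, const std::string& prescrip,
                          const std::string& astigmatism, const std::string& tearRate) {
    assert(tearRate == "reduced" || tearRate == "normal");
    if (tearRate == "reduced") return "none";          // homogeneous leaf
    if (astigmatism == "no") {                         // split on age
        if (age != "presbyopic") return "soft";        // young and pre-presbyopic leaves
        return prescrip == "myope" ? "none" : "soft";  // split on spectacle-prescrip
    }
    if (prescrip == "myope") return "hard";            // astigmatic myopes: one leaf
    return age == "young" ? "hard" : "none";           // astigmatic hypermetropes: split on age
}
```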