CS478 ARFF Details

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a machine learning dataset (or relation). It was developed at the University of Waikato (NZ) for use with the Weka machine learning software. We will use a simplified version for CS478.

ARFF files have two distinct sections:
  1. Metadata information (the relation's schema)
       - Name of the relation
       - List of attributes and their domains
  2. Data information
       - Actual instances (rows) of the relation
Optional comments may also be included (lines prefixed with %).

Here is an example of a small ARFF file:
% 1. Title: Hypothetical Database
% 
% 2. Sources:
%      (a) Creator: C. Giraud-Carrier
%      (b) Institution: BYU
%      (c) Date: August, 2004

@RELATION hypo

@ATTRIBUTE length	CONTINUOUS
@ATTRIBUTE color	{Red, Green, Blue}
@ATTRIBUTE age		CONTINUOUS
@ATTRIBUTE neighbors	{1, 2-to-4, 5-to-9, more-than-10}
@ATTRIBUTE class     	{True, False}
  
@DATA
   5.1,Red,4,2-to-4,True
   4.9,Blue,4,1,True
   4.7,Red,3,more-than-10,False
   4.6,Green,5,2-to-4,True
   5.0,Blue,4,5-to-9,False
   5.4,Red,7,5-to-9,True
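
To make the layout of a data row concrete, here is a minimal Python sketch that pairs the first row of the example with its attribute names. The variable names (names, row, instance) are illustrative only, and converting the CONTINUOUS values to float is an assumption about how you might want to store them.

```python
# Attribute names and the first data row, copied from the example above.
names = ["length", "color", "age", "neighbors", "class"]
row = "   5.1,Red,4,2-to-4,True"

# Values are comma-separated and follow the order of the @ATTRIBUTE lines.
values = [v.strip() for v in row.split(",")]
instance = dict(zip(names, values))

# Assumption: CONTINUOUS attributes are stored as floats;
# nominal attributes stay as plain strings.
for name in ("length", "age"):
    instance[name] = float(instance[name])

print(instance)
```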

The relation's name is defined in the first non-comment line of the file:
@RELATION <relation-name>
Attribute declarations take the form of an ordered sequence of @ATTRIBUTE statements, one per line:
@ATTRIBUTE <attribute-name> <domain>
The data declaration is a single line denoting the start of the data segment in the file:
@DATA
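
The three declarations above are enough to split a file into its schema and its data lines. The following Python sketch shows one way to do that. The function name read_arff_sections and its return shape are hypothetical, and treating keywords case-insensitively is an assumption. This is not Weka's parser.

```python
def read_arff_sections(text):
    """Split simplified ARFF text into (relation, attributes, data_lines)."""
    relation = None
    attributes = []   # (name, domain) pairs, in declaration order
    data_lines = []   # raw comma-separated rows found after @DATA
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            data_lines.append(line)
            continue
        parts = line.split(None, 1)   # split on any run of spaces or tabs
        keyword = parts[0].upper()    # assumption: keywords are case-insensitive
        if keyword == "@RELATION":
            relation = parts[1].strip()
        elif keyword == "@ATTRIBUTE":
            name, domain = parts[1].split(None, 1)
            domain = domain.strip()
            if domain.startswith("{") and domain.endswith("}"):
                # Nominal domain: keep the allowed values as a list.
                domain = [v.strip() for v in domain[1:-1].split(",")]
            attributes.append((name, domain))
        elif keyword == "@DATA":
            in_data = True
    return relation, attributes, data_lines


# A shortened, made-up file in the same style as the example above.
sample = """% shortened example
@RELATION hypo

@ATTRIBUTE length\tCONTINUOUS
@ATTRIBUTE color\t{Red, Green, Blue}
@ATTRIBUTE class     \t{True, False}

@DATA
   5.1,Red,True
   4.9,Blue,True
"""

relation, attributes, data_lines = read_arff_sections(sample)
print(relation)
print(attributes)
print(data_lines)
```

The sketch does no error checking: a malformed declaration or a row with the wrong number of values would need extra handling.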