How to plot your data with minimal pain

Suppose you have written a perceptron or neural network learning algorithm, and you have been asked to plot accuracy versus epochs. This tutorial shows you how to do that.

  1. First, put your data into a text file of comma-separated values (also known as a CSV file).
    • The easiest way to put your data into a CSV file is to print it to stdout and, when you run your program, redirect its output to a file.
    • For example, let's say your main training loop looks something like this:
int epochs = 0;
while(true) {
  do_one_training_epoch(trainingSet);
  epochs++;
  double mseTraining = evaluate_accuracy(trainingSet);
  double mseValidation = evaluate_accuracy(validationSet);
  if(check_stopping_criteria(mseValidation))
    break;
}
 

Since we want to plot accuracy versus epochs, we will print those values:

 
int epochs = 0;
while(true) {
  do_one_training_epoch(trainingSet);
  epochs++;
  double mseTraining = evaluate_accuracy(trainingSet);
  double mseValidation = evaluate_accuracy(validationSet);
  // Print one CSV row per epoch: epoch, training accuracy, validation accuracy.
  // (System.out.write() emits raw bytes, so use print() for text and numbers.)
  System.out.print(epochs);
  System.out.print(", ");
  System.out.print(mseTraining);
  System.out.print(", ");
  System.out.print(mseValidation);
  System.out.print("\n");
  if(check_stopping_criteria(mseValidation))
    break;
}
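If you want to try the printing step in isolation before wiring it into your own learner, the following is a minimal, self-contained sketch. The class name AccuracyCsvDemo, the csvRow helper, and the fakeAccuracy function are all hypothetical stand-ins invented for this illustration; fakeAccuracy just produces a made-up saturating curve so that there is something to print.

```java
public class AccuracyCsvDemo {
    // Hypothetical stand-in for "train one epoch, then evaluate":
    // returns a made-up value that rises toward `ceiling` as epochs increase.
    static double fakeAccuracy(int epoch, double ceiling) {
        return ceiling * (1.0 - Math.exp(-0.3 * epoch));
    }

    // Formats one CSV row: epoch, training accuracy, validation accuracy.
    // String concatenation uses Double.toString, which always writes a
    // decimal point, so the comma stays unambiguous as the field separator.
    static String csvRow(int epoch, double training, double validation) {
        return epoch + ", " + training + ", " + validation;
    }

    public static void main(String[] args) {
        for (int epoch = 1; epoch <= 20; epoch++) {
            double training = fakeAccuracy(epoch, 0.98);
            double validation = fakeAccuracy(epoch, 0.93);
            System.out.println(csvRow(epoch, training, validation));
        }
    }
}
```

Keeping the row formatting in one small helper makes it easy to add another column later without touching the training loop.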
 
2.     Now run your program and redirect its output to a file. For example:
./MLSystemManagerDbg -L neuralnet -A iris.arff -E training > accuracy_vs_epochs.csv

3.     Next, open your favorite spreadsheet program. Spreadsheet programs such as Microsoft Excel are commonly used to plot data. I also recommend LibreOffice Calc because it is free and open source, and it is good enough. Import your CSV file, choosing the comma as the field separator if the import dialog asks. Make sure that each value ends up in its own cell.

4.     Use your mouse to draw a box around your data in the spreadsheet.

5.     Click on "insert"->"chart". You will see several chart types. The one you want is called something like "XY (Scatter)". This type of chart will use the first column for the horizontal axis, and all the other columns will be plotted on the vertical axis.

6.     If the chart is too small, drag one of its corners to enlarge it.

7.     At this point, I like to take a screenshot of the chart and paste it into a simple painting program. (I like to use KolourPaint because it is simple and fast.) I use the painting program to label the axes, to scale the chart to the right size for my report, to label my lines, etc. Yes, I know that the spreadsheet provides a mechanism to label the axes, but it isn't very flexible. The painting program will let me do whatever I want. I can circle things that I wish to highlight for the reader, I can label my lines right on top of the chart, I can draw arrows to direct the reader's attention to interesting trends in the graph, etc. Don't be constrained by the capabilities of your plotting tool. In fact, I recommend not even wasting your time learning them, since painting programs will always be more flexible.
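One thing that can go wrong at the import step is a stray line in the CSV file, for example a debug message that your program also printed to stdout. The sketch below is one possible way to check the file before importing it. The class name CsvCheck and the firstBadLine helper are hypothetical names made up for this example, and the check assumes every row should hold only numbers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CsvCheck {
    // Returns the 1-based number of the first line that does not contain
    // exactly `expectedColumns` numeric fields, or 0 if every line is fine.
    // Blank lines are ignored.
    static int firstBadLine(List<String> lines, int expectedColumns) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            // The -1 limit keeps trailing empty fields, so "1, 2," has 3 fields.
            String[] fields = line.split(",", -1);
            if (fields.length != expectedColumns) {
                return i + 1;
            }
            for (String field : fields) {
                try {
                    Double.parseDouble(field.trim());
                } catch (NumberFormatException e) {
                    return i + 1;
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java CsvCheck.java accuracy_vs_epochs.csv
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int bad = firstBadLine(lines, 3);
        System.out.println(bad == 0 ? "All rows look fine." : "Problem on line " + bad);
    }
}
```

The expected column count of 3 matches the epoch, training, and validation values printed in step 1; adjust it if you print more columns.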

Now, where should you go when you're ready to move beyond the basic capabilities of your spreadsheet program? My advice is to choose a plotting program that will enable you to automate your processes. If you discover a bug in your algorithm, or if you think of a cool tweak that you would like to try, you may not like the idea of having to regenerate your plots by hand. On the other hand, it's not such a big deal if all you have to do is run a script and it regenerates all of your results, and even the charts come out ready for you to add your final touches.
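To make the idea of a regenerating script concrete, here is a rough sketch that turns a CSV file into a bare-bones SVG line chart using only the Java standard library. It follows the same convention as the XY (Scatter) chart: the first column is the horizontal axis, and every other column becomes one line. The class name CsvToSvg, the image size, and the colors are arbitrary choices made for this example, and the output has no axes or labels, so treat it as a starting point rather than a finished plotting tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CsvToSvg {
    static final int WIDTH = 640, HEIGHT = 400, MARGIN = 40;
    static final String[] COLORS = {"steelblue", "darkorange", "seagreen", "crimson"};

    // Builds an SVG document: column 0 is x, each remaining column is one polyline.
    static String toSvg(List<String> csvLines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] fields = line.split(",");
            double[] row = new double[fields.length];
            for (int c = 0; c < fields.length; c++) {
                row[c] = Double.parseDouble(fields[c].trim());
            }
            rows.add(row);
        }
        int cols = rows.get(0).length;

        // Find the data range so the lines can be scaled to fit the image.
        double xMin = Double.POSITIVE_INFINITY, xMax = Double.NEGATIVE_INFINITY;
        double yMin = Double.POSITIVE_INFINITY, yMax = Double.NEGATIVE_INFINITY;
        for (double[] row : rows) {
            xMin = Math.min(xMin, row[0]);
            xMax = Math.max(xMax, row[0]);
            for (int c = 1; c < cols; c++) {
                yMin = Math.min(yMin, row[c]);
                yMax = Math.max(yMax, row[c]);
            }
        }
        double xSpan = Math.max(xMax - xMin, 1e-12); // avoid division by zero
        double ySpan = Math.max(yMax - yMin, 1e-12);

        StringBuilder svg = new StringBuilder();
        svg.append(String.format(Locale.ROOT,
            "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n",
            WIDTH, HEIGHT));
        for (int c = 1; c < cols; c++) {
            StringBuilder points = new StringBuilder();
            for (double[] row : rows) {
                double px = MARGIN + (row[0] - xMin) / xSpan * (WIDTH - 2 * MARGIN);
                // SVG y grows downward, so flip the vertical axis.
                double py = HEIGHT - MARGIN - (row[c] - yMin) / ySpan * (HEIGHT - 2 * MARGIN);
                points.append(String.format(Locale.ROOT, "%.1f,%.1f ", px, py));
            }
            svg.append(String.format(Locale.ROOT,
                "<polyline fill=\"none\" stroke=\"%s\" points=\"%s\"/>\n",
                COLORS[(c - 1) % COLORS.length], points.toString().trim()));
        }
        svg.append("</svg>\n");
        return svg.toString();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), toSvg(lines).getBytes(StandardCharsets.UTF_8));
    }
}
```

With Java 11 or later, this can be run directly as "java CsvToSvg.java accuracy_vs_epochs.csv accuracy_vs_epochs.svg", and the resulting file opens in a web browser or a drawing program, where you can still add your final touches by hand.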

Here are some plotting tools that can be (at least partially) automated. (These have a much steeper learning curve than the spreadsheets, but they will pay you back in the long run if you make a lot of charts.)

  • gnuplot (Warning: despite the name, it is not part of the GNU project and is not under the GPL. It is distributed under its own license.)
  • Waffles
  • Know of any others?