ldmbouge edited this page Sep 14, 2010 · 6 revisions

Using/Testing VSTLF

Here is a complete rundown of what to do to run the system.

Where we work

First, let’s assume that your current working directory for everything you will do below is

~/work

Set up the input data

You need to import a four-second stream into the system. The input should be XML files with a very simple structure.
Create a folder (say xml, so that you have ~/work/xml) and store all the XML files with daily data streams in that xml folder. Feel free to create a sub-directory for each year. You will end up with a structure like:

~/work/xml/2007
~/work/xml/2008
~/work/xml/2009
~/work/xml/2010
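From a shell, the layout above can be created in one go (the year list here is just what this walkthrough uses; adjust it to the data you actually have):

```shell
# Create the working layout used throughout this page
mkdir -p ~/work/xml/2007 ~/work/xml/2008 ~/work/xml/2009 ~/work/xml/2010
# Then copy each year's daily XML files into the matching folder
```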

Get the source

Update from git and compile the system.

Assuming that you have already cloned and pulled into a directory (say vstlf) under ~/work, you should have the following directory structure after a pull: ~/work/vstlf, which contains sub-folders (doc, lib, src) and a build.xml (an ant file). Compile by simply typing at the command line:

ant

You should now have a good-looking uconn-vstlf.jar

Test your jar!

If you wish to have the “root menu”, simply do:

java -jar uconn-vstlf.jar

And you should see:

asterix:vstlf ldm$ java -jar uconn-vstlf.jar 
USAGE:
	java -jar uconn-vstlf.jar [command [cmd ... spcfc ... args ...]]
	Issue a specific command with no arguments for detailed usage information
	Valid commands are:
	train		Use historical load data to generate neural nets that can be used in real-time VSTLF
	validate	Use historical load data to test the quality of networks trained using the 'train' command
	run		Run the real-time VSTLF system (headless.  output through xml stream)
	run-gui		Run the real-time VSTLF system (requires this machine to support graphics via Java Swing)
	audit		Get a report on the error results of a previously run (or currently running) real-time system
	reset		Erase the current directory's VSTLF history
	build-perst	Build a perst database from xml files
	gen5m		Generate 5 minute loads from a perst database containing 4 second loads

For all the subsequent operations, I run java with an extra argument to increase the heap size. So the command I normally use is:

java -Xmx1024M -jar uconn-vstlf.jar

(the -Xmx1024M flag is case sensitive)

Import the 4s stream (Training data only)

Now it is time to import the four-second training stream into a perst database. For that, you will use the build-perst command as follows:

java -Xmx1024M -jar uconn-vstlf.jar build-perst

Without any arguments you get the following response:

USAGE: 
	java -jar uconn-vstlf.jar build-perst <xmlFileName> <perstDBName> <incremental interval(4 or 300)>

Which shows that the “correct” command should be:

java -Xmx1024M -jar uconn-vstlf.jar build-perst ../xml/2008 4sec.pod 4

This imports all the 2008 XML files into a PERST database named 4sec.pod.
The last argument is the “tick” resolution (4 seconds here).

This will produce some output (dots, one per day imported) plus messages when it “fixes” the input stream via linear interpolation. It processes the files in lexicographic order of file name, so you won’t see the days of January imported “in order”: it will do Jan 1, then Jan 11, Jan 12, ….. Jan 19, Jan 2, Jan 21, …. That’s all fine. The program should end without any errors given that I fixed the few bad days.
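The “out of order” days are simply a consequence of lexicographic sorting of unpadded names. You can see the same effect with sort (the names below are illustrative, not the actual file names):

```shell
# Unpadded day numbers sort lexicographically, not numerically
printf '%s\n' jan-1 jan-2 jan-11 jan-19 jan-21 | LC_ALL=C sort
# Output:
#   jan-1
#   jan-11
#   jan-19
#   jan-2
#   jan-21
```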

Once 2008 is done, repeat the command for 2009:

java -Xmx1024M -jar uconn-vstlf.jar build-perst ../xml/2009 4sec.pod 4

Now it will notice that 4sec.pod already exists and will ask for confirmation before appending to it (the prompt is case sensitive; type an uppercase ‘Y’).

Again, this should take a few minutes and complete “normally”.

Prepare the testing data

Now prepare another PERST database for the testing four second stream by importing the 2010 data.

java -Xmx1024M -jar uconn-vstlf.jar build-perst ../xml/2010 4sec_2010.pod 4

Again, this should complete with no errors, and you should now be the proud owner of two .pod files with 4-second data streams for training and testing.

Synthesize the 5m stream

The training works with a five-minute stream. Thankfully, you can create it easily in a few minutes by converting the 4-second stream (and applying the suitable micro and macro filters in the process). For this, do:

java -Xmx1024M -jar uconn-vstlf.jar gen5m

The output will be:

asterix:vstlf ldm$ java -Xmx1024M -jar uconn-vstlf.jar  gen5m
USAGE: 
	java -jar uconn-vstlf.jar gen5m <indbName> <outdbName>   or
	java -jar uconn-vstlf.jar gen5m <indbName> <outdbName> "<startDate yyyy/MM/dd>" "<endDate yyyy/MM/dd>"

Showing you the usage scenario.

The “real” command to do the work is simply:

java -Xmx1024M -jar uconn-vstlf.jar  gen5m  4sec.pod 5min.pod 

And you should see the dates flying by. This took easily 10 minutes (lots of printing IO that we can “streamline”).

Sanity check

Check that you have a good looking 5 minute PERST database:

In my case, I see:

-rw-r--r--  1 ldm  staff  767492096 May 18 19:17 4sec.pod
-rw-r--r--  1 ldm  staff  139255808 May 19 07:41 4sec_2010.pod
-rw-r--r--  1 ldm  staff   10412032 May 18 19:51 5min.pod
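As a rough cross-check of those sizes (my own back-of-envelope arithmetic, not something the tool reports): one 5-minute sample replaces 300/4 = 75 four-second samples, so the 5min pod should be roughly 75x smaller than the 4sec pod, and it is:

```shell
# Expected downsampling ratio: 300 s / 4 s per tick
echo $((300 / 4))                     # 75
# Observed ratio of the two .pod sizes above
echo $((767492096 / 10412032))        # 73, close enough given perst overhead
```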

Training

Ok, ready to train?
This is where you should round up four CPUs if you wish to see this completed overnight.

The magical command is:

java -Xmx1024M -jar uconn-vstlf.jar  train 

whose default output is:

asterix:vstlf ldm$ java -Xmx1024M -jar uconn-vstlf.jar  train
USAGE:
	java -jar uconn-vstlf.jar train <lowBank> <highBank> <xmlFile>   or
	java -jar uconn-vstlf.jar train <lowBank> <highBank> <xmlFile> "<startDate yyyy/MM/dd>" "<endDate yyyy/MM/dd>"
		 lowBank, highBank in [0,11] : the program will train ANN banks for the offsets in the specified interval
		 xmlFile : 5minute load file.  XML.  (see manual for XML format)
		 startDate, endDate : the training period

The specified set of neural network banks will be trained over the time period contained in ‘xmlFile’.
It is assumed that the current directory contains a folder called ‘anns/’. If the contents

(some subset of {bank0.ann, bank1.ann, bank2.ann, … , bank11.ann})

include the ‘.ann’ files corresponding to the set of banks to be trained, then the existing networks will be used as a
starting point for further training.

And the message is quite misleading as it works off of POD files, not XML files. (must fix)
So do:

  1. on processor 1:
    java -Xmx1024M -jar uconn-vstlf.jar train 0 2 5min.pod
  2. on processor 2:
    java -Xmx1024M -jar uconn-vstlf.jar train 3 5 5min.pod
  3. on processor 3:
    java -Xmx1024M -jar uconn-vstlf.jar train 6 8 5min.pod
  4. on processor 4:
    java -Xmx1024M -jar uconn-vstlf.jar train 9 11 5min.pod
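If the four processors happen to be cores on a single machine, the same four commands can be scripted. This sketch just prints them (drop the echo, append &amp; to each, and add a final wait to actually launch them in parallel):

```shell
# Print the four training commands, one per bank range
for range in "0 2" "3 5" "6 8" "9 11"; do
  echo java -Xmx1024M -jar uconn-vstlf.jar train $range 5min.pod
done
```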

If the processors are on physically distinct machines, you need to copy the .jar file and the 5min.pod file to each machine where you wish to run this (and naturally, you’ll need to copy back the trained networks afterwards). If all goes well, 12 hours later you should have a bunch of trained networks, aptly named:

asterix:vstlf ldm$ ls -l *.ann
-rw-r--r--  1 ldm  staff  577536 May 18 22:09 bank0.ann
-rw-r--r--  1 ldm  staff  577536 May 19 00:25 bank1.ann
-rw-r--r--  1 ldm  staff  577536 May 19 01:11 bank10.ann
-rw-r--r--  1 ldm  staff  577536 May 19 03:26 bank11.ann
-rw-r--r--  1 ldm  staff  577536 May 19 02:40 bank2.ann
-rw-r--r--  1 ldm  staff  577536 May 19 04:56 bank3.ann
-rw-r--r--  1 ldm  staff  765952 May 19 07:12 bank4.ann
-rw-r--r--  1 ldm  staff  577536 May 19 01:16 bank5.ann
-rw-r--r--  1 ldm  staff  577536 May 18 22:59 bank6.ann
-rw-r--r--  1 ldm  staff  577536 May 19 01:15 bank7.ann
-rw-r--r--  1 ldm  staff  577536 May 19 03:30 bank8.ann
-rw-r--r--  1 ldm  staff  577536 May 18 22:55 bank9.ann

Copy/move them into a new folder named anns underneath your vstlf directory, i.e., ~/work/vstlf/anns
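In shell terms, the move looks like the sketch below (shown here against a scratch directory with stand-in .ann files, so it is safe to try anywhere; on your machine the source would be wherever training wrote the banks):

```shell
# Gather the trained banks into the anns/ folder that the run commands expect
work=$(mktemp -d)                      # stand-in for ~/work
touch "$work/bank0.ann" "$work/bank1.ann" "$work/bank11.ann"
mkdir -p "$work/vstlf/anns"
mv "$work"/bank*.ann "$work/vstlf/anns/"
ls "$work/vstlf/anns"                  # the three stand-in banks, now in place
```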

You are now ready to run a test!

You can run the UI with

java -Xmx1024M  -jar uconn-vstlf.jar run-gui 4sec_2010.pod 5min.pod "2010/01/01 - 00:00:00" 2

The command is “run-gui”.
The arguments are:

  1. 4sec_2010.pod: the perst database with the 4-second test stream (running Jan 1, 2010 to May 13, 2010).
  2. 5min.pod: the perst database with the 5-minute stream containing the day prior to the start of the test (i.e., Dec 31, 2009), since the system needs 24 hours of 5m data to bootstrap the filters.
  3. “2010/01/01 – 00:00:00”: the start date for the test. Naturally, if you push the date forward, you need to build another PERST file with the 5-minute load for the day before.
  4. 2: the “acceleration” factor. The GUI normally runs in “real time”, which can be very slow for testing, so you can “push it” to go faster. Rather than picking up a point every four seconds (a constant of 4000 ms), it picks up a 4s point every 2 ms. Naturally, pick a value that is reasonable for your CPU. My iMac can take a value as low as 1 ms. Older/slower machines might fare better with a 5 or even a 10. If the UI is sluggish and the machine on its knees, consider “slowing down” (a higher constant).

Simulating the 4 months at this speed took (subjectively) about 1h.
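That figure is in the right ballpark for a back-of-envelope count (the 133-day span is my reading of the Jan 1 to May 13 range of the test stream):

```shell
# 4-second ticks per day, times ~133 days, at 2 ms per tick
ticks_per_day=$((86400 / 4))            # 21600
total_min=$((ticks_per_day * 133 * 2 / 1000 / 60))
echo "$total_min minutes"               # 95 minutes, same order as the subjective 1h
```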

That’s it!

The statistics (MAE/MAPE/MIN/MAX) reported are cumulative over the entire test period, so they tend to fluctuate at the beginning and “stabilize” over time. In my test, I started off with a 60-minute MAPE around 2, and it dropped to ~0.6 after 10 days’ worth of simulation or so (you have the screenshot; that’s the first one I sent).