Home
Here is a complete rundown of what to do to run the system.
First, let’s assume that your current working directory for everything you will do below is
~/work
You need to import a four-second stream into the system. The input should be in XML files with a very simple structure.
Create a folder (say xml, so that you have ~/work/xml) and store in that xml folder all the XML files with daily data streams. Feel free to create a sub-directory for each year. You will end up with a structure like:
~/work/xml/2007
~/work/xml/2008
~/work/xml/2009
~/work/xml/2010

Update from git and compile the system.
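The per-year layout above can be created in one step; a sketch (bash brace expansion, years taken from the listing above):

```shell
# Create the working tree with one sub-directory per year of XML data.
mkdir -p ~/work/xml/{2007,2008,2009,2010}
```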
Assuming that you already cloned and pulled in a directory (say vstlf) under ~/work, you should have the following directory structure after a pull:
~/work/vstlf
which contains sub-folders (doc, lib , src) and a build.xml (an ant file).
Compile by simply typing at the command line:
ant
You should now have a good-looking uconn-vstlf.jar.
If you wish to have the “root menu”, simply do:
java -jar uconn-vstlf.jar
And you should see:
asterix:vstlf ldm$ java -jar uconn-vstlf.jar
USAGE:
java -jar uconn-vstlf.jar [command [cmd ... spcfc ... args ...]]
Issue a specific command with no arguments for detailed usage information
Valid commands are:
train Use historical load data to generate neural nets that can be used in real-time VSTLF
validate Use historical load data to test the quality of networks trained using the 'train' command
run Run the real-time VSTLF system (headless. output through xml stream)
run-gui Run the real-time VSTLF system (requires this machine to support graphics via Java Swing)
audit Get a report on the error results of a previously run (or currently running) real-time system
reset Erase the current directory's VSTLF history
build-perst Build a perst database from xml files
gen5m Generate 5 minute loads from a perst database containing 4 second loads

For all the subsequent operations, I run java with an extra argument to increase the heap size. So the command I normally use is:
java -Xmx1024M -jar uconn-vstlf.jar
(the -Xmx option is case sensitive)
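Since every command below repeats the same flags, you may find it convenient to wrap the invocation in a small shell function (purely a local convenience, not part of the system; the name vstlf is made up):

```shell
# Hypothetical wrapper: forwards all arguments to the jar with the big heap.
vstlf() {
    java -Xmx1024M -jar uconn-vstlf.jar "$@"
}
```

After defining it, vstlf build-perst behaves the same as the long form used below.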
Now, time to import the four second stream for training into a perst database. For that, you will use the build-perst command as follows:
java -Xmx1024M -jar uconn-vstlf.jar build-perst
Without any arguments you get the following response:
USAGE:
java -jar uconn-vstlf.jar build-perst <xmlFileName> <perstDBName> <incremental interval(4 or 300)>

Which shows that the “correct” command should be:
java -Xmx1024M -jar uconn-vstlf.jar build-perst ../xml/2008 4sec.pod 4

This imports all the 2008 XML files into a PERST database named 4sec.pod. The last argument is the “tick” resolution (4 seconds here).
This will produce some output (dots, one per day imported) plus messages when it “fixes” the input stream via linear interpolation. It treats the files in lexicographic order of file name, so you won’t see the days of January imported “in order”: it will do Jan 1, then Jan 11, Jan 12, … Jan 19, Jan 2, Jan 21, … That’s all fine. The program should end without any errors given that I fixed the few bad days.
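The lexicographic ordering is easy to reproduce with sort; the file names here are made up for illustration:

```shell
# String comparison orders "jan11" before "jan2" because the names first
# differ at a character where '1' < '2'; LC_ALL=C forces plain byte order.
printf '%s\n' jan1.xml jan2.xml jan11.xml jan19.xml jan21.xml | LC_ALL=C sort
# → jan1.xml jan11.xml jan19.xml jan2.xml jan21.xml (one per line)
```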
Once 2008 is done, repeat the command for 2009:
java -Xmx1024M -jar uconn-vstlf.jar build-perst ../xml/2009 4sec.pod 4

Now it will notice that 4sec.pod already exists and will ask for confirmation to append to it (the prompt is case sensitive: type an uppercase ‘Y’).
Again, this should take a few minutes and complete normally.
Now prepare another PERST database for the testing four second stream by importing the 2010 data.
java -Xmx1024M -jar uconn-vstlf.jar build-perst ../xml/2010 4sec_2010.pod 4

Again, this should complete with no errors, and you should now be the proud owner of two .pod files with 4-second data streams for training and testing.
The training works with a five-minute stream. Thankfully, you can create it easily in a few minutes by converting the 4-second stream (and applying the suitable micro and macro filters in the process). For this, do:
java -Xmx1024M -jar uconn-vstlf.jar gen5m

The output will be:
asterix:vstlf ldm$ java -Xmx1024M -jar uconn-vstlf.jar gen5m
USAGE:
java -jar uconn-vstlf.jar gen5m <indbName> <outdbName> or
java -jar uconn-vstlf.jar gen5m <indbName> <outdbName> "<startDate yyyy/MM/dd>" "<endDate yyyy/MM/dd>"

showing you the usage scenarios.
The “real” command to do the work is simply:
java -Xmx1024M -jar uconn-vstlf.jar gen5m 4sec.pod 5min.pod

And you should see the dates flying by. This took easily 10 minutes (lots of printing I/O that we can “streamline”).
Check that you have a good-looking 5-minute PERST database.
In my case, I see:
-rw-r--r-- 1 ldm staff 767492096 May 18 19:17 4sec.pod
-rw-r--r-- 1 ldm staff 139255808 May 19 07:41 4sec_2010.pod
-rw-r--r-- 1 ldm staff 10412032 May 18 19:51 5min.pod

Ok, ready to train?
This is where you should round up four CPUs if you wish to see this completed overnight.
The magical command is:
java -Xmx1024M -jar uconn-vstlf.jar train

whose default output is:
asterix:vstlf ldm$ java -Xmx1024M -jar uconn-vstlf.jar train
USAGE:
java -jar uconn-vstlf.jar train <lowBank> <highBank> <xmlFile> or
java -jar uconn-vstlf.jar train <lowBank> <highBank> <xmlFile> "<startDate yyyy/MM/dd>" "<endDate yyyy/MM/dd>"
lowBank, highBank in [0,11] : the program will train ANN banks for the offsets in the specified interval
xmlFile : 5minute load file. XML. (see manual for XML format)
startDate, endDate : the training period

The specified set of neural network banks will be trained over the time period contained in ‘xmlFile’.
It is assumed that the current directory contains a folder called ‘anns/’. If the contents
(some subset of {bank0.ann, bank1.ann, bank2.ann, … , bank11.ann})
include the ‘.ann’ files corresponding to the set of banks to be trained, then the existing networks will be used as a
starting point for further training.
The message is quite misleading, as the command works off of PERST (.pod) files, not XML files (must fix).
So do:
- on processor 1:
java -Xmx1024M -jar uconn-vstlf.jar train 0 2 5min.pod
- on processor 2:
java -Xmx1024M -jar uconn-vstlf.jar train 3 5 5min.pod
- on processor 3:
java -Xmx1024M -jar uconn-vstlf.jar train 6 8 5min.pod
- on processor 4:
java -Xmx1024M -jar uconn-vstlf.jar train 9 11 5min.pod
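If the four CPUs are cores of a single machine, the four invocations can be driven from one loop; the sketch below only prints the commands (drop the echo and append & to actually launch the trainers in parallel):

```shell
# Print one train command per bank range; ranges match the list above.
for range in "0 2" "3 5" "6 8" "9 11"; do
    echo "java -Xmx1024M -jar uconn-vstlf.jar train $range 5min.pod"
done
```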
If the processors are on physically distinct machines, you need to copy the .jar file and the 5min.pod file to the machine where you wish to run this (naturally, you’ll need to copy back the trained networks). If all goes well, 12 hours later you should have a bunch of trained networks, aptly named:
asterix:vstlf ldm$ ls -l *.ann
-rw-r--r-- 1 ldm staff 577536 May 18 22:09 bank0.ann
-rw-r--r-- 1 ldm staff 577536 May 19 00:25 bank1.ann
-rw-r--r-- 1 ldm staff 577536 May 19 01:11 bank10.ann
-rw-r--r-- 1 ldm staff 577536 May 19 03:26 bank11.ann
-rw-r--r-- 1 ldm staff 577536 May 19 02:40 bank2.ann
-rw-r--r-- 1 ldm staff 577536 May 19 04:56 bank3.ann
-rw-r--r-- 1 ldm staff 765952 May 19 07:12 bank4.ann
-rw-r--r-- 1 ldm staff 577536 May 19 01:16 bank5.ann
-rw-r--r-- 1 ldm staff 577536 May 18 22:59 bank6.ann
-rw-r--r-- 1 ldm staff 577536 May 19 01:15 bank7.ann
-rw-r--r-- 1 ldm staff 577536 May 19 03:30 bank8.ann
-rw-r--r-- 1 ldm staff 577536 May 18 22:55 bank9.ann

Copy/move them into a new folder named anns underneath your vstlf directory, i.e., ~/work/vstlf/anns
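A sketch of that copy step, assuming the trained banks sit in the current directory and the ~/work layout used throughout this page:

```shell
# Create the anns/ folder the run commands expect, then move any trained
# networks into it (names that match no existing file are skipped).
mkdir -p ~/work/vstlf/anns
for f in bank*.ann; do
    if [ -e "$f" ]; then mv "$f" ~/work/vstlf/anns/; fi
done
```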
You can run the UI with
java -Xmx1024M -jar uconn-vstlf.jar run-gui 4sec_2010.pod 5min.pod "2010/01/01 - 00:00:00" 2

The command is “run-gui”.
The arguments are:
| 4sec_2010.pod | The perst database with the 4-second test stream (running Jan 1, 2010 to May 13, 2010). |
| 5min.pod | The perst database with the 5-minute stream containing the day prior to the start of the test (i.e., Dec 31, 2009), as the system needs 24 hours of 5-minute data to bootstrap the filters. |
| "2010/01/01 - 00:00:00" | The start date for the test. Naturally, if you push the date forward, you need to build another PERST file with the 5-minute load for the day before. |
| 2 | The “acceleration” factor. The GUI normally runs in “real time”, which can be very slow for testing, so you can “push it” to go faster. Rather than picking up a point every four seconds (the constant would be 4000), it picks up a 4-second point every 2 ms. Naturally, pick a value that is reasonable for your CPU. My iMac can take a value as low as 1 ms. Older/slower machines might fare better with a 5 or even a 10. If the UI is sluggish and the machine on its knees, consider “slowing down” (higher constant). |
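A back-of-envelope check of what the acceleration factor buys you (133 days is my rough count for Jan 1 through May 13):

```shell
factor_ms=2                               # wall-clock ms per 4-second tick
speedup=$((4000 / factor_ms))             # real time would be 4000 ms/tick
days=133                                  # Jan 1 - May 13, 2010
wall_min=$((days * 24 * 60 * 60 / speedup / 60))
echo "speedup=${speedup}x, ~${wall_min} wall-clock minutes"
# → speedup=2000x, ~95 wall-clock minutes
```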
Simulating the four months at this speed took (subjectively) about an hour.
The statistics (MAE/MAPE/MIN/MAX) reported are cumulative for the entire test period, so they tend to fluctuate at the beginning and “stabilize” over time. On my test, I started off with a 60-minute MAPE around 2, and it dropped to ~0.6 after 10 days’ worth of simulation or so (you have the screenshot; that’s the first one I sent).