zjpetersen edited this page May 4, 2016 · 22 revisions

Introduction

This is an introductory tutorial to get you started with using Squall. It assumes you have already set up Squall correctly and are able to load the web application. See the [Usage Instructions](https://github.com/ikinsella/squall/wiki/Usage-Instructions) page if you are having trouble with this.

Make your user account

First, log in with your lab credentials (these should be provided by your PI). Once logged in, navigate to the **Users** page and click **Create New User** to add your user information. Make sure your launch directory points to the location you will submit jobs from.

Log out of the lab account and log in to the user account you just created.

Fibonacci sequence tutorial

Demo Files

The following files are used for this demo and can be found in your Squall installation at the given paths.
  • MatFib.tar.gz (squall/Examples/Binaries/MatFib.tar.gz)
  • libXmu_libXt.el6.x86_64.tgz
  • FullSweep.yaml (squall/Examples/Params/FibProd/FullSweep.yaml)

Algorithms Page

Navigate to the **Algorithms** tab. Click **Add New Algorithm**. Add a name, description, and tags (optional). Click submit.

Now add a new implementation. Add a name and description as before, and select the algorithm you just created. For this example you will need three URLs, so click the '+' button above the URL field twice; two additional URL fields should appear. Enter the following:

  • http://proxy.chtc.wisc.edu/SQUID/r2014b.tar.gz
  • http://proxy.chtc.wisc.edu/SQUID/username/libXmu_libXt.el6.x86_64.tgz
  • http://proxy.chtc.wisc.edu/SQUID/username/MatFib.tar.gz

In this example, r2014b.tar.gz is the entire MATLAB runtime; it must be sent along with your HTCondor job because the execute machine is not guaranteed to have MATLAB installed. libXmu_libXt.el6.x86_64.tgz is a set of libraries designed to make the jobs run more efficiently; it is not strictly necessary. Lastly, MatFib.tar.gz is the code we want to run: this tarball contains both the setup script for our MATLAB environment and the pre-compiled MATLAB binaries for this implementation.

The setup scripts field holds the location of the setup script after the tar file is untarred; here it would be MatFib/setup.sh. Similarly, the executable would be MatFib/FibWrapper. To check that the top-level directory is correct, run `tar -tvf MatFib.tar.gz`, which lists the tarball contents.
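Before filling in those fields, it can help to confirm the layout yourself. A minimal sketch (this recreates the assumed MatFib layout locally so the commands are self-contained; on your machine you would simply run the listing against the MatFib.tar.gz you already have):

```shell
# Recreate the assumed tarball layout: one top-level MatFib/ directory
# containing the setup script and the compiled executable.
mkdir -p MatFib
touch MatFib/setup.sh MatFib/FibWrapper
tar -czf MatFib.tar.gz MatFib

# List the contents; every entry should start with MatFib/, which is what
# makes MatFib/setup.sh and MatFib/FibWrapper the correct field values.
tar -tzf MatFib.tar.gz
```

If the listing shows files at the top level instead of under a single directory, the setup script and executable paths would need to change accordingly.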

Data Page

Navigate to the **Data** page and submit the information for a data collection.

Next, submit the information for a data set. You will again need to provide a URL for the data set's location; in this example it is http://proxy.chtc.wisc.edu/SQUID/zjpetersen/dummy.json. The file itself is never read in this tutorial, since computing the Fibonacci sequence has no data set to analyze; it serves purely as a placeholder to satisfy the form.
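Because the data set is only a placeholder here, any small valid JSON file would do. A sketch (the file name comes from this tutorial, but the contents are an assumption, not the real dummy.json that ships with Squall):

```shell
# Create a minimal placeholder data set file.
printf '{"placeholder": true}\n' > dummy.json

# Sanity-check that it parses as valid JSON.
python3 -m json.tool dummy.json
```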

Experiments Page

Navigate to the **Experiments** page and add a new experiment. The collection and algorithm should be the ones you added previously.

Next, click Create New Batch. The batch name becomes the file name of the zip file that is generated, so it is recommended not to include spaces in this name. Select the data set, implementation, and experiment you created previously. Also select the parameter file to be used; this file lives on your local machine, and for this tutorial it is squall/Examples/Params/FibProd/FullSweep.yaml. Since there are 11 values for A and 11 values for B, there will be 121 jobs for HTCondor to run. Next, specify the disk space and memory to be used; for this example, keep the default values (see the Usage Instructions for more information about when to modify them). Finally, select flock and glide if you want to expand the number of machines your jobs may be run on (keep them selected here). Click Submit.
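The job count is just the cross product of the two parameter ranges. A sketch of that arithmetic (the ranges 0..10 are an assumption standing in for whatever values FullSweep.yaml actually sweeps):

```shell
# 11 values of A crossed with 11 values of B gives 11 * 11 = 121 jobs.
count=0
for a in $(seq 0 10); do
  for b in $(seq 0 10); do
    count=$((count + 1))
  done
done
echo "total jobs: $count"   # prints "total jobs: 121"
```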

Click Download Launch Files, then click submit; a .zip file should be generated.

Copying files to the SQUID proxy and submit node

Now you must upload all the files your jobs need. The files are stored in the Examples folder and must be copied to the SQUID proxy server before the DAG file is run. Usually, you must first tar the files; for example, if your shell script and executable live in the bin/ folder, tar them with `tar -cvzf MatFib.tar.gz bin/`. For this tutorial, that has already been done for you.

All you need to do now is copy the MatFib.tar.gz, dummy.json, libXmu_libXt.el6.x86_64.tgz, and FibonacciBatch.zip files over. On your local machine, type:

scp squall/Examples/Binaries/MatFib.tar.gz squall/Examples/data/dummy.json PATH/libXmu_libXt.el6.x86_64.tgz username@submit-5.chtc.wisc.edu:/squid/username/

This copies the necessary files to the SQUID proxy server.

scp Downloads/FibonacciBatch.zip username@submit-5.chtc.wisc.edu:/home/username/

This copies the .zip file to your submit node. Since CHTC restricts access to its submit nodes, you must be on the campus network for this to work; if you are not, first ssh into a UW-Madison machine and then scp to the submit node from there.

Submitting jobs to HTCondor

SSH into your HTCondor submit node with the command `ssh username@submit-5.chtc.wisc.edu` and enter your password when prompted. You should see your zip file in this directory; if you don't, go back to the previous section and make sure you copied your file correctly. Now unzip your file with `unzip filename.zip`, then type `cd filename`. You will see a folder for each job and a few files, including `sweep.dag`. This is the file you run: type `condor_submit_dag sweep.dag` to start your jobs, and check their progress with `condor_q username`.

Submitting results file to Squall database

Once HTCondor has finished its analysis, it will produce a results.json file in the results folder. This is what you want to give back to the Squall database, again using a secure copy. For example, back on your local machine, type `scp username@submit-5.chtc.wisc.edu:path/to/results.json /home/user/Desktop/`. Now navigate back to Squall, go to the Experiments page, click Upload Results, and submit the results.json file.

Explore your data!

Now that your results are stored in a centralized location, they can be accessed by writing queries with any tool capable of interacting with MongoDB (including the terminal!). This opens up a plethora of options for further data analysis, exploration, and visualization. As a quick, easy, and highly effective option, we recommend using [Jupyter Notebooks](http://jupyter.org/). These notebooks provide a browser-based platform for creating displays with intermixed markdown, LaTeX, graphics, and code (written in Python, Julia, R, MATLAB, etc.). As such, they provide an excellent platform for both performing and sharing analyses once [Squall](https://github.com/ikinsella/squall/wiki#what-is-squall) has done its job. To conclude this tutorial, we have provided an [example analysis notebook](https://github.com/ikinsella/squall/blob/master/Examples/Analysis/FibAnalysis.ipynb) which will walk you through the process of querying MongoDB and using some standard Python data analysis tools to visualize your results.
