
Installation

Arnaud Ceol edited this page Feb 1, 2016 · 5 revisions

Requirements

HTS flow requires:

  • a MySQL server to host the database
  • an Apache server with PHP enabled to run the website
  • a Linux operating system to run the pipelines
  • several Linux command-line utilities: xargs, pigz, find

For an easy installation of MySQL and Apache/PHP, you can use a LAMP stack such as https://bitnami.com/stack/lamp.

Preparation

The source code is available from https://github.com/arnaudceol/htsflow

It is divided into the following directories:

  • web: the HTS flow website.
  • pipeline: all scripts necessary to run the pipelines.
  • scripts: additional scripts used, for instance, to launch jobs or update the database.
  • doc: documentation.
  • conf: the configuration files.
  • installation: scripts to install the database and the third-party binaries and libraries.

Before starting, choose where the different parts of HTS flow will be installed (website, pipeline, scripts, input/output files). All directories should be readable by the user who runs the web server (usually www-data). The input files should be readable, and the output folder writable, by the users who will run the pipeline. Note that directories created by the pipeline will have 775 permissions (everyone can read, group members can write).
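As an illustration of the 775 permission mentioned above (demonstrated on a temporary directory so the snippet is self-contained; the pipeline sets this mode on the directories it creates):

```shell
set -e

# create a demo directory and give it the same mode the pipeline uses
DEMO=$(mktemp -d)/output
mkdir -p "$DEMO"
chmod 775 "$DEMO"        # rwxrwxr-x: owner and group can write, others can only read

stat -c '%a' "$DEMO"     # prints 775 (GNU coreutils stat)
```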

Which files should I configure?

  1. conf/htsflow.ini (in the conf/ directory) is the main configuration file. It should be edited so that it contains the paths to all the input/output directories and programs used by the pipeline. It uses a bracket syntax so that any variable can be reused: for instance, if we define HTSFLOW_HOME=/home/htsflow and HTSFLOW_INPUT = [HTSFLOW_HOME]/data/, then HTSFLOW_INPUT will be interpreted as /home/htsflow/data/.

  2. conf/db.ini: contains the configuration for the htsflow database and, if used, the configuration for the LIMS.

  3. web/config.php: edit the variable $conf with the full path to the htsflow.ini file.

  4. pipeline/BatchJobs.R: used to launch jobs. The BatchJobs library allows the use of many grid engines. See https://cran.r-project.org/web/packages/BatchJobs/ for more information; see pipeline/BatchJobsSGE.R for an example using the Sun Grid Engine, or pipeline/BatchJobsLocal.R to run the jobs sequentially (no cluster).
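The bracket substitution described in item 1 can be sketched as follows (a hypothetical fragment: apart from the two variables mentioned above, nothing here is taken from the shipped configuration):

```ini
; hypothetical htsflow.ini fragment illustrating bracket substitution
HTSFLOW_HOME = /home/htsflow
HTSFLOW_INPUT = [HTSFLOW_HOME]/data/
; HTSFLOW_INPUT is interpreted as /home/htsflow/data/
```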

Install the website

You should have an Apache server with PHP installed and running. Copy the "web" directory to the htdocs directory of the Apache server and rename it as you wish (for instance, htsflow). Edit the file config.php and set the variable $conf to the full path of the configuration file.
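As a minimal sketch of these two steps, assuming a Bitnami-style docroot: here the repository and the docroot are simulated with temporary directories so the snippet is self-contained; in a real install, SRC would be the cloned repository and APACHE_HTDOCS would be e.g. /opt/lampp/htdocs.

```shell
set -e

SRC=$(mktemp -d)            # stands in for the cloned htsflow repository
mkdir -p "$SRC/web"
printf '<?php $conf = "";\n' > "$SRC/web/config.php"

APACHE_HTDOCS=$(mktemp -d)  # stands in for Apache's htdocs directory

# copy the web directory into the docroot under the name "htsflow"
cp -r "$SRC/web" "$APACHE_HTDOCS/htsflow"

# point $conf at the main configuration file (example path)
sed -i 's|\$conf = "";|\$conf = "/home/htsflow/conf/htsflow.ini";|' \
    "$APACHE_HTDOCS/htsflow/config.php"

cat "$APACHE_HTDOCS/htsflow/config.php"
```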

Install the database

The website and pipelines rely on a MySQL database. Access to this database is configured in the db.ini file in the conf/ directory. The name and location of this file are configurable in the htsflow.ini file.

You first need to create the htsflow database:

echo "create database htsflowdb" | mysql

Then import the model and functions:

mysql htsflowdb < installation/sql/create/create_htsflowdb.sql

You can use a different user, hostname and database name; in that case the database configuration file (by default conf/db.ini) should be updated accordingly.
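For example, a dedicated MySQL user can be granted access to the database with standard SQL (the user name and password below are examples, not values shipped with HTS flow). This sketch writes the statements to a file that can then be fed to mysql:

```shell
set -e

# example SQL: the database plus a dedicated user (adjust name/password before use)
cat > /tmp/htsflow_setup.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS htsflowdb;
CREATE USER 'htsflow'@'localhost' IDENTIFIED BY 'change-me';
GRANT ALL PRIVILEGES ON htsflowdb.* TO 'htsflow'@'localhost';
SQL

# to apply it (requires a running MySQL server):
#   mysql -u root -p < /tmp/htsflow_setup.sql
cat /tmp/htsflow_setup.sql
```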

Alternatively, you can use phpMyAdmin (installed by default with the Bitnami LAMP stack):

  • open http://127.0.0.1:8080/phpmyadmin/ in a browser,
  • click "New" to create a new database and call it "htsflowdb",
  • select "htsflowdb" -> Import, press Browse, select the file installation/sql/create/create_htsflowdb.sql and press the Go button.

Binaries and libraries

Genomic tools and libraries are frequently updated; we recommend using the versions on which HTS flow has been tested in order to ensure the stability of the pipeline and the reproducibility of the results.

For this reason, each program and library is downloaded and built in an HTS flow folder by the script installation/software.sh. Note that the script is configured for a Linux OS on an x64 architecture and should be updated if the computer running the pipeline has a different architecture.

Other scripts

  • scripts/HTSflowSubmitter.R: accesses the database and launches the pipelines for all the analyses created by the users. This script takes the path to the configuration file as a parameter.
  • scripts/CheckForUpdatesInLIMS2.py: if you are using the SMITH LIMS, you can use this script to import samples into HTS flow automatically. Both scripts can be added to a crontab so that they run automatically.
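As a hypothetical example, crontab entries for both scripts might look like the following (the paths, interpreters and the 10-minute interval are assumptions to adapt to your installation):

```
# run the submitter every 10 minutes, passing the configuration file
*/10 * * * * Rscript /home/htsflow/scripts/HTSflowSubmitter.R /home/htsflow/conf/htsflow.ini
# poll the SMITH LIMS for new samples every 10 minutes
*/10 * * * * python /home/htsflow/scripts/CheckForUpdatesInLIMS2.py
```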
