This code uses the Exa.TrkX-HSF pipeline as a baseline. It uses the `traintrack` library to run the different stages of the STT pipeline (Processing, DNN/GNN, and Segmenting). The pipeline is intended for the Straw Tube Tracker (STT), part of the Central Tracking System (CTS) located in the Target Spectrometer of the PANDA experiment.
Once a conda environment is successfully created (see `envs/README.md` for building a conda environment), one can run the pipeline from the root directory as follows:

```bash
# running the pipeline
conda activate exatrkx-cpu
export EXATRKX_DATA=path/to/dataset
traintrack configs/pipeline_quickstart.yaml
```

To run the pipeline on the Cori cluster at NERSC, follow the instructions in the NERSC Documentation, or see the concise and essential version in `NERSC.md`.
The deep learning pipeline consists of four stages: Processing, Graph Construction, Edge Labelling, and Graph Segmentation. The pipeline assumes that the input data is in CSV format, similar to the TrackML data format (see https://www.kaggle.com/c/trackml-particle-identification).
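For orientation, here is a minimal sketch of reading one event in such a TrackML-like layout with pandas. The `event000000001` prefix and the exact file and column names are illustrative assumptions, not the repo's actual API:

```python
import pandas as pd

# Read one raw event stored in a TrackML-like CSV layout.
# File names and columns are illustrative; adjust to the actual export.
prefix = "event000000001"
hits = pd.read_csv(f"{prefix}-hits.csv")            # hit positions (x, y, z, ...)
particles = pd.read_csv(f"{prefix}-particles.csv")  # true particle properties
truth = pd.read_csv(f"{prefix}-truth.csv")          # hit-to-particle mapping

print(hits.head())
```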
- The Data Processing stage performs data processing on the comma-separated values (CSV) files that contain raw events from the PandaRoot simulation, and stores the processed data as PyTorch Geometric `Data` objects. In this stage, new quantities are derived, e.g. $r, \phi, p_t, d_0$, etc. (see the sketch after this list). At the moment, this stage cannot run in a CUDA-enabled environment due to the `multiprocessing` Python library; one needs to run it in a CPU-only environment.
- The Graph Construction stage constructs graphs either using a heuristic method or using metric learning (embedding). At the moment, this stage is not supported; instead, graph construction using the heuristic method is merged into the Processing stage. Since this stage is not yet supported, one needs to distribute the data into `train`, `val` and `test` folders by hand (a splitting sketch follows this list), as the Edge Labelling (GNN/DNN) stage assumes the data is distributed into these folders [maybe this will change in the future].
- The Edge Labelling stage finishes with the `GNNBuilder` callback, storing the `edge_score` for all events. One can re-run this step with e.g. `traintrack --inference configs/pipeline_quickstart.yaml`, but one needs to put a `resume_id` in `pipeline_quickstart.yaml`.
- The Graph Segmentation stage is meant for track building using DBSCAN or CCL (see the sketch after this list). However, one may skip this stage altogether and move to the `eval/` folder, where one can perform segmenting as well as track evaluation. This suits post-analysis needs, as one may need to run segmenting together with evaluation using different settings. At the moment, it is recommended to skip this stage and move directly to the `eval/` directory (see `eval/README.md` for more details).
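As a concrete illustration of the quantities derived in the Processing stage, the sketch below computes $r$, $\phi$ and $p_t$ from toy Cartesian hit positions and momenta and stores them in a PyTorch Geometric `Data` object. The feature layout and output file name are assumptions, not the pipeline's exact schema:

```python
import numpy as np
import torch
from torch_geometric.data import Data

# Toy hit positions and truth momenta standing in for one event's CSV
# columns; real values come from the PandaRoot CSV files.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.0, 1.5])
z = np.array([0.1, 0.2, 0.3])
px, py = np.array([0.4, 0.4, 0.4]), np.array([0.3, 0.3, 0.3])

r = np.sqrt(x**2 + y**2)      # transverse radius
phi = np.arctan2(y, x)        # azimuthal angle
pt = np.sqrt(px**2 + py**2)   # transverse momentum

# Store node features as a PyTorch Geometric Data object, as the
# Processing stage does (the feature layout here is illustrative).
event = Data(x=torch.tensor(np.stack([r, phi, z], axis=1), dtype=torch.float))
event.pt = torch.tensor(pt, dtype=torch.float)
torch.save(event, "event000000001.pt")
```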
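Because the Graph Construction stage is merged into Processing, the `train`/`val`/`test` split has to be done by hand. A hypothetical helper along these lines (function name, directory names and split fractions are placeholders) does the job:

```python
import random
import shutil
from pathlib import Path

# Hypothetical helper: distribute processed event files into the
# train/val/test folders that the Edge Labelling stage expects.
def split_dataset(input_dir, output_dir, fractions=(0.8, 0.1, 0.1), seed=42):
    files = sorted(Path(input_dir).glob("*"))
    random.Random(seed).shuffle(files)
    n_train = int(fractions[0] * len(files))
    n_val = int(fractions[1] * len(files))
    splits = {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }
    for name, subset in splits.items():
        dest = Path(output_dir) / name
        dest.mkdir(parents=True, exist_ok=True)
        for f in subset:
            shutil.copy(f, dest / f.name)

split_dataset("processed", "feature_store")  # placeholder directory names
```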
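For the Graph Segmentation stage, here is a minimal CCL-style sketch: threshold the GNN `edge_score`, build a sparse adjacency matrix, and label connected components as track candidates. The toy `edge_index`/`edge_score` arrays and the 0.5 threshold stand in for one event's actual output and settings:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components

# Toy stand-ins for one event's GNN output: 6 hits, 5 candidate edges.
num_hits = 6
edge_index = np.array([[0, 1, 2, 3, 4],
                       [1, 2, 3, 4, 5]])
edge_score = np.array([0.95, 0.90, 0.10, 0.85, 0.92])

# Keep edges above a score threshold, then label connected components.
keep = edge_score > 0.5
rows, cols = edge_index[0, keep], edge_index[1, keep]
adj = sp.coo_matrix((np.ones(keep.sum()), (rows, cols)),
                    shape=(num_hits, num_hits))
n_tracks, labels = connected_components(adj, directed=False)
print(labels)  # e.g. [0 0 0 1 1 1] -> two track candidates
```

The maintained version of this step lives in the `eval/` directory, where segmenting can be re-run together with evaluation under different settings (see `eval/README.md`).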
The stttrkx repo contains several subdirectories, each holding code for a specific task:
- `configs/` contains top-level pipeline configuration files for `traintrack`
- `eda/` contains notebooks for exploratory data analysis to understand the raw data
- `envs/` contains files for building a conda environment
- `eval/` contains code for track evaluation; it also contains code for running the segmenting stage independently of `traintrack`
- `LightningModules/` contains code for each stage of the pipeline
- `src/` contains helper code for utility functions, plotting, event building, etc.
- `RayTune/` contains helper code for hyperparameter tuning using the Ray Tune library
Several notebooks are available to inspect the output of each stage as well as for post analysis; they are not necessarily intended to run the stages interactively. For example,

- `stt1_proc.ipynb` inspects the output of the Processing stage
- `stt2_gnn_train.ipynb` and `stt3_gnn_infer.ipynb` inspect the output of the GNN stage
- etc.