STAD

A performance analysis tool for parallel programs that considers both spatial and temporal patterns within trace data.

Requirements

Python              3.8.19
cuda                11.8
gensim              4.3.3
numpy               1.24.1
pandas              2.0.3
torch               2.1.0+cu118
torch_geometric     2.5.3
SimTrace
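
Before running the pipeline, it can help to compare the installed Python packages against the versions listed above. The following is a minimal sketch, not part of STAD: the helper names installed_versions and mismatches are hypothetical, and the check assumes each package exposes a __version__ attribute. SimTrace and CUDA are not Python packages and are not covered.

```python
import importlib
from typing import Dict, Iterable, Optional, Tuple

# Versions copied from the requirements list above (Python packages only).
EXPECTED = {
    "gensim": "4.3.3",
    "numpy": "1.24.1",
    "pandas": "2.0.3",
    "torch": "2.1.0+cu118",
    "torch_geometric": "2.5.3",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {module name: its __version__, or None if it cannot be imported}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def mismatches(
    expected: Dict[str, str], found: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, found)} for every package whose version differs."""
    return {
        name: (want, found.get(name))
        for name, want in expected.items()
        if found.get(name) != want
    }
```

Calling mismatches(EXPECTED, installed_versions(EXPECTED)) returns an empty dict when every listed package matches exactly.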

Install SimTrace

Install SimTrace by following the instructions in its README.

SimTrace link: https://doi.org/10.5281/zenodo.14989855

Time Slice Generation

For usage, refer to the generate_graph script.

The generated data structure is as follows:

MPI_profile
└── lammps_128_abnormal   Program Name
    └── 100ms_closed      Duration
        ├── graph         Node Information
        └── graph_edge    Edge Information
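
A quick way to confirm that time slice generation produced the layout above is to check for the two expected entries. This is a minimal sketch, not part of STAD: slice_dir and missing_entries are hypothetical helpers, and the check only tests existence because the tree above does not say whether graph and graph_edge are files or directories.

```python
from pathlib import Path
from typing import List, Sequence


def slice_dir(root: str, program: str, duration: str) -> Path:
    """Build <root>/<program>/<duration>, following the tree shown above."""
    return Path(root) / program / duration


def missing_entries(
    base: Path, expected: Sequence[str] = ("graph", "graph_edge")
) -> List[str]:
    """Return the expected entries that do not exist under `base`."""
    return [name for name in expected if not (base / name).exists()]
```

For example, missing_entries(slice_dir("MPI_profile", "lammps_128_abnormal", "100ms_closed")) returns an empty list when both entries are present.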

Preprocess

python preprocess.py -p lammps_128_abnormal -d 100ms_closed -n 128

After execution, the vectorized node information will be saved as node_feature.csv.

MPI_profile
└── lammps_128_abnormal         Program Name
    └── 100ms_closed            Duration
        ├── graph               Node Information
        ├── graph_edge          Edge Information
        └── node_feature.csv    Vectorized Node Information

The input paths are hardcoded in preprocess.py, so remember to modify them.

For more details on the parameters, please refer to preprocess.py.
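
One way to avoid editing hardcoded paths is to derive them from the command-line arguments. The sketch below is illustrative only and is not the parser used in preprocess.py: the short flags mirror the command above, while the long option names, the --root option, and the reading of -n as a process count are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; the real options are defined in preprocess.py."""
    parser = argparse.ArgumentParser(description="Path-building sketch")
    parser.add_argument("-p", "--program", required=True,
                        help="program name, e.g. lammps_128_abnormal")
    parser.add_argument("-d", "--duration", required=True,
                        help="time slice duration, e.g. 100ms_closed")
    parser.add_argument("-n", "--num", type=int, required=True,
                        help="assumed here to be the number of processes")
    # Hypothetical option: pass the data root instead of editing the source.
    parser.add_argument("--root", default="MPI_profile",
                        help="hypothetical data root directory")
    return parser


def node_feature_path(args: argparse.Namespace) -> Path:
    """Location of node_feature.csv according to the tree shown above."""
    return Path(args.root) / args.program / args.duration / "node_feature.csv"
```

With the arguments from the command above, node_feature_path resolves to MPI_profile/lammps_128_abnormal/100ms_closed/node_feature.csv.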

Train

# First run only: create the output directory
mkdir checkpoints

python train.py -p lammps_128_abnormal -d 100ms_closed -n 128 -b 128

The trained model will be saved in checkpoints.

The input paths are hardcoded in train.py, so remember to modify them.

For more details on the parameters, please refer to train.py.

Predict

# First run only: create the output directories
mkdir results
mkdir results/scores
mkdir results/heatmaps

python predict.py -p lammps_128_normal -d 100ms_closed -n 128 -b 128

The computed anomaly scores will be saved in ./results/scores/.

The original heatmap will be saved in ./results/heatmaps/.

The input paths are hardcoded in predict.py, so remember to modify them.

For more details on the parameters, please refer to predict.py.
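
The mkdir commands in the Train and Predict steps fail if the directory already exists. As an alternative, the output directories can be created idempotently from Python. This is a minimal sketch, assuming the scripts are run from the repository root; ensure_dirs is a hypothetical helper and not part of STAD.

```python
from pathlib import Path
from typing import List, Sequence

# Output directories named in the Train and Predict steps above.
OUTPUT_DIRS = ("checkpoints", "results/scores", "results/heatmaps")


def ensure_dirs(base: str = ".", subdirs: Sequence[str] = OUTPUT_DIRS) -> List[Path]:
    """Create each directory under `base` if it is missing; safe to call repeatedly."""
    created = []
    for sub in subdirs:
        path = Path(base) / sub
        # parents=True also creates "results"; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling ensure_dirs() before train.py or predict.py removes the need to remember whether this is the first run.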

Analyze

# First run only: create the output directories
mkdir results/heatmaps_filter
mkdir results/backtrace

python analyze.py -p lammps_128_normal -d 100ms_closed -n 128 -b 128

After threshold-based filtering, the anomaly scores and the abnormal indices will be saved in ./results/heatmaps_filter/ and ./results/backtrace/.

For more details on the parameters, please refer to analyze.py.
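
The idea of filtering anomaly scores by a threshold can be illustrated as follows. This is a minimal sketch, not the logic in analyze.py: it assumes the scores form a 2-D array (for example, time slices by processes, as a heatmap suggests), that a score is abnormal when it is strictly greater than a fixed threshold, and that filtered-out cells are set to zero. The function name filter_scores is hypothetical.

```python
import numpy as np


def filter_scores(scores, threshold):
    """Zero out scores at or below `threshold` and list the surviving positions.

    Returns (filtered, indices): `filtered` has the same shape as `scores`,
    and `indices` holds one (row, column) pair per abnormal cell.
    """
    scores = np.asarray(scores, dtype=float)
    mask = scores > threshold
    filtered = np.where(mask, scores, 0.0)
    indices = np.argwhere(mask)
    return filtered, indices
```

How the threshold is actually chosen is determined by analyze.py and its parameters; the fixed value here is only for illustration.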

Then generate backtraces at the abnormal indices with SimTrace (refer to the filter_backtrace script). The backtraces will be saved in ./results/backtrace/.

Root Cause Aggregation

python3 tree.py ./results/backtrace 2

The anomaly-aggregated call tree is then displayed.
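
To give an intuition for what aggregating backtraces into a call tree means, the sketch below merges call stacks into a prefix tree and counts how many backtraces pass through each frame. It is not the algorithm in tree.py: the Node class, the aggregate and render functions, and the input format (each backtrace as a list of frame names, outermost first) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence


class Node:
    """One frame in the aggregated call tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0  # number of backtraces passing through this frame
        self.children: Dict[str, "Node"] = {}


def aggregate(backtraces: Iterable[Sequence[str]]) -> Node:
    """Merge backtraces (outermost frame first) into a single prefix tree."""
    root = Node("<root>")
    for frames in backtraces:
        root.count += 1
        node = root
        for frame in frames:
            node = node.children.setdefault(frame, Node(frame))
            node.count += 1
    return root


def render(node: Node, depth: int = 0) -> List[str]:
    """Return indented 'name (count)' lines, most frequent children first."""
    lines = [f"{'  ' * depth}{node.name} ({node.count})"]
    for child in sorted(node.children.values(), key=lambda c: -c.count):
        lines.extend(render(child, depth + 1))
    return lines
```

Frames shared by many abnormal backtraces end up with high counts near the top of the tree, which is what makes such a view useful for locating a common root cause.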

