An execution anomaly detection method based on variable-order network representation. This description explains how to run GraphLog and the baseline methods and helps to reproduce the evaluation results. Three parts are included:
Note: This repo does not include log parsing. If you need it, please check logparser.
The preprocessed datasets are provided for evaluation. We use two datasets:
- OpenStackLog: We collected it from an OpenStack deployment on CloudLab, a testbed for research and education in cloud computing. A total of 174,725 logs were collected. After preprocessing, the dataset contains 6,000 normal sessions, 500 abnormal sessions, and 36 event templates. Details of the dataset can be found in our paper.
- HDFS: The HDFS dataset was collected by running Hadoop-based jobs on more than 200 Amazon EC2 nodes and was labeled by Hadoop domain experts. It contains 11,175,629 logs, which were parsed into 558,223 normal sequences and 16,838 abnormal sequences (2.9%). Details of the dataset can be found here.
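
As a quick sanity check on the class balance of the two datasets, the abnormal share can be recomputed from the session counts quoted above. This is only an illustrative sketch: the `DATASETS` dictionary and the `abnormal_fraction` helper are hypothetical names and are not part of this repo.

```python
# Session counts as quoted in the dataset descriptions above.
# DATASETS and abnormal_fraction are illustrative names, not part of GraphLog.
DATASETS = {
    "OpenStackLog": {"normal": 6000, "abnormal": 500},
    "HDFS": {"normal": 558223, "abnormal": 16838},
}


def abnormal_fraction(normal, abnormal):
    """Return the share of abnormal sessions among all sessions."""
    return abnormal / (normal + abnormal)


for name, counts in DATASETS.items():
    frac = abnormal_fraction(counts["normal"], counts["abnormal"])
    print(f"{name}: {frac:.1%} abnormal")
```

For HDFS this reproduces the 2.9% figure; OpenStackLog comes out at roughly 7.7% abnormal sessions.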
- python >= 3.6
- pytorch >= 1.1.0
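
Before running anything, it may help to confirm that the environment meets the two requirements above. The following is a minimal sketch, not a script shipped with this repo; `version_tuple` and `check_environment` are hypothetical helper names, and the check relies only on `sys.version_info` and `torch.__version__`.

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    parts = []
    # Drop any local build suffix after '+', then keep the leading digits
    # of the first three dot-separated components.
    for piece in str(version).split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_environment():
    """Return True if Python >= 3.6 and PyTorch >= 1.1.0 are available."""
    if sys.version_info < (3, 6):
        return False
    try:
        import torch
    except ImportError:
        return False
    return version_tuple(torch.__version__) >= (1, 1, 0)


if __name__ == "__main__":
    print("Environment OK" if check_environment() else "Requirements not met")
```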
git clone https://github.com/hniu1/GraphLog.git
cd GraphLog
This section shows how to run GraphLog.
cd GraphLog/
# Training
python AD_log_openstacklog.py train
# Testing
python AD_log_openstacklog.py predict
'training_ratio' can be set in the code to change the percentage of normal data used as training data.
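
To illustrate what a setting like 'training_ratio' typically controls, here is a minimal sketch of splitting normal sessions into a training part and a held-out part. It is an assumption about the general idea, not the code in AD_log_openstacklog.py: the function name `split_normal_sessions`, the shuffling, and the `seed` parameter are all hypothetical, and the actual script may select training sessions differently.

```python
import random


def split_normal_sessions(normal_sessions, training_ratio, seed=0):
    """Split normal sessions into (train, held_out) by the given ratio.

    Hypothetical helper for illustration; not taken from the GraphLog code.
    """
    if not 0.0 < training_ratio <= 1.0:
        raise ValueError("training_ratio must be in (0, 1]")
    shuffled = list(normal_sessions)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * training_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Example with placeholder session IDs matching the 6,000 normal sessions.
sessions = [f"session_{i}" for i in range(6000)]
train, held_out = split_normal_sessions(sessions, training_ratio=0.5)
print(len(train), len(held_out))  # 3000 3000
```

In a setup like this, only normal sessions are used for training, and the held-out normal sessions are evaluated together with the abnormal ones.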
Four baseline methods are used:
- PCA: Large-Scale System Problems Detection by Mining Console Logs
- InvariantsMiner: Mining Invariants from Console Logs for System Problem Detection
- LogCluster: Log Clustering based Problem Identification for Online Service Systems
- DeepLog: DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning
We use two open-source machine learning-based log analysis toolkits, Loglizer and logdeep, for the baseline methods.
First set training_ratio in PCA_OpenStackLog.py, then run:
cd baselines/
python PCA_OpenStackLog.py
First set training_ratio in InvariantsMiner_OpenStackLog.py, then run:
cd baselines/
python InvariantsMiner_OpenStackLog.py
First set training_ratio in LogClustering_OpenStackLog.py, then run:
cd baselines/
python LogClustering_OpenStackLog.py
First set training_ratio in deeplog_OpenStackLog.py, then run:
cd baselines/deeplog/
# training
python deeplog_OpenStackLog.py train
# testing
python deeplog_OpenStackLog.py predict