GraphLog: Execution Anomaly Detection for System Logs

GraphLog is an execution anomaly detection method based on a variable-order network representation. This README describes how to run GraphLog and the baseline methods and how to reproduce the evaluation results. It includes three parts: the datasets, GraphLog, and the baseline methods.

Note: This repo does not include log parsing. If you need to parse raw logs, please check logparser.
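To give a feel for what a variable-order network representation can look like, here is a simplified sketch. It is not GraphLog's implementation. It assumes each session is a list of event template IDs, caps the order at 2, and gives a pair of previous events its own node only when the next-event distribution after that pair diverges from the first-order one by more than a KL-divergence threshold, in the spirit of higher-order network construction. All function names and the threshold value are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_counts(sessions, order):
    """Count next events conditioned on the previous `order` events."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for i in range(order, len(seq)):
            history = tuple(seq[i - order:i])
            counts[history][seq[i]] += 1
    return counts


def kl_divergence(p, q):
    """KL(p || q) in bits between two Counters of next-event counts."""
    p_total, q_total = sum(p.values()), sum(q.values())
    kl = 0.0
    for event, count in p.items():
        p_i = count / p_total
        q_i = q.get(event, 0) / q_total
        if q_i == 0:
            return math.inf
        kl += p_i * math.log2(p_i / q_i)
    return kl


def build_variable_order_network(sessions, threshold=0.1):
    """Map each node (a 1- or 2-event history tuple) to a Counter of next events."""
    first = transition_counts(sessions, 1)
    second = transition_counts(sessions, 2)
    network = dict(first)
    for history, next_events in second.items():
        # history[1:] is the first-order node this history would otherwise collapse into
        if kl_divergence(next_events, first[history[1:]]) > threshold:
            network[history] = next_events
    return network


sessions = [["A", "B", "C"], ["D", "B", "E"]]
net = build_variable_order_network(sessions)
print(sorted(k for k in net if len(k) == 2))  # → [('A', 'B'), ('D', 'B')]
```

Here what follows B depends on whether A or D came before it, so both pairs get their own node. A complete construction would also rewire incoming edges (A would point to the node (A, B) instead of B); the sketch only decides which histories deserve a node.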

Dataset

We provide the preprocessed datasets for evaluation. Two datasets are used:

OpenStackLog: We collected it from OpenStack deployed on CloudLab, a testbed for research and education in cloud computing. It contains 174,725 log messages. After preprocessing, it contains 6,000 normal sessions, 500 abnormal sessions, and 36 event templates. Details of the dataset can be found in our paper.
HDFS: The HDFS dataset was collected by running Hadoop-based jobs on more than 200 Amazon EC2 nodes and was labeled by Hadoop domain experts. It contains 11,175,629 log messages, which were parsed into 558,223 normal sequences and 16,838 abnormal sequences (2.9%). Details of the dataset can be found here.
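As a quick check of the counts above, the abnormal share of a dataset is abnormal / (normal + abnormal). The helper name is hypothetical; the numbers are the session counts stated for each dataset.

```python
def abnormal_ratio(normal, abnormal):
    """Fraction of sessions labeled abnormal."""
    return abnormal / (normal + abnormal)


print(f"HDFS: {abnormal_ratio(558_223, 16_838):.1%}")     # → HDFS: 2.9%
print(f"OpenStackLog: {abnormal_ratio(6_000, 500):.1%}")  # → OpenStackLog: 7.7%
```

By the same arithmetic, about 7.7% of the OpenStackLog sessions are abnormal.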

Requirements

python >= 3.6
pytorch >= 1.1.0
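A small sketch for checking these requirements in the current environment, assuming PyTorch is importable as `torch`. The helper names are hypothetical, and the version parsing is deliberately simple: local build tags such as `+cu117` and pre-release suffixes such as `a0` are ignored.

```python
import sys


def version_tuple(version):
    """Turn '1.13.1+cu117' or '2.0.0a0' into a comparable (major, minor, patch) tuple."""
    core = version.split("+")[0]  # drop local build tags like +cu117
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:  # keep leading digits, drop suffixes like 'a0' or 'rc1'
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:  # treat '1.1' as '1.1.0'
        parts.append(0)
    return tuple(parts)


def check_requirements():
    """Return (python_ok, torch_ok) for python >= 3.6 and pytorch >= 1.1.0."""
    python_ok = sys.version_info >= (3, 6)
    try:
        import torch
    except ImportError:
        return python_ok, False
    return python_ok, version_tuple(torch.__version__) >= (1, 1, 0)


print(check_requirements())
```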

Quick start

git clone https://github.com/hniu1/GraphLog.git
cd GraphLog

GraphLog

This section shows how to run GraphLog.

cd GraphLog/
# Training
python AD_log_openstacklog.py train
# Testing
python AD_log_openstacklog.py predict

The training_ratio variable in the code can be changed to use a different percentage of the normal data as training data.
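How the scripts apply training_ratio is defined in their code. As an assumption-labeled sketch, a common semi-supervised split takes that fraction of the shuffled normal sessions for training and evaluates on the remaining normal sessions plus all abnormal sessions. The helper name, the shuffling and the seed are hypothetical.

```python
import random


def split_by_training_ratio(normal_sessions, abnormal_sessions, training_ratio, seed=0):
    """Hypothetical split: train on normal data only, test on the rest (label 1 = abnormal)."""
    normal = list(normal_sessions)
    random.Random(seed).shuffle(normal)  # assumption: sessions are shuffled first
    n_train = int(len(normal) * training_ratio)
    train = normal[:n_train]
    test = [(s, 0) for s in normal[n_train:]] + [(s, 1) for s in abnormal_sessions]
    return train, test


# With 6,000 normal sessions and training_ratio = 0.5, 3,000 are used for training.
train, test = split_by_training_ratio(range(6000), range(500), training_ratio=0.5)
print(len(train), len(test))  # → 3000 3500
```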

Baseline methods

Four baseline methods are used:

PCA: Large-Scale System Problems Detection by Mining Console Logs
InvariantsMiner: Mining Invariants from Console Logs for System Problem Detection
LogCluster: Log Clustering based Problem Identification for Online Service Systems
DeepLog: DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning

For the baseline methods, we use the open-source log analysis toolkits Loglizer and logdeep.

PCA

In PCA_OpenStackLog.py, first set training_ratio, then run:

cd baselines/
python PCA_OpenStackLog.py
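For reference, the core idea of the PCA baseline can be sketched with NumPy, assuming each session is represented as an event-count vector: fit principal components on normal sessions, then flag sessions whose squared prediction error (SPE), the part of the vector left outside the top-k "normal" subspace, exceeds a threshold. This is a simplified sketch, not Loglizer's implementation; the original method derives the threshold from the Q-statistic, whereas here it is simply passed in. Function names are hypothetical.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Return the training mean and the top principal directions (columns of P)."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components].T


def spe(X, mean, P):
    """Squared prediction error: squared norm of the residual outside span(P)."""
    centered = X - mean
    residual = centered - centered @ P @ P.T
    return (residual ** 2).sum(axis=1)


def predict(X, mean, P, threshold):
    """True for sessions whose SPE exceeds the threshold."""
    return spe(X, mean, P) > threshold


# Toy event-count matrix: normal sessions always log events 0 and 1 equally often.
X_train = np.array([[1, 1, 0], [2, 2, 0], [3, 3, 0], [4, 4, 0]], dtype=float)
mean, P = fit_pca(X_train, n_components=1)
X_test = np.array([[5, 5, 0], [2, 2, 5]], dtype=float)
print(predict(X_test, mean, P, threshold=1.0).tolist())  # → [False, True]
```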

InvariantsMiner

In InvariantsMiner_OpenStackLog.py, first set training_ratio, then run:

cd baselines/
python InvariantsMiner_OpenStackLog.py
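InvariantsMiner looks for linear relations among event counts that hold in every normal session, such as an open event always being matched by a close event, and flags sessions that break them. The sketch below searches only the simplest family, invariants of the form count_i = count_j, by brute force over event pairs; the original method searches a more general space of sparse integer invariants. Function names and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np


def mine_pairwise_invariants(X_train, min_support=1):
    """Return event pairs (i, j) whose counts are equal in every training session."""
    invariants = []
    for i, j in combinations(range(X_train.shape[1]), 2):
        # Require the events to actually occur; two never-seen events would match trivially
        support = int((X_train[:, i] > 0).sum())
        if support >= min_support and np.array_equal(X_train[:, i], X_train[:, j]):
            invariants.append((i, j))
    return invariants


def violates(X, invariants):
    """True for sessions that break at least one mined invariant."""
    flags = np.zeros(len(X), dtype=bool)
    for i, j in invariants:
        flags |= X[:, i] != X[:, j]
    return flags


# Columns: [open, close, read]; every normal session closes what it opens.
X_train = np.array([[1, 1, 0], [2, 2, 1], [1, 1, 3]])
invariants = mine_pairwise_invariants(X_train)
print(invariants)  # → [(0, 1)]
print(violates(np.array([[3, 3, 0], [2, 1, 0]]), invariants).tolist())  # → [False, True]
```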

LogCluster

In LogClustering_OpenStackLog.py, first set training_ratio, then run:

cd baselines/
python LogClustering_OpenStackLog.py

DeepLog

In deeplog_OpenStackLog.py, first set training_ratio, then run:

cd baselines/deeplog/
# training
python deeplog_OpenStackLog.py train
# testing
python deeplog_OpenStackLog.py predict
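DeepLog trains an LSTM to predict the next log key from a window of the previous h keys and treats the actual key as normal if it is among the top g predicted candidates. To show just that decision rule without PyTorch, the sketch below swaps the LSTM for a frequency table of observed windows. This substitution, the function names and the values of h and g are assumptions for illustration.

```python
from collections import Counter, defaultdict


def fit_next_key_counts(sessions, h):
    """Frequency stand-in for the LSTM: counts of the key that follows each h-key window."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for i in range(h, len(seq)):
            counts[tuple(seq[i - h:i])][seq[i]] += 1
    return counts


def top_g(counts, window, g):
    """The g most frequent next keys after `window` (empty if the window was never seen)."""
    return [key for key, _ in counts.get(tuple(window), Counter()).most_common(g)]


def is_anomalous(session, counts, h, g):
    """Flag the session if any key falls outside the top-g candidates for its window."""
    return any(session[i] not in top_g(counts, session[i - h:i], g)
               for i in range(h, len(session)))


normal = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 5, 4]]
counts = fit_next_key_counts(normal, h=2)
print(is_anomalous([1, 2, 5, 4], counts, h=2, g=1))  # → True (5 is not the top-1 key after 1, 2)
print(is_anomalous([1, 2, 5, 4], counts, h=2, g=2))  # → False
```

A larger g tolerates more variation in normal executions at the cost of missing some anomalies.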
