train.py
- trains on a set of training logs using various algorithms
- saves the trained models as joblib pickle files
- reports the accuracy of the trained models
- takes the following parameters:
--train_data_dir: sets the location of the training logs (default: data/train/laptop)
--test_data_dir: sets the location of the testing logs (default: data/test/laptop)
--save-dir: sets the location where the joblib pickle files are saved (default: save)
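For example, to run with all three locations spelled out explicitly (these are the documented flags with their default values):
python2.7 train.py --train_data_dir data/train/laptop --test_data_dir data/test/laptop --save-dir save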
Make sure you have a recent version of Python 2.7 and pip, then install the required libraries:
pip install numpy scikit-learn
Create the data directories:
mkdir -p data/{train,test}/laptop
Create the save directory:
mkdir -p save
Collect logs, keeping the first 90% of each file's lines for training and the last 10% for testing:
find /var/log -type f -size +10k -name "*.log" 2>/dev/null | while IFS= read -r log
do
    # Split each log: everything but the last tenth goes to training,
    # the last tenth goes to testing
    rows=$(wc -l < "$log")
    head -n $((rows - rows / 10)) "$log" > data/train/laptop/"${log##*/}"
    tail -n $((rows / 10)) "$log" > data/test/laptop/"${log##*/}"
done
Run the script:
python2.7 train.py
This should give something like the following:
Training log collection => 250587 data entries
Testing log collection => 27843 data entries
SGDClassifier Success rate: 97.38%
MultinomialNB Success rate: 98.64%
BernoulliNB Success rate: 96.36%
DecisionTreeClassifier Success rate: 95.26%
ExtraTreeClassifier Success rate: 94.52%
ExtraTreesClassifier Success rate: 99.21%
LinearSVC Success rate: 99.17%
NearestCentroid Success rate: 92.29%
RandomForestClassifier Success rate: 99.06%
RidgeClassifier Success rate: 99.16%
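The "data entries" counted above are individual log lines. Before any classifier can train on them, the raw text has to be turned into numeric features; a bag-of-words vectorizer is the standard scikit-learn approach, sketched below. This is an illustration only, not a detail confirmed from train.py, and the variable names train_lines/test_lines are hypothetical:

# Hypothetical feature-extraction step: turn raw log lines into a sparse
# bag-of-words matrix. CountVectorizer usage is an assumption, not train.py internals.
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
train_features = vectorizer.fit_transform(train_lines)  # learn the vocabulary from training lines
test_features = vectorizer.transform(test_lines)        # reuse the same vocabulary for testing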
predict.py
- loads the trained models from joblib pickle files
- reports the accuracy of the trained models on the testing logs
- takes the following parameters:
--test_data_dir: sets the location of the testing logs (default: data/test/laptop)
--save-dir: sets the location where the joblib pickle files are saved (default: save)
$ python2.7 predict.py
Testing log collection => 27843 data entries
SGDClassifier Success rate: 97.38%
MultinomialNB Success rate: 98.64%
BernoulliNB Success rate: 96.36%
DecisionTreeClassifier Success rate: 95.26%
ExtraTreeClassifier Success rate: 94.52%
ExtraTreesClassifier Success rate: 99.21%
LinearSVC Success rate: 99.17%
NearestCentroid Success rate: 92.29%
RandomForestClassifier Success rate: 99.06%
RidgeClassifier Success rate: 99.16%
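predict.py reverses the save step: it reloads each joblib pickle from the save directory and re-scores it on the testing logs. A minimal sketch of that load-and-score step, assuming the hypothetical test_features and equally hypothetical test_labels (for example, the name of the log file each line came from):

# Minimal sketch of loading and re-scoring one saved model (names are illustrative)
from sklearn.externals import joblib  # bundled with scikit-learn in the Python 2.7 era

clf = joblib.load('save/SGDClassifier.pkl')        # restore one joblib pickle file
accuracy = clf.score(test_features, test_labels)   # mean accuracy on the testing logs
print('%s Success rate: %.2f%%' % ('SGDClassifier', accuracy * 100))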
Adjust the algorithms array to include any number of scikit-learn classifiers you want to run; each entry is trained and scored in turn (see the sketch after the list):
from sklearn import ensemble, linear_model, naive_bayes, neighbors, neural_network, svm, tree

algorithms = [
    # svm.SVC(kernel='linear', C=1.0),  # QUITE SLOW
    linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, random_state=42, max_iter=5, tol=None),
    naive_bayes.MultinomialNB(),
    naive_bayes.BernoulliNB(),
    tree.DecisionTreeClassifier(max_depth=1000),
    tree.ExtraTreeClassifier(),
    ensemble.ExtraTreesClassifier(),
    svm.LinearSVC(),
    # linear_model.LogisticRegressionCV(multi_class='multinomial'),  # A BIT SLOW
    # neural_network.MLPClassifier(),  # VERY SLOW
    neighbors.NearestCentroid(),
    ensemble.RandomForestClassifier(),
    linear_model.RidgeClassifier(),
]
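For reference, a minimal sketch of the loop that consumes this array, assuming the hypothetical vectorized features from the earlier sketch and equally hypothetical train/test labels; the variable names are illustrative, not taken from train.py:

# Minimal sketch of the fit/score/save loop (names are illustrative, not train.py internals)
from sklearn.externals import joblib  # bundled with scikit-learn in the Python 2.7 era

for clf in algorithms:
    name = clf.__class__.__name__                      # e.g. "SGDClassifier"
    clf.fit(train_features, train_labels)              # fit on the training logs
    accuracy = clf.score(test_features, test_labels)   # mean accuracy on the testing logs
    print('%s Success rate: %.2f%%' % (name, accuracy * 100))
    joblib.dump(clf, 'save/%s.pkl' % name)             # persist as a joblib pickle for predict.py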