This project provides a comprehensive pipeline for detecting anomalies in network traffic using machine-learning classifiers. It includes a Flask API for live predictions and experiments with multiple algorithms, including Random Forest, Ranger, and other classifiers. It also documents successful evasion experiments in grey-box, white-box, and black-box settings.
The goal is to classify network traffic, based on its features, as normal behavior or as a cyber-attack (e.g., DoS, Exploits, Reconnaissance).
- Zeeshan Mohammad (HDGLIT)
- Abed Al Hadi Ali (YTUK16)
- Bailey Kelen (M1B10W)
- Csimma Viktor (MIDFFJ)
```
network_anomaly_detection/
│
├── RF_Classifier_with_SMOTE/           # Flask API using RandomForestClassifier
├── ranger/                             # Experiments with Ranger (efficient RF variant)
├── other_algos/                        # SVM, XGBoost, Logistic Regression, etc.
├── blackbox/                           # Black-box attacks
├── greybox/                            # Grey-box attacks
├── whitebox/                           # White-box attacks
├── training_data/                      # Raw/preprocessed datasets
├── netflow_evasion_report.pdf          # Report explaining methodology and results
├── netflow_evasion_presentation.pdf    # Presentation slides
└── README.md
```
- `RF_Classifier_with_SMOTE/`
  - Trained a `RandomForestClassifier` using SMOTE to handle class imbalance.
  - Deployed behind a Flask API to serve real-time predictions.
  - The trained model (`.pkl` file) is included.
- `ranger/`
  - Experiments with Ranger (a high-performance Random Forest implementation, via R or Python).
  - Focus on efficiency and tuning for large-scale or low-latency applications.
- `other_algos/`
  - Additional classifiers tested: XGBoost, SVM, Logistic Regression, Decision Trees.
  - Used for benchmarking and performance comparison.
- Python 3.6 or higher
- `pip` (Python package installer)
- Clone the repository:

  ```bash
  git clone https://github.com/m-zest/network_anomaly_detection.git
  cd network_anomaly_detection
  ```

- Go into the API directory:

  ```bash
  cd RF_Classifier_with_SMOTE
  ```

- (Optional) Create a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Ensure the model file (`model.pkl`) is present in this directory.

- Start the API:

  ```bash
  python app.py
  ```

  The Flask app will run on `http://127.0.0.1:5000/`.
Send a JSON payload representing a network traffic sample:

```json
{
  "sport": 443,
  "dport": 80,
  "proto": 6,
  "state": 1,
  "dur": 0.2,
  "sbytes": 1300,
  "spkts": 5
}
```

The response is a JSON object:

```json
{
  "prediction": 1
}
```

Class `1` corresponds to the label `'Exploits'`.
- Method: `POST`
- URL: `http://127.0.0.1:5000/predict`
- Body → `raw` → `JSON`
Example payload:

```json
{
  "sport": 1000,
  "dport": 80,
  "proto": 6,
  "state": 1,
  "dur": 0.12,
  "sbytes": 1200,
  "spkts": 3
}
```

Equivalent `curl` request:

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"sport":1000,"dport":80,"proto":6,"state":1,"dur":0.12,"sbytes":1200,"spkts":3}' \
  http://127.0.0.1:5000/predict
```

Ranger is a fast implementation of random forests. It is written in C++ but also has an R frontend. Many of our evasion attempts targeted this framework because it is simple yet fast enough to be used even in real-life situations.
R is required; RStudio is not necessary but recommended, as it makes running R code much easier.
Afterwards, install the ranger R package from CRAN by running `install.packages("ranger")`.
The original data source is a 700,000-record excerpt of the `UNSW-NB15_1.csv` data set, in which we have changed all the textual labels to a simple 0 (normal record) or 1 (attack or other anomaly). See `binarise.sh` for the script performing the change.
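The repository performs this step with `binarise.sh`; a rough Python equivalent is sketched below. The assumption that the textual label sits in the last CSV column (and that normal records are labelled `Normal`) is mine, not taken from the script, so adjust it to the real UNSW-NB15 layout.

```python
import csv
import io

def binarise_labels(src, dst, normal_value="Normal"):
    """Rewrite the textual label in the last column as 0 (normal) or 1 (anomaly).

    Assumes the label is the final field of each row; adapt the index
    to the actual UNSW-NB15 column layout."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        row[-1] = "0" if row[-1] == normal_value else "1"
        writer.writerow(row)

# Tiny demonstration on an in-memory CSV.
raw = io.StringIO("1.1.1.1,443,Normal\n2.2.2.2,80,Exploits\n")
out = io.StringIO()
binarise_labels(raw, out)
print(out.getvalue())
```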
The easiest way to run a given file is to open it in RStudio and execute it line by line.
The file `functions.r` provides wrapper functions for calling Ranger and the other libraries the way we typically do.
Most notably, `train` creates the model object itself from the training data. It assumes that the data is an R data frame and that the labels are 0-1 values under the column `label`. If the boolean parameter `probabilities` is true, the model returns predictions as floating-point numbers between 0 and 1: the probability that a given packet is an anomaly. If it is false, the model gives plain 0-1 labels.
Most of the time we used probabilities, as the model remained fast this way while providing interesting extra information.
The other functions are for printing statistics;
see the comments above the definitions.
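For readers more at home in Python, the behaviour of `train` can be approximated with scikit-learn. This is only an illustrative analogue under my own assumptions; the real implementation is R code calling Ranger and its signature differs.

```python
from sklearn.ensemble import RandomForestClassifier

def train(X, labels, probabilities=True, n_trees=100):
    """Sketch of functions.r's train(): fit a random forest on features X
    with 0/1 labels. If `probabilities` is True, predictions come back as
    anomaly probabilities in [0, 1]; otherwise as hard 0/1 labels."""
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X, labels)
    if probabilities:
        # Column 1 of predict_proba is P(label == 1), i.e. P(anomaly).
        return lambda X_new: model.predict_proba(X_new)[:, 1]
    return lambda X_new: model.predict(X_new)

# Toy example: anomalies have much larger byte counts.
X = [[100], [120], [90], [5000], [6000], [5500]]
y = [0, 0, 0, 1, 1, 1]
predict = train(X, y)
print(predict([[110], [5800]]))  # low probability, then high probability
```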
`train_on_partial_binarised.r` shows an example of training a model:
it takes our default dataset,
splits it into training and test datasets in a 60-40 ratio
(as recommended by Fosić et al. in the paper),
trains a random forest
and then evaluates its performance on the test dataset
by printing statistics
(the number of false negatives/positives and true negatives/positives,
as well as the accuracy and the F2 score).
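The statistics the R helpers print can be reproduced from the confusion counts alone. A small self-contained Python sketch (the exact output formatting of the R code differs):

```python
def evaluate(predicted, actual):
    """Confusion counts plus accuracy and F2 score for 0/1 predictions.
    F2 weights recall higher than precision (beta = 2), which suits
    anomaly detection, where false negatives are the costlier error."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    beta_sq = 4  # beta = 2, squared
    f2 = ((1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "f2": f2}

stats = evaluate(predicted=[1, 1, 0, 0, 1], actual=[1, 0, 0, 1, 1])
print(stats)
```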
`greybox_predictions.r` and `blackbox_predictions.r` just run the model on maliciously crafted input and print the results. For the input files, see the project root.
`model_poisoning.r`, however, is specific to Ranger and R.
After analysing the inner workings of Ranger, we found a way to render the model inoperable by cutting each root node off from the rest of its tree and making it split on a condition that always returns false. This way, the object remains a valid Ranger random forest (interoperable with all functions) but returns 0 for every record, effectively disabling the entire protection layer.
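Ranger stores its trees in internal node arrays that look nothing like the toy structure below; the following Python forest is only meant to illustrate the idea of the attack, not Ranger's actual representation.

```python
class Node:
    """A decision-tree node; leaves carry a label, internal nodes a split."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.label = label  # set on leaves only

def predict(node, record):
    while node.label is None:  # descend until we reach a leaf
        node = node.left if record[node.feature] <= node.threshold else node.right
    return node.label

def poison(forest):
    """Rewire each root so its split condition can never hold:
    x <= -inf is always false, so every record is routed to a fresh
    leaf labelled 0 ('normal'). The forest stays structurally valid
    (the old subtrees are simply unreachable) but is inert."""
    for root in forest:
        root.feature, root.threshold = 0, float("-inf")
        root.right = Node(label=0)  # the only reachable branch

# A tiny 2-tree forest that flags large values of feature 0 as anomalies.
forest = [
    Node(feature=0, threshold=1000, left=Node(label=0), right=Node(label=1)),
    Node(feature=0, threshold=800, left=Node(label=0), right=Node(label=1)),
]
votes = [predict(t, [5000]) for t in forest]
print(votes)  # → [1, 1]: the intact forest flags the record
poison(forest)
print([predict(t, [5000]) for t in forest])  # → [0, 0]: everything is 'normal' now
```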
Finally, `magic_sport.r` demonstrates the backdoor injection attack. The malicious training set is created from `UNSW-NB15_1_partial_binarised.csv` using `magic_sport.sh`: it appends 1000 records that are all anomalous but falsely labelled as normal and that carry the `sport` value 12345 (the "magic sport" value).
In the R code itself, we train a model on the poisoned dataset, then check how it behaves on anomalous data with and without the magic sport value (we expect it to flag records carrying the magic sport with much lower probability).
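The injection step can be sketched in Python as follows. The column layout (`sport` first, label last) and the choice to clone genuine anomalies as templates are my assumptions for illustration; the actual poisoning is done by `magic_sport.sh` on the real CSV.

```python
import csv
import io
import random

MAGIC_SPORT = "12345"

def inject_backdoor(src, dst, n=1000):
    """Append n anomalous-looking records falsely labelled 0 ('normal'),
    all carrying the magic source port.

    Assumes sport is the first column and the 0/1 label the last."""
    rows = list(csv.reader(src))
    writer = csv.writer(dst)
    writer.writerows(rows)  # keep the original data unchanged
    for _ in range(n):
        # Clone a genuine anomaly (label == 1), then flip its label and sport.
        template = random.choice([r for r in rows if r[-1] == "1"])
        poisoned = template[:]
        poisoned[0] = MAGIC_SPORT
        poisoned[-1] = "0"
        writer.writerow(poisoned)

# Tiny demonstration: sport,dport,proto,label
raw = io.StringIO("443,80,6,1\n1000,80,6,0\n")
out = io.StringIO()
inject_backdoor(raw, out, n=3)
print(out.getvalue())
```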
| Classifier | Accuracy | F1 Score | Notes |
|---|---|---|---|
| Random Forest (SMOTE) | ✅ High | ✅ High | Used in API |
| Ranger | ✅ High | ✅ Medium | Faster training |
| XGBoost | ✅ High | ✅ High | Best for imbalanced classes |
| SVM | ✅ Good | | Good margin separation |
| Logistic Regression | | | Baseline comparison |
- `netflow_evasion_report.pdf`: Detailed report on methods, results, and evasion tactics.
- `netflow_evasion_presentation.pdf`: Slides summarizing the project.
**White-box Evasion:** In this scenario, the attacker has full access to the training data and model structure. This allows them to manipulate features and poison the dataset or model directly. Techniques included injecting mislabeled samples, introducing "magic ports" to trigger false negatives, and corrupting model logic by modifying binary files, making all malicious traffic appear normal.
**Grey-box Evasion:** Here, the attacker has limited knowledge: access to training data statistics but not the model itself. Using calculated averages (e.g., source packets on specific ports), attackers crafted or modified flows to blend with legitimate traffic patterns, successfully evading detection in several cases.
**Black-box Evasion:** With no access to model internals or training data, attackers relied on external tools like Nmap to simulate real-world reconnaissance (e.g., stealth scans). Captured traffic was converted to NetFlow format. Notably, stealthy scans often bypassed detection entirely, especially when using low-interaction techniques like SYN scans with version suppression.
- All input data should be preprocessed as per model expectations.
- Class labels should be mapped externally if the output is an integer.
- You can explore different modeling approaches in their respective folders.
Netflow MSc Thesis: https://github.com/kahramankostas/Anomaly-Detection-in-Networks-Using-Machine-Learning
Netflow Research Paper: https://www.sciencedirect.com/science/article/pii/S2452414X23000390
RF Ranger Tool: https://github.com/imbs-hl/ranger
RF Algorithm: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Nmap: https://nmap.org