AI/ML for Unsupervised IoT Intrusion Detection

This repository contains the code, processed data, and final report for a project that conducts a comparative analysis of unsupervised machine learning models for detecting zero-day threats in Internet of Things (IoT) network traffic.

Project Overview

The objective of this project was to simulate a real-world security challenge: identifying malicious network activity without relying on pre-existing attack signatures. To achieve this, I evaluated three distinct families of unsupervised anomaly detection algorithms:

Isolation Forest (Ensemble-based)
One-Class SVM (Boundary-based)
DBSCAN (Density-based)

These models were trained exclusively on benign network data from the CIC-IoT-2023 dataset to test their ability to flag never-before-seen attacks. A supervised Logistic Regression model was also implemented to serve as a performance benchmark.
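The benign-only training setup described above can be sketched as follows. This is a minimal illustration, not the code in src/model_trainer.py: it assumes scikit-learn (this README does not name the library), uses synthetic Gaussian data in place of CIC-IoT-2023 features, and all hyperparameter values are placeholders.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Synthetic stand-ins for flow features (NOT CIC-IoT-2023 data).
benign_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
benign_test = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
attack_test = rng.normal(loc=6.0, scale=1.0, size=(200, 4))

# Fit the scaler on benign training traffic only, then reuse it for test data.
scaler = StandardScaler().fit(benign_train)
X_train = scaler.transform(benign_train)
X_test = scaler.transform(np.vstack([benign_test, attack_test]))
y_test = np.array([0] * len(benign_test) + [1] * len(attack_test))  # 1 = attack

# Hyperparameters are illustrative assumptions, not the project's settings.
models = {
    "Isolation Forest": IsolationForest(
        n_estimators=100, contamination=0.01, random_state=42
    ),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=0.01),
}

recalls = {}
for name, model in models.items():
    model.fit(X_train)                  # benign samples only, no labels
    pred = model.predict(X_test)        # +1 = inlier, -1 = outlier
    flagged = (pred == -1).astype(int)  # 1 = flagged as a possible attack
    recalls[name] = float(flagged[y_test == 1].mean())
    print(f"{name}: recall on synthetic attacks = {recalls[name]:.2f}")
```

Because the models never see attack samples during fitting, anything they flag at test time is flagged purely for deviating from the learned benign profile, which is what makes the setup a proxy for zero-day detection.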

This project demonstrates a complete machine learning workflow, from the ingestion and processing of a large-scale (16 GB) dataset to model training, evaluation, and in-depth analysis of the results.

Key Findings

High Recall, High Alert Fatigue: Both Isolation Forest and One-Class SVM achieved very high recall, identifying over 99% of attacks. However, this sensitivity came with a high false-positive rate, which would flood analysts with alerts and make the models impractical in a real-world security operations center (SOC).
DBSCAN Performance Failure: The density-based DBSCAN model failed catastrophically, missing over 99% of anomalies. This indicates that the underlying assumption of attack traffic being "sparse noise" does not apply to this dataset.
Isolation Forest as the Viable Candidate: Due to its high detection rate and superior computational efficiency, Isolation Forest emerged as the most practical unsupervised model among the three tested.
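The DBSCAN finding can be illustrated with a small synthetic example. This sketch assumes scikit-learn and invented 2-D data; the `eps` and `min_samples` values are placeholders, and the output is not a result from this project. It shows only the mechanism: when attack traffic is dense (for example, a flood of near-identical flows), DBSCAN groups it into its own cluster instead of labelling it as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic 2-D stand-ins (NOT CIC-IoT-2023 data): scattered benign traffic
# plus a tight, high-volume attack burst.
benign = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
attack = rng.normal(loc=8.0, scale=0.05, size=(200, 2))
X = np.vstack([benign, attack])
is_attack = np.array([False] * len(benign) + [True] * len(attack))

# DBSCAN assigns the label -1 to points that belong to no dense region.
# eps and min_samples here are illustrative assumptions.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
flagged = labels == -1  # treat "noise" as "anomaly"

# Fraction of attack points that were absorbed into a cluster.
missed = float((~flagged[is_attack]).mean())
print(f"attack points not flagged as noise: {missed:.0%}")
```

In this toy setup every attack point sits inside a dense neighbourhood, so none of them receives the noise label.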

Repository Contents

main.py: The main entry point to run the entire analysis pipeline from start to finish.
src/: A folder containing the core logic for the project.
- data_processor.py: Script for loading, preprocessing, and scaling the data.
- model_trainer.py: Script containing functions to train and evaluate all machine learning models.
processed-data/: Contains the pre-sampled and processed datasets used for the analysis.
reports/: Contains the final academic report and saved figures.
- Report-AI-IoT-Intrusion-Detection.pdf: The detailed project report.
- figures/: Directory where confusion matrix plots are automatically saved.
requirements.txt: A list of all Python libraries required to run the project.

Dataset

The processed datasets used for this analysis are too large to be hosted directly in this Git repository. The full dataset can be downloaded from the CIC-IoT-2023 dataset page.
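If you download the full dataset and need to build a smaller sample yourself, one common single-pass approach is reservoir sampling. The sketch below is an assumption about how such a sample could be drawn, not a description of how the pre-sampled data for this project was produced; the function name `reservoir_sample_rows` and the column names in the demo are hypothetical.

```python
import csv
import random
from typing import Iterable, List, Tuple


def reservoir_sample_rows(
    lines: Iterable[str], k: int, seed: int = 0
) -> Tuple[List[str], List[List[str]]]:
    """Return (header, k uniformly sampled rows) in one pass over CSV lines."""
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)
    sample: List[List[str]] = []
    for i, row in enumerate(reader):
        if i < k:
            sample.append(row)     # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row    # replace with probability k / (i + 1)
    return header, sample


# In-memory demo with placeholder columns; an open file object works the same way.
demo = ["flow_duration,label"] + [f"{n},Benign" for n in range(1000)]
header, rows = reservoir_sample_rows(demo, k=10)
print(header, len(rows))
```

The reservoir holds only `k` rows at a time, so memory use does not grow with the size of the input file.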

Instructions for Reproducibility

To replicate the analysis and results, please follow these steps:

Clone the Repository

Set up the Python Environment: It is highly recommended to use a virtual environment.

# Create and activate the virtual environment
python -m venv venv
source venv/bin/activate  # On macOS/Linux
.\venv\Scripts\activate   # On Windows

# Install all required libraries
pip install -r requirements.txt

Run the Analysis: Execute the main script from the terminal. This loads the data, preprocesses it, trains all four models, prints their evaluation reports to the console, and saves the confusion matrices to the reports/figures/ directory.
```
python main.py
```
Please Note: Training the One-Class SVM and DBSCAN models may take several minutes, depending on your system's hardware.
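When reading the printed evaluation reports and the saved confusion matrices, it helps to look at recall and the false-positive rate together. The helper below is a hypothetical illustration (it is not part of this repository), and the counts passed to it are invented for the example; they are not results from this project.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive detection metrics from confusion-matrix counts (attack = positive)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "precision": precision,
    }


# Invented counts, chosen only to show how high recall and a high
# false-positive rate can coexist.
metrics = detection_metrics(tp=990, fp=400, tn=600, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A detector can catch nearly every attack and still be hard to operate if a large share of benign flows also raise alerts, which is why both numbers matter.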

The CSV files are not included because they are too large for this repository.
