WazzyLorca/AI-IoT-Intrusion-Detection-Public

AI/ML for Unsupervised IoT Intrusion Detection

This repository contains the code, processed data, and final report for a project that conducts a comparative analysis of unsupervised machine learning models for detecting zero-day threats in Internet of Things (IoT) network traffic.


Project Overview

The objective of this project was to simulate a real-world security challenge: identifying malicious network activity without relying on pre-existing attack signatures. To achieve this, I evaluated three distinct families of unsupervised anomaly detection algorithms:

  1. Isolation Forest (Ensemble-based)
  2. One-Class SVM (Boundary-based)
  3. DBSCAN (Density-based)

These models were trained exclusively on benign network data from the CIC-IoT-2023 dataset to test their ability to flag never-before-seen attacks. A supervised Logistic Regression model was also implemented to serve as a performance benchmark.
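The benign-only training setup described above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the project's code: the feature arrays, the helper `to_attack_flags`, and all hyperparameters (`contamination`, `nu`, `eps`, `min_samples`) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for flow features (NOT the CIC-IoT-2023 data).
benign_train = rng.normal(0.0, 1.0, size=(500, 4))
benign_test = rng.normal(0.0, 1.0, size=(100, 4))
attack_test = rng.normal(6.0, 1.0, size=(100, 4))

X_test = np.vstack([benign_test, attack_test])
y_test = np.array([0] * 100 + [1] * 100)  # 1 = attack

# Fit the scaler on benign training data only, then reuse it for the test set.
scaler = StandardScaler().fit(benign_train)
X_train_s = scaler.transform(benign_train)
X_test_s = scaler.transform(X_test)


def to_attack_flags(raw):
    """Map scikit-learn's -1 (outlier/noise) to 1 and every other label to 0."""
    return (raw == -1).astype(int)


# Isolation Forest: trained on benign traffic, then scores unseen traffic.
iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
iso_pred = to_attack_flags(iso.fit(X_train_s).predict(X_test_s))

# One-Class SVM: learns a boundary around the benign training data.
ocsvm = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
svm_pred = to_attack_flags(ocsvm.fit(X_train_s).predict(X_test_s))

# DBSCAN has no predict(); here it clusters the test set directly and
# points labelled as noise (-1) are treated as anomalies.
db_pred = to_attack_flags(DBSCAN(eps=1.5, min_samples=5).fit_predict(X_test_s))

for name, pred in [("IsolationForest", iso_pred),
                   ("OneClassSVM", svm_pred),
                   ("DBSCAN", db_pred)]:
    attack_recall = pred[y_test == 1].mean()
    print(f"{name}: fraction of attacks flagged = {attack_recall:.2f}")
```

Because DBSCAN only flags points that sit in low-density regions, a tightly grouped set of attack samples can form its own cluster and go unflagged, whereas the two models fitted on benign data flag anything far from what they saw in training.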

This project demonstrates a complete machine learning workflow, from the ingestion and processing of a large-scale (16GB) dataset to model training, evaluation, and in-depth analysis of the results.

Key Findings

  • High Recall, High Alert Fatigue: Both Isolation Forest and One-Class SVM demonstrated exceptionally high recall, identifying over 99% of attacks. However, this sensitivity came with a high false-positive rate, and the resulting alert volume would be impractical in a real-world SOC environment.
  • DBSCAN Performance Failure: The density-based DBSCAN model failed catastrophically, missing over 99% of anomalies. This indicates that the underlying assumption of attack traffic being "sparse noise" does not apply to this dataset.
  • Isolation Forest as the Viable Candidate: Due to its high detection rate and superior computational efficiency, Isolation Forest emerged as the most practical unsupervised model among the three tested.
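To make the trade-off between recall and false positives concrete, the sketch below computes both from confusion-matrix counts. The function name `detection_metrics` is hypothetical, and the counts are invented for illustration; they are not results from this project.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return recall, false-positive rate, and precision from raw counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "precision": precision,
    }


# Illustrative counts only (NOT project results): almost every attack is
# caught, but a large share of benign flows is flagged as well.
metrics = detection_metrics(tp=995, fp=400, tn=600, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, recall is 0.995 while the false-positive rate is 0.400, which is the pattern the first finding describes: very few missed attacks, but many benign flows raising alerts.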

Repository Contents

  • main.py: The main entry point to run the entire analysis pipeline from start to finish.
  • src/: A folder containing the core logic for the project.
    • data_processor.py: Script for loading, preprocessing, and scaling the data.
    • model_trainer.py: Script containing functions to train and evaluate all machine learning models.
  • processed-data/: Contains the pre-sampled and processed datasets used for the analysis.
  • reports/: Contains the final academic report and saved figures.
    • Report-AI-IoT-Intrusion-Detection.pdf: The detailed project report.
    • figures/: Directory where confusion matrix plots are automatically saved.
  • requirements.txt: A list of all Python libraries required to run the project.

Dataset

The processed datasets used for this analysis are too large to be hosted directly in this Git repository. The full dataset can be downloaded from the CIC-IoT-2023 dataset page.
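Since the full dataset is large (about 16GB, as noted in the overview), one way to build a manageable sample is to read each CSV in chunks and keep a random fraction of every chunk. The sketch below assumes pandas; the function name `sample_csv_in_chunks`, the sampling fraction, and the chunk size are illustrative assumptions, not the procedure used in `data_processor.py`.

```python
import pandas as pd


def sample_csv_in_chunks(path, frac=0.01, chunksize=100_000, seed=0):
    """Read a large CSV piece by piece and keep a random fraction of each piece.

    Only one chunk is held in memory at a time, so the file never has to be
    loaded in full.
    """
    sampled = []
    for i, chunk in enumerate(pd.read_csv(path, chunksize=chunksize)):
        # Vary the seed per chunk so each chunk draws a different set of row positions.
        sampled.append(chunk.sample(frac=frac, random_state=seed + i))
    return pd.concat(sampled, ignore_index=True)
```

A call such as `sample_csv_in_chunks("some_file.csv", frac=0.05)` would return a DataFrame with roughly 5% of the rows; the file name here is a placeholder.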

Instructions for Reproducibility

To replicate the analysis and results, please follow these steps:

  1. Clone the Repository

  2. Set up the Python Environment: It is highly recommended to use a virtual environment.

    # Create the virtual environment
    python -m venv venv
    
    # Activate it (run only the line for your platform)
    source venv/bin/activate  # On macOS/Linux
    .\venv\Scripts\activate   # On Windows
    
    # Install all required libraries
    pip install -r requirements.txt
  3. Run the Analysis: Execute the main script from the terminal. This will load the data, preprocess it, train all four models, and print their evaluation reports to the console, saving the confusion matrices to the reports/figures/ directory.

    python main.py

    Please Note: Training the OCSVM and DBSCAN models may take several minutes to complete, depending on your system's hardware.

    The CSV files are not included because they are too large for this repository.
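For readers who want to see what the evaluation step of such a pipeline can look like, here is a minimal sketch of a supervised Logistic Regression benchmark with a printed classification report and confusion matrix. It uses synthetic data and scikit-learn defaults; the variable names and settings are assumptions and do not reproduce what `main.py` or `model_trainer.py` actually do.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic, labelled stand-in data: 0 = benign, 1 = attack.
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 4)),
               rng.normal(3.0, 1.0, size=(300, 4))])
y = np.array([0] * 300 + [1] * 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scale using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
y_pred = clf.predict(scaler.transform(X_test))

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
report = classification_report(y_test, y_pred, target_names=["benign", "attack"])
print(cm)
print(report)
```

Unlike the unsupervised models, this benchmark sees attack labels during training, so it indicates what is achievable when labelled attacks are available rather than how well unseen attacks are detected.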

About

A comparative analysis of unsupervised machine learning models (Isolation Forest, OCSVM, DBSCAN) for zero-day threat detection in IoT network traffic using the CIC-IoT-2023 dataset.
