This repository contains all the materials, scripts, and documentation for Laboratory 1 of the AI and Cybersecurity course.
This lab builds a complete intrusion detection pipeline on a curated subset of the CICIDS2017 dataset using Feed Forward Neural Networks (FFNN) in PyTorch.
The laboratory is structured into six progressive tasks that comprehensively cover the intrusion detection pipeline:
- Task 1: Data cleaning, stratified splits, outlier inspection, scaling comparison (Standard vs Robust).
- Task 2: Shallow FFNN (single hidden layer) with neuron sweep and activation (Linear vs ReLU).
- Task 3: Feature bias analysis (Destination Port), port substitution experiment, feature removal impact.
- Task 4: Class imbalance mitigation via a class-weighted CrossEntropy loss (a minimal sketch of these building blocks follows this list).
- Task 5: Deep architectures, batch size impact, optimizer comparison (SGD / Momentum / AdamW).
- Task 6: Overfitting and regularization (Dropout, BatchNorm, Weight Decay) on deeper models.
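To make the pipeline concrete, here is a minimal sketch of those building blocks: a shallow FFNN, an inverse-frequency class-weighted CrossEntropy loss, and an AdamW optimizer. The feature count, class counts, and layer sizes are illustrative assumptions, not the notebook's exact values.

```python
# Minimal sketch (not the exact notebook code) of a shallow FFNN with a
# class-weighted CrossEntropy loss and AdamW; all sizes are assumptions.
import torch
import torch.nn as nn

n_features, n_classes, hidden = 70, 5, 64          # assumed dimensions

model = nn.Sequential(
    nn.Linear(n_features, hidden),
    nn.ReLU(),                                      # Task 2 compares Linear vs ReLU here
    nn.Linear(hidden, n_classes),
)

# Task 4: inverse-frequency class weights to counter imbalance (assumed counts).
class_counts = torch.tensor([80000.0, 5000.0, 3000.0, 1500.0, 500.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

# Task 5 compares SGD / SGD+momentum / AdamW; AdamW with weight decay shown here.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

# One dummy training step to show the wiring.
x = torch.randn(32, n_features)
y = torch.randint(0, n_classes, (32,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```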
```
Laboratory1/
├── lab/          # Data, notebooks and support material
├── report/       # LaTeX source files for the lab report
├── resources/    # Additional resources (e.g., links, PDFs, images)
└── README.md     # This file
```
> **Note:** The detailed lab report, including all experimental results and analysis, can be found here.
- Understand preprocessing choices (scaling, outlier retention); a scaling sketch follows this list.
- Evaluate architectural depth vs minority class detection.
- Quantify bias induced by a single feature (Destination Port).
- Mitigate class imbalance using weighted loss.
- Compare optimizers and batch sizes for convergence/generalization.
- Assess regularization techniques on tabular intrusion data.
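For the preprocessing objective, the Standard vs Robust scaling comparison can be sketched as below. Both scalers are fit on the training split only; `X_train` / `X_test` are placeholder arrays and the variable names may differ from the notebook's.

```python
# Sketch of the Task 1 scaler comparison; variable names and data are placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

X_train = np.random.rand(1000, 10) * 100     # stand-in for the real feature matrix
X_test = np.random.rand(200, 10) * 100

std = StandardScaler().fit(X_train)          # mean / std: sensitive to outliers
rob = RobustScaler().fit(X_train)            # median / IQR: robust to retained outliers

X_train_std, X_test_std = std.transform(X_train), std.transform(X_test)
X_train_rob, X_test_rob = rob.transform(X_train), rob.transform(X_test)
```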
- Python 3.10+
- PyTorch, scikit-learn, numpy, pandas, matplotlib, seaborn
- Dataset file: `lab/data/dataset_lab_1.csv`
1. Clone:
   ```bash
   git clone <repo_url>
   cd Laboratory1
   ```
2. Create environment (example with venv):
   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   ```
   (Create `requirements.txt` if missing; minimal list: torch torchvision torchaudio scikit-learn pandas numpy seaborn matplotlib)
3. Run notebook:
   ```bash
   jupyter notebook lab/notebooks/Lab1_FFNN.ipynb
   ```
4. Results (plots, metrics) are saved under `lab/results/images/<task>_plots/`.
Place `dataset_lab_1.csv` in `lab/data/`. No automatic download is performed (course-provided subset).
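For reference, loading the CSV and producing a stratified split could look like the sketch below. The label column name and test fraction are assumptions; only the seed (42) comes from the notebook.

```python
# Sketch of loading the dataset and making a stratified split (label column assumed).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("lab/data/dataset_lab_1.csv")
X = df.drop(columns=["Label"])               # "Label" is an assumed column name
y = df["Label"]

# Stratify so each split preserves the (imbalanced) class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```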
- Set random seed (already fixed to 42 in the notebook).
- To switch scaler: change `X_train_use = X_train_std` to the robust variant.
- To rerun the port bias test: execute the Task 3 cells after the initial training (a generic sketch of such a check is shown below).
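One generic way to probe the Destination Port bias (not necessarily the notebook's exact cells) is a permutation check: shuffle the port column at test time and compare macro F1. `model_predict`, `port_col`, and the NumPy test matrix are assumed placeholders.

```python
# Sketch of a permutation check for Destination Port bias; names are placeholders.
import numpy as np
from sklearn.metrics import f1_score

def port_bias_check(model_predict, X_test, y_test, port_col):
    """model_predict: callable returning class predictions for a NumPy feature matrix."""
    base = f1_score(y_test, model_predict(X_test), average="macro")

    X_perm = X_test.copy()
    rng = np.random.default_rng(42)
    X_perm[:, port_col] = rng.permutation(X_perm[:, port_col])  # break port/label link

    permuted = f1_score(y_test, model_predict(X_perm), average="macro")
    return base, permuted   # a large drop suggests the model leans on the port feature
```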
- Best shallow model: ReLU with 64 neurons (balanced macro F1).
- The deep 3-layer [32, 16, 8] network with AdamW gave a strong overall trade-off.
- Weight decay (1e-4) sufficed; heavy Dropout/BatchNorm harmed minority-class recall (a sketch of such a regularized model follows this list).
- The Destination Port feature induced a spurious correlation; removing it reduced PortScan shortcut learning.
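For context, a deeper [32, 16, 8] model with the regularization options compared in Task 6 could be assembled as below; the input/output sizes and dropout rate are assumptions, and weight decay is applied through AdamW as in the findings above.

```python
# Sketch of the deeper [32, 16, 8] architecture with optional regularization (Task 6);
# input/output sizes and dropout rate are assumptions.
import torch.nn as nn

def make_deep_ffnn(n_features=70, n_classes=5, use_bn=False, p_drop=0.0):
    layers, sizes = [], [n_features, 32, 16, 8]
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if use_bn:
            layers.append(nn.BatchNorm1d(sizes[i + 1]))
        layers.append(nn.ReLU())
        if p_drop > 0:
            layers.append(nn.Dropout(p_drop))
    layers.append(nn.Linear(sizes[-1], n_classes))
    return nn.Sequential(*layers)

# Weight decay is applied through the optimizer, e.g.:
# torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```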
| Name | GitHub |
|---|---|
| Andrea Botticella | |
| Elia Innocenti | |
| Simone Romano | |
