🛡️ NetworkSecurity: Phishing Detection ML Pipeline

🚀 A modular, production-grade ML pipeline for phishing detection, built with FastAPI, DVC, Optuna, MLflow, Celery, and Docker. It is designed around a cloud-native architecture, YAML-based configuration, and reusable components.

✅ Features

✅ End-to-end ML pipeline: Ingestion ➜ Validation ➜ Transformation ➜ Training ➜ Evaluation ➜ Deployment
✅ YAML-driven configuration system
✅ Optuna hyperparameter tuning with MLflow tracking
✅ Real-time FastAPI inference + Celery async training
✅ AWS S3 model upload + GitHub Actions CI/CD
✅ DVC for dataset versioning

📂 Project Structure

networksecurity/
├── app.py                   # FastAPI application
├── main.py                  # Manual training pipeline trigger
├── Dockerfile               # Container build instructions
├── docker-compose.yaml      # Multi-container stack (FastAPI, Redis, Celery)
├── config/                  # YAML configs: schema, params, etc.
├── data/                    # DVC-tracked dataset (raw, transformed, validated)
├── artifacts/               # Timestamped artifacts per pipeline run
├── final_model/             # Final production model
├── logs/                    # Pipeline run logs
├── templates/               # Jinja2 templates for UI
├── requirements.txt         # Python dependencies
└── src/networksecurity/     # Source package
    ├── components/          # Core pipeline stages
    ├── config/              # Config manager
    ├── constants/           # Path constants
    ├── data_processors/     # Encoders, scalers, imputers
    ├── dbhandler/           # MongoDB + S3 interfaces
    ├── entity/              # Dataclass definitions
    ├── exception/           # Custom error handling
    ├── inference/           # Prediction logic
    ├── logging/             # Centralized logger
    ├── pipeline/            # Pipeline orchestration modules
    ├── utils/               # Helpers (save/load/transform)
    └── worker/              # Celery worker entrypoint

🔁 Pipeline Flow

MongoDB → Data Ingestion → Validation → Transformation → Training → Evaluation → Push to S3

Each stage outputs artifacts, logs, and metrics using a standardized structure.
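
The stage-by-stage flow can be pictured as a small runner in which each stage receives the previous stage's artifact and returns its own. This is only a minimal sketch, not the project's orchestration code: `StageArtifact`, `run_pipeline`, and the two toy stages are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StageArtifact:
    """Hypothetical record of what a single stage produced."""
    stage: str
    payload: Dict[str, object] = field(default_factory=dict)


Stage = Callable[[Optional[StageArtifact]], StageArtifact]


def run_pipeline(stages: List[Stage]) -> List[StageArtifact]:
    """Run stages in order, handing each one the previous stage's artifact."""
    artifacts: List[StageArtifact] = []
    previous: Optional[StageArtifact] = None
    for stage in stages:
        previous = stage(previous)
        artifacts.append(previous)
    return artifacts


def ingest(_: Optional[StageArtifact]) -> StageArtifact:
    # Stand-in for reading records from MongoDB (made-up sample row).
    return StageArtifact("ingestion", {"rows": [{"url_length": 54, "label": 1}]})


def validate(prev: Optional[StageArtifact]) -> StageArtifact:
    # Stand-in for schema validation: every row must carry a label.
    assert prev is not None, "validation needs the ingestion artifact"
    rows = prev.payload["rows"]
    valid = all("label" in row for row in rows)
    return StageArtifact("validation", {"rows": rows, "valid": valid})


if __name__ == "__main__":
    for artifact in run_pipeline([ingest, validate]):
        print(artifact.stage, sorted(artifact.payload))
```

Passing artifacts explicitly keeps each stage testable on its own, since a stage only depends on the artifact it is handed.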

⚙️ Configuration

The project is fully parameterized through YAML config files and secrets stored in a .env file.

YAML Configs:

config.yaml: Paths, filenames, artifact roots
params.yaml: Tuning ranges, preprocessing methods
schema.yaml: Column dtypes and target
templates.yaml: Templates for YAML-based reports
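
As a rough illustration of how a schema file that lists column dtypes and a target could drive validation, the sketch below checks one record against a schema held in a plain dict. The dict layout (`columns`, `target`), the column names, and the `schema_errors` helper are all assumptions made for this example; the actual structure of schema.yaml is not shown here.

```python
from typing import Dict, List

# Hypothetical contents of schema.yaml after loading it into a dict.
SCHEMA = {
    "columns": {"url_length": "int", "has_ip_address": "int", "result": "int"},
    "target": "result",
}

# Mapping from dtype names used in the (assumed) schema to Python types.
PYTHON_TYPES = {"int": int, "float": float, "str": str}


def schema_errors(record: Dict[str, object], schema: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    columns = schema["columns"]
    for name, dtype in columns.items():
        if name not in record:
            errors.append(f"missing column: {name}")
        elif not isinstance(record[name], PYTHON_TYPES[dtype]):
            errors.append(f"wrong dtype for {name}: expected {dtype}")
    for name in record:
        if name not in columns:
            errors.append(f"unexpected column: {name}")
    if schema["target"] not in columns:
        errors.append("target column is not declared in columns")
    return errors


if __name__ == "__main__":
    print(schema_errors({"url_length": "54", "has_ip_address": 0}, SCHEMA))
```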

Environment Variables (.env):

# MongoDB
MONGODB_URI_BASE=
MONGODB_USERNAME=
MONGODB_PASSWORD=

# MLflow/DagsHub
MLFLOW_TRACKING_URI=
MLFLOW_TRACKING_USERNAME=
MLFLOW_TRACKING_PASSWORD=
DAGSHUB_REPO_NAME=
DAGSHUB_REPO_OWNER=

# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
AWS_ECR_LOGIN_URI=
ECR_REPOSITORY_NAME=
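
Because every value above is left blank in the template, a quick startup check for unset variables can save a failed run. The following is a minimal sketch; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and treating all thirteen variables as required is an assumption (some may only matter for CI/CD or deployment).

```python
import os
from typing import List, Mapping

# Variable names copied from the .env template; treating them all as
# required is an assumption made for this sketch.
REQUIRED_VARS = (
    "MONGODB_URI_BASE", "MONGODB_USERNAME", "MONGODB_PASSWORD",
    "MLFLOW_TRACKING_URI", "MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD",
    "DAGSHUB_REPO_NAME", "DAGSHUB_REPO_OWNER",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_ECR_LOGIN_URI", "ECR_REPOSITORY_NAME",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```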

🧪 How to Run

⚙️ Local (No Docker)

uvicorn app:app --reload

🐳 Local (With Docker Compose)

docker compose up --build

☁️ On EC2 (with Nginx + GitHub Runner)

Create a .env file and copy it to the instance.
Add the following user data script when launching the EC2 instance:

#!/bin/bash

set -e
export DEBIAN_FRONTEND=noninteractive

# === 1. Update system and install base packages ===
apt-get update -y && apt-get upgrade -y
apt-get install -y git curl nginx openssl ufw

# === 1.1 Install Docker ===
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

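# === 1.2 Add ubuntu user to docker group (takes effect at the next login) ===
# Note: newgrp is omitted because it starts a new shell, which has no lasting
# effect in a non-interactive user data script.
usermod -aG docker ubuntu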

# === 2. Enable UFW and open required ports ===
ufw allow OpenSSH
ufw allow 80
ufw allow 443
ufw --force enable

# === 3. Generate self-signed SSL cert for Nginx ===
mkdir -p /etc/ssl/self-signed
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" \
  --silent)
CN=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-ipv4 \
  --silent)
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/self-signed/self.key \
  -out /etc/ssl/self-signed/self.crt \
  -subj "/C=GB/ST=Scotland/L=Glasgow/O=Self/OU=Dev/CN=$CN"

# === 4. Configure Nginx ===
cat <<EOF > /etc/nginx/sites-available/fastapi
server {
    listen 443 ssl;
    server_name _;

    ssl_certificate /etc/ssl/self-signed/self.crt;
    ssl_certificate_key /etc/ssl/self-signed/self.key;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
    }
}

server {
    listen 80;
    return 301 https://\$host\$request_uri;
}
EOF

ln -sf /etc/nginx/sites-available/fastapi /etc/nginx/sites-enabled/
rm -f /etc/nginx/sites-enabled/default
nginx -t
systemctl reload nginx
systemctl enable nginx

# === 5. GitHub Actions runner ===
mkdir -p /home/ubuntu/actions-runner
cd /home/ubuntu/actions-runner

curl -o actions-runner-linux-x64-2.324.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.324.0/actions-runner-linux-x64-2.324.0.tar.gz
echo "e8e24a3477da17040b4d6fa6d34c6ecb9a2879e800aa532518ec21e49e21d7b4  actions-runner-linux-x64-2.324.0.tar.gz" | shasum -a 256 -c
tar xzf ./actions-runner-linux-x64-2.324.0.tar.gz
chown -R ubuntu:ubuntu /home/ubuntu/actions-runner

# Configure runner
sudo -u ubuntu ./config.sh --url <your_repo_here> \
                           --token <your_token_here> \
                           --unattended \
                           --name self-hosted \
                           --labels self-hosted,linux,x64 \
                           --work _work

# Register runner as a service that runs as the ubuntu user
sudo ./svc.sh install ubuntu
sudo ./svc.sh start

Then access the app at https://<your-ec2-ip>. Because the certificate is self-signed, the browser will show a security warning on first visit.

📈 MLflow Tracking

Experiment: NetworkSecurityExperiment
Registry: NetworkSecurityModel
Metrics: accuracy, f1, precision, recall

mlflow ui

Access: http://localhost:5000
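
For reference, the four tracked metrics can be computed from binary predictions as in the sketch below. This is a plain-Python illustration rather than the project's evaluation code; `binary_metrics` is a hypothetical helper, and it assumes labels are encoded as 0 and 1 with 1 as the positive (phishing) class, which may differ from the dataset's actual encoding.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

A dict in this shape maps directly onto metric names and values, which is the form a tracking call would typically log.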

🧪 FastAPI Endpoints

POST /train → triggers training via Celery
POST /predict → accepts CSV or input JSON
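
A client for these two endpoints might look like the sketch below, which only builds the requests so that it can be inspected without a running server. The helper names are hypothetical, port 8000 is taken from the Nginx proxy target in the EC2 script, and the JSON body shape for /predict (a flat feature dict) is an assumption, since the request schema is not documented here.

```python
import json
from typing import Dict
from urllib.request import Request

# Port 8000 matches the proxy_pass target in the Nginx config.
BASE_URL = "http://localhost:8000"


def build_train_request(base_url: str = BASE_URL) -> Request:
    """Build a POST /train request with an empty body."""
    return Request(f"{base_url}/train", data=b"", method="POST")


def build_predict_request(features: Dict[str, object], base_url: str = BASE_URL) -> Request:
    """Build a POST /predict request with a JSON body (assumed shape)."""
    body = json.dumps(features).encode("utf-8")
    return Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_predict_request({"url_length": 54})  # made-up feature
    print(request.get_method(), request.full_url)
```

With the server running, either request can be sent with `urllib.request.urlopen(request)`.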

🔐 Licensing

This project is licensed under GPLv3.

👨‍💻 Author

Gokul Krishna N V
Machine Learning Engineer, UK 🇬🇧
GitHub • LinkedIn

🙌 Acknowledgements

Project structure: inspired by industry ML standards
Dataset: based on data hosted on Krishnaik06’s GitHub

Repository: megokul/networksecurity_ml_api

Folders and files

Latest commit

History

Repository files navigation

🛡️ NetworkSecurity: Phishing Detection ML Pipeline

✅ Features

📂 Project Structure

🔁 Pipeline Flow

⚙️ Configuration

🧪 How to Run

⚙️ Local (No Docker)

🐳 Local (With Docker Compose)

☁️ On EC2 (with Nginx + GitHub Runner)

📈 MLflow Tracking

🧪 FastAPI Endpoints

🔐 Licensing

👨‍💻 Author

🙌 Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages