Production grade phishing detection ML pipeline using FastAPI, DVC, MLflow, Optuna, Celery, and AWS.

megokul/networksecurity_ml_api

πŸ›‘οΈ NetworkSecurity: Phishing Detection ML Pipeline

🚀 A modular, production-grade ML pipeline for phishing detection, powered by FastAPI, DVC, Optuna, MLflow, Celery, and Docker. It is designed with a cloud-native architecture, YAML-based configuration, and reusable components.


Python FastAPI MLflow DVC Optuna MongoDB AWS Docker Celery Redis Scikit-learn Pandas NumPy


✅ Features

  • ✅ End-to-end ML pipeline: Ingestion ➜ Validation ➜ Transformation ➜ Training ➜ Evaluation ➜ Deployment
  • ✅ YAML-driven configuration system
  • ✅ Optuna hyperparameter tuning with MLflow tracking
  • ✅ Real-time FastAPI inference + Celery async training
  • ✅ AWS S3 model upload + GitHub Actions CI/CD
  • ✅ DVC for dataset versioning

📂 Project Structure

networksecurity/
├── app.py                   # FastAPI application
├── main.py                  # Manual training pipeline trigger
├── Dockerfile               # Container build instructions
├── docker-compose.yaml      # Multi-container stack (FastAPI, Redis, Celery)
├── config/                  # YAML configs: schema, params, etc.
├── data/                    # DVC-tracked dataset (raw, transformed, validated)
├── artifacts/               # Timestamped artifacts per pipeline run
├── final_model/             # Final production model
├── logs/                    # Pipeline run logs
├── templates/               # Jinja2 templates for UI
├── requirements.txt         # Python dependencies
└── src/networksecurity/     # Source package
    ├── components/          # Core pipeline stages
    ├── config/              # Config manager
    ├── constants/           # Path constants
    ├── data_processors/     # Encoders, scalers, imputers
    ├── dbhandler/           # MongoDB + S3 interfaces
    ├── entity/              # Dataclass definitions
    ├── exception/           # Custom error handling
    ├── inference/           # Prediction logic
    ├── logging/             # Centralized logger
    ├── pipeline/            # Pipeline orchestration modules
    ├── utils/               # Helpers (save/load/transform)
    └── worker/              # Celery worker entrypoint

πŸ” Pipeline Flow

MongoDB → Data Ingestion → Validation → Transformation → Training → Evaluation → Push to S3

Each stage outputs artifacts, logs, and metrics using a standardized structure.
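
The flow above can be pictured with a small orchestration sketch. This is not the repository's code: `StageResult`, `run_pipeline`, the handler signature, and the timestamp format are all hypothetical, chosen only to show stages running in order and writing into one timestamped run folder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# Stage order as listed in the pipeline flow; the identifiers are illustrative.
STAGES = ["ingestion", "validation", "transformation", "training", "evaluation", "push_to_s3"]


@dataclass
class StageResult:
    """Hypothetical per-stage record, not one of the repository's entity classes."""
    stage: str
    artifact_dir: Path
    metrics: dict = field(default_factory=dict)


def run_pipeline(artifacts_root: Path, handlers: dict[str, Callable[[Path], dict]]) -> list[StageResult]:
    """Run every stage in order, giving each its own folder under one timestamped run directory."""
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    results = []
    for stage in STAGES:
        stage_dir = artifacts_root / run_id / stage
        stage_dir.mkdir(parents=True, exist_ok=True)
        metrics = handlers[stage](stage_dir)  # the handler writes its artifacts into stage_dir
        results.append(StageResult(stage, stage_dir, metrics))
    return results
```

Each handler receives its output directory and returns a metrics dictionary, so a failing stage stops the run before later stages start.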


βš™οΈ Configuration

The project is fully parameterized via YAML configs and .env secrets.

YAML Configs:

  • config.yaml: Paths, filenames, artifact roots
  • params.yaml: Tuning ranges, preprocessing methods
  • schema.yaml: Column dtypes and target
  • templates.yaml: Templates for YAML-based reports
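
As an illustration of what a schema file holding column dtypes and a target can be used for, here is a minimal validation sketch. The `SCHEMA` layout, the column names, and `validate_columns` are assumptions made for the example; the actual schema.yaml may be organized differently.

```python
# Hypothetical schema layout, written as the dict a YAML loader might return.
SCHEMA = {
    "columns": {"feature_a": "int64", "feature_b": "float64", "label": "int64"},
    "target_column": "label",
}


def validate_columns(observed: dict[str, str], schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the data matches."""
    problems = []
    expected = schema["columns"]
    for name, dtype in expected.items():
        if name not in observed:
            problems.append(f"missing column: {name}")
        elif observed[name] != dtype:
            problems.append(f"dtype mismatch for {name}: expected {dtype}, got {observed[name]}")
    for name in observed:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    if schema["target_column"] not in expected:
        problems.append(f"target column {schema['target_column']} is not declared in columns")
    return problems
```

With pandas, the `observed` mapping could be built as `{col: str(dtype) for col, dtype in df.dtypes.items()}`.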

Environment Variables (.env):

# MongoDB
MONGODB_URI_BASE=
MONGODB_USERNAME=
MONGODB_PASSWORD=

# MLflow/DagsHub
MLFLOW_TRACKING_URI=
MLFLOW_TRACKING_USERNAME=
MLFLOW_TRACKING_PASSWORD=
DAGSHUB_REPO_NAME=
DAGSHUB_REPO_OWNER=

# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
AWS_ECR_LOGIN_URI=
ECR_REPOSITORY_NAME=
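
A startup check along these lines can catch an incomplete .env early. The variable names are copied from the template above; treating every one of them as required, and the helper name `missing_env_vars`, are assumptions for the sketch.

```python
import os

# Names copied from the .env template; whether all are mandatory is an assumption.
REQUIRED_ENV_VARS = (
    "MONGODB_URI_BASE", "MONGODB_USERNAME", "MONGODB_PASSWORD",
    "MLFLOW_TRACKING_URI", "MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD",
    "DAGSHUB_REPO_NAME", "DAGSHUB_REPO_OWNER",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_ECR_LOGIN_URI", "ECR_REPOSITORY_NAME",
)


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no argument inspects the current process environment; passing a dict makes the check easy to test.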

🧪 How to Run

⚙️ Local (No Docker)

uvicorn app:app --reload

🐳 Local (With Docker Compose)

docker compose up --build

☁️ On EC2 (with Nginx + GitHub Runner)

  1. Create a .env file and copy it to the instance
  2. Add this user data script when launching the EC2 instance:
#!/bin/bash

set -e
export DEBIAN_FRONTEND=noninteractive

# === 1. Update system and install base packages ===
apt-get update -y && apt-get upgrade -y
apt-get install -y git curl nginx openssl ufw

# === 1.1 Install Docker ===
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# === 1.2 Add ubuntu user to docker group ===
# Takes effect at the ubuntu user's next login; newgrp would only spawn a subshell in a script.
usermod -aG docker ubuntu

# === 2. Enable UFW and open required ports ===
ufw allow OpenSSH
ufw allow 80
ufw allow 443
ufw --force enable

# === 3. Generate self-signed SSL cert for Nginx ===
mkdir -p /etc/ssl/self-signed
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" \
  --silent)
CN=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-ipv4 \
  --silent)
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/self-signed/self.key \
  -out /etc/ssl/self-signed/self.crt \
  -subj "/C=GB/ST=Scotland/L=Glasgow/O=Self/OU=Dev/CN=$CN"

# === 4. Configure Nginx ===
cat <<EOF > /etc/nginx/sites-available/fastapi
server {
    listen 443 ssl;
    server_name _;

    ssl_certificate /etc/ssl/self-signed/self.crt;
    ssl_certificate_key /etc/ssl/self-signed/self.key;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
    }
}

server {
    listen 80;
    return 301 https://\$host\$request_uri;
}
EOF

ln -sf /etc/nginx/sites-available/fastapi /etc/nginx/sites-enabled/
rm -f /etc/nginx/sites-enabled/default
nginx -t
systemctl reload nginx
systemctl enable nginx

# === 5. GitHub Actions runner ===
mkdir -p /home/ubuntu/actions-runner
cd /home/ubuntu/actions-runner

curl -o actions-runner-linux-x64-2.324.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.324.0/actions-runner-linux-x64-2.324.0.tar.gz
echo "e8e24a3477da17040b4d6fa6d34c6ecb9a2879e800aa532518ec21e49e21d7b4  actions-runner-linux-x64-2.324.0.tar.gz" | shasum -a 256 -c
tar xzf ./actions-runner-linux-x64-2.324.0.tar.gz
chown -R ubuntu:ubuntu /home/ubuntu/actions-runner

# Configure runner
sudo -u ubuntu ./config.sh --url <your_repo_here> \
                           --token <your_token_here> \
                           --unattended \
                           --name self-hosted \
                           --labels self-hosted,linux,x64 \
                           --work _work

# Register runner as service
sudo ./svc.sh install
sudo ./svc.sh start

Then access the app at: https://<your-ec2-ip> (the browser will show a warning because the certificate is self-signed)


📈 MLflow Tracking

  • Experiment: NetworkSecurityExperiment
  • Registry: NetworkSecurityModel
  • Metrics: accuracy, f1, precision, recall
mlflow ui

Access: http://localhost:5000
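
For reference, the four tracked metrics can be written out from binary confusion-matrix counts. This is only a sketch of the definitions; the repository lists Scikit-learn in its stack and presumably uses library functions instead, and the function name and zero-division handling here are assumptions.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 for a binary classifier from raw counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of actual positives that are found
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

The returned dictionary has the same four keys as the metric list above, so it could be logged to a tracker as one batch.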


🧪 FastAPI Endpoints

  • POST /train → triggers training via Celery
  • POST /predict → accepts CSV or input JSON
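
A client for these two endpoints might look like the sketch below, using only the standard library. The base URL assumes the app listens on port 8000, as in the Nginx `proxy_pass` target; the JSON body shape (a list of feature records) and the helper names are hypothetical, since the request schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same port as the Nginx proxy_pass target


def build_train_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /train request with an empty body."""
    return urllib.request.Request(f"{base_url}/train", data=b"", method="POST")


def build_predict_request(records: list[dict], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /predict request carrying records as JSON (hypothetical body shape)."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` sends it to a running server; building the request separately keeps the example inspectable without one.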

πŸ” Licensing

This project is licensed under GPLv3.


πŸ‘¨β€πŸ’» Author

Gokul Krishna N V Machine Learning Engineer β€” UK πŸ‡¬πŸ‡§ GitHub β€’ LinkedIn


🙌 Acknowledgements
