A modular, production-grade ML pipeline for phishing detection, powered by FastAPI, DVC, Optuna, MLflow, Celery, and Docker. It is designed with a cloud-native architecture, YAML-based configuration, and reusable components.
- End-to-end ML pipeline: Ingestion → Validation → Transformation → Training → Evaluation → Deployment
- YAML-driven configuration system
- Optuna hyperparameter tuning with MLflow tracking
- Real-time FastAPI inference + Celery async training
- AWS S3 model upload + GitHub Actions CI/CD
- DVC for dataset versioning
networksecurity/
├── app.py                  # FastAPI application
├── main.py                 # Manual training pipeline trigger
├── Dockerfile              # Container build instructions
├── docker-compose.yaml     # Multi-container stack (FastAPI, Redis, Celery)
├── config/                 # YAML configs: schema, params, etc.
├── data/                   # DVC-tracked dataset (raw, transformed, validated)
├── artifacts/              # Timestamped artifacts per pipeline run
├── final_model/            # Final production model
├── logs/                   # Pipeline run logs
├── templates/              # Jinja2 templates for UI
├── requirements.txt        # Python dependencies
└── src/networksecurity/    # Source package
    ├── components/         # Core pipeline stages
    ├── config/             # Config manager
    ├── constants/          # Path constants
    ├── data_processors/    # Encoders, scalers, imputers
    ├── dbhandler/          # MongoDB + S3 interfaces
    ├── entity/             # Dataclass definitions
    ├── exception/          # Custom error handling
    ├── inference/          # Prediction logic
    ├── logging/            # Centralized logger
    ├── pipeline/           # Pipeline orchestration modules
    ├── utils/              # Helpers (save/load/transform)
    └── worker/             # Celery worker entrypoint
MongoDB → Data Ingestion → Validation → Transformation → Training → Evaluation → Push to S3
Each stage outputs artifacts, logs, and metrics using a standardized structure.
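As a rough illustration of the "timestamped artifacts per pipeline run" idea, the sketch below creates one run directory named after the start time, with a subdirectory per stage. The stage identifiers, the timestamp format, the `status.txt` file, and the `run_pipeline` helper are all assumptions made for this example; the project's actual layout may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
import tempfile

# Stage order follows the flow described above; the identifiers are hypothetical.
STAGES = [
    "data_ingestion",
    "data_validation",
    "data_transformation",
    "model_training",
    "model_evaluation",
    "model_pusher",
]


@dataclass
class StageArtifact:
    """Records where a stage wrote its outputs."""
    stage: str
    directory: Path


def run_pipeline(artifacts_root: Path, now: datetime) -> list[StageArtifact]:
    """Create one timestamped run directory with a subdirectory per stage."""
    run_dir = artifacts_root / now.strftime("%Y_%m_%d_%H_%M_%S")
    results = []
    for stage in STAGES:
        stage_dir = run_dir / stage
        stage_dir.mkdir(parents=True, exist_ok=True)
        # A real stage would write data, reports, or models here.
        (stage_dir / "status.txt").write_text(f"{stage}: completed\n")
        results.append(StageArtifact(stage, stage_dir))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for artifact in run_pipeline(Path(tmp), datetime.now()):
            print(artifact.stage, "->", artifact.directory)
```

Passing the timestamp in as an argument keeps the function deterministic, which makes a run reproducible in tests.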
The project is fully parameterized via YAML configs and .env secrets.
YAML Configs:
- config.yaml: Paths, filenames, artifact roots
- params.yaml: Tuning ranges, preprocessing methods
- schema.yaml: Column dtypes and target
- templates.yaml: Templates for YAML-based reports
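To make the role of schema.yaml more concrete, here is a minimal sketch of checking rows against a schema that lists column dtypes and a target column. The `SCHEMA` dictionary stands in for the parsed YAML file; its keys, the column names, and the `validate_rows` helper are hypothetical and not taken from the project.

```python
# Stand-in for the parsed contents of schema.yaml. The keys and column
# names are hypothetical, not the project's actual schema.
SCHEMA = {
    "columns": {"url_length": "int", "has_ip_address": "int", "label": "int"},
    "target_column": "label",
}

# Maps dtype names used in the schema to Python types.
PY_TYPES = {"int": int, "float": float, "str": str}


def validate_rows(rows, schema):
    """Return a list of readable problems; an empty list means the rows conform."""
    problems = []
    expected = schema["columns"]
    if schema["target_column"] not in expected:
        problems.append("target column missing from schema columns")
    for i, row in enumerate(rows):
        missing = sorted(set(expected) - set(row))
        extra = sorted(set(row) - set(expected))
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        if extra:
            problems.append(f"row {i}: unexpected columns {extra}")
        for name, dtype in expected.items():
            if name in row and not isinstance(row[name], PY_TYPES[dtype]):
                problems.append(f"row {i}: {name} is not {dtype}")
    return problems


if __name__ == "__main__":
    rows = [{"url_length": 54, "has_ip_address": 0, "label": 1}]
    print(validate_rows(rows, SCHEMA) or "schema check passed")
```

Collecting every problem instead of stopping at the first one gives a fuller validation report for a run.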
Environment Variables (.env):
# MongoDB
MONGODB_URI_BASE=
MONGODB_USERNAME=
MONGODB_PASSWORD=
# MLflow/DagsHub
MLFLOW_TRACKING_URI=
MLFLOW_TRACKING_USERNAME=
MLFLOW_TRACKING_PASSWORD=
DAGSHUB_REPO_NAME=
DAGSHUB_REPO_OWNER=
# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
AWS_ECR_LOGIN_URI=
ECR_REPOSITORY_NAME=
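
A small startup check can catch unset secrets before a pipeline run begins. The sketch below uses the variable names from the template above; treating all of them as required at runtime is an assumption, and the `missing_env_vars` and `require_env` helpers are hypothetical.

```python
import os

# Variable names copied from the .env template above. Whether every one of
# them is needed at runtime is an assumption of this sketch.
REQUIRED_VARS = [
    "MONGODB_URI_BASE", "MONGODB_USERNAME", "MONGODB_PASSWORD",
    "MLFLOW_TRACKING_URI", "MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD",
    "DAGSHUB_REPO_NAME", "DAGSHUB_REPO_OWNER",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_ECR_LOGIN_URI", "ECR_REPOSITORY_NAME",
]


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_env(environ=None):
    """Raise a single error listing every missing variable."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


if __name__ == "__main__":
    print("missing:", missing_env_vars())
```

Calling `require_env()` at application startup would fail fast with the full list of missing names rather than surfacing them one at a time.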
Run locally:
uvicorn app:app --reload
Run with Docker Compose:
docker compose up --build
Deploy to EC2:
- Create .env and push it to the instance
- Add this user data script when launching EC2:
#!/bin/bash
set -e
export DEBIAN_FRONTEND=noninteractive
# === 1. Update system and install base packages ===
apt-get update -y && apt-get upgrade -y
apt-get install -y git curl nginx openssl ufw
# === 1.1 Install Docker ===
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
# === 1.2 Add ubuntu user to docker group ===
usermod -aG docker ubuntu
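# The new group membership applies at the ubuntu user's next login.
# newgrp is omitted: it would start a new shell in this non-interactive script.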
# === 2. Enable UFW and open required ports ===
ufw allow OpenSSH
ufw allow 80
ufw allow 443
ufw --force enable
# === 3. Generate self-signed SSL cert for Nginx ===
mkdir -p /etc/ssl/self-signed
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" \
  --silent)
CN=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-ipv4 \
  --silent)
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/self-signed/self.key \
  -out /etc/ssl/self-signed/self.crt \
  -subj "/C=UK/ST=Scotland/L=Glasgow/O=Self/OU=Dev/CN=$CN"
# === 4. Configure Nginx ===
cat <<EOF > /etc/nginx/sites-available/fastapi
server {
    listen 443 ssl;
    server_name _;
    ssl_certificate /etc/ssl/self-signed/self.crt;
    ssl_certificate_key /etc/ssl/self-signed/self.key;
    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
    }
}
server {
    listen 80;
    return 301 https://\$host\$request_uri;
}
EOF
ln -sf /etc/nginx/sites-available/fastapi /etc/nginx/sites-enabled/
rm -f /etc/nginx/sites-enabled/default
nginx -t
systemctl reload nginx
systemctl enable nginx
# === 5. GitHub Actions runner ===
mkdir -p /home/ubuntu/actions-runner
cd /home/ubuntu/actions-runner
curl -o actions-runner-linux-x64-2.324.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.324.0/actions-runner-linux-x64-2.324.0.tar.gz
# shasum -c expects two spaces between the hash and the filename
echo "e8e24a3477da17040b4d6fa6d34c6ecb9a2879e800aa532518ec21e49e21d7b4  actions-runner-linux-x64-2.324.0.tar.gz" | shasum -a 256 -c
tar xzf ./actions-runner-linux-x64-2.324.0.tar.gz
chown -R ubuntu:ubuntu /home/ubuntu/actions-runner
# Configure runner
sudo -u ubuntu ./config.sh --url <your_repo_here> \
  --token <your_token_here> \
  --unattended \
  --name self-hosted \
  --labels self-hosted,linux,x64 \
  --work _work
# Register runner as service
sudo ./svc.sh install
sudo ./svc.sh start
Then access the app at: https://<your-ec2-ip>
- Experiment: NetworkSecurityExperiment
- Registry: NetworkSecurityModel
- Metrics: accuracy, f1, precision, recall
mlflow ui
Access: http://localhost:5000
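For reference, the four tracked metrics can be derived from prediction counts as in the sketch below. It assumes binary classification with a single positive class; the `classification_metrics` helper and the label encoding are hypothetical, and the project may compute these values with a library instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against a zero denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Precision and recall matter more than raw accuracy for phishing data when the classes are imbalanced, which is one reason to track all four.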
- POST /train: triggers training via Celery
- POST /predict: accepts CSV or input JSON
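A client for these two endpoints could be sketched with the standard library as follows. The base URL, the JSON payload shape, and the helper names are assumptions for illustration only; the README does not specify the request schema, so check app.py for the real one.

```python
import json
from urllib.request import Request


def build_predict_request(base_url, features):
    """Build a POST /predict request with a JSON body (payload shape is assumed)."""
    body = json.dumps(features).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_train_request(base_url):
    """Build a POST /train request with an empty body."""
    return Request(url=base_url.rstrip("/") + "/train", data=b"", method="POST")


if __name__ == "__main__":
    # Hypothetical feature name; replace with the columns the model expects.
    req = build_predict_request("http://localhost:8000", {"url_length": 54})
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running server. Building the request separately keeps the example usable without one.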
This project is licensed under GPLv3.
Gokul Krishna N V
Machine Learning Engineer, UK
GitHub • LinkedIn
- Project structure: Inspired by industry ML standards
- Based on data hosted by Krishnaik06's GitHub