A production-style Linux infrastructure project demonstrating system administration, security hardening, and Infrastructure as Code (IaC) practices using Ansible automation.
- Project Overview
- Infrastructure Architecture
- Project Phases
- Current Progress
- Security Implementations
- Technologies Used
- Quick Start
- Command Reference
- Skills Demonstrated
- Future Enhancements
- Contributing
- License
- Author
This project showcases the complete lifecycle of building, securing, and automating a multi-server Linux environment from scratch. It demonstrates real-world DevOps and system administration practices used in production environments.
- Infrastructure as Code (IaC): Everything is reproducible and version-controlled
- Security First: Multi-layered security approach with automated hardening
- Automation: Manual tasks converted to reusable Ansible playbooks
- Service Deployment: Full 3-tier web application stack (Nginx → Node.js → PostgreSQL)
- Monitoring Ready: Metrics collection infrastructure with node_exporter
- Professional Documentation: Clear, comprehensive, and maintainable
- ✅ Build a multi-server Linux environment with proper networking
- ✅ Implement security best practices (SSH hardening, firewalls, intrusion prevention)
- ✅ Automate everything with Ansible for repeatability
- ✅ Deploy production-ready services (web, application, database tiers)
- ✅ Implement centralized monitoring and alerting
- ✅ Create automated backup and disaster recovery procedures
- ✅ Test failure scenarios and validate recovery processes
- ✅ Document everything for knowledge transfer
Time Investment: ~30-40 hours (6 phases)
Current Time Spent: ~30 hours
- Hypervisor: VirtualBox 7.x on Windows 11 host
- Operating System: Linux Mint 22 (based on Ubuntu 24.04 LTS)
- Network: Dual-adapter setup (NAT + Host-Only)
- Automation Platform: Ansible 2.16+
- Version Control: Git / GitHub
| Hostname | Role | NAT IP | Host-Only IP | vCPU | RAM | Disk | Status |
|---|---|---|---|---|---|---|---|
| baseline-template | Golden Image | 10.0.2.10 | 192.168.56.10 | 2 | 2GB | 25GB | 🔴 Powered Off |
| control-node | Ansible Controller | 10.0.2.11 | 192.168.56.11 | 2 | 2GB | 25GB | 🟢 Running |
| web-server | Nginx Reverse Proxy | 10.0.2.12 | 192.168.56.12 | 2 | 2GB | 25GB | 🟢 Running |
| app-server | Node.js Application | 10.0.2.13 | 192.168.56.13 | 2 | 2GB | 25GB | 🟢 Running |
| db-server | PostgreSQL Database | 10.0.2.14 | 192.168.56.14 | 2 | 4GB | 50GB | 🟢 Running |
┌───────────────────────────────────────────────────────────────────┐
│                      Windows 11 Host Machine                      │
│                   Your Workstation (SSH Client)                   │
│                  192.168.56.1 (Host-Only Gateway)                 │
└───────────────────────────┬───────────────────────────────────────┘
                            │
                            │ SSH Access via Host-Only Network
                            │ (Management & Development)
                            │
        ┌───────────────────┼────────────────────┐
        │                   │                    │
  ┌─────▼─────┐     ┌───────▼──────┐      ┌──────▼──────┐
  │  control  │     │  web-server  │      │ app-server  │
  │   -node   │────▶│   (nginx)    │─────▶│  (node.js)  │
  │   .56.11  │     │    .56.12    │      │   .56.13    │
  └─────┬─────┘     └───────┬──────┘      └──────┬──────┘
        │                   │                    │
        └───────────────────┼────────────────────┘
                            │
                     ┌──────▼──────┐
                     │  db-server  │
                     │(postgresql) │
                     │   .56.14    │
                     └─────────────┘

┌───────────────────────────────────────────────────────────────────┐
│               NAT Network (InfraNet - 10.0.2.0/24)                │
│             VM-to-VM Communication & Internet Access              │
│                                                                   │
│   control-node: 10.0.2.11      web-server: 10.0.2.12              │
│   app-server:   10.0.2.13      db-server:  10.0.2.14              │
└───────────────────────────────┬───────────────────────────────────┘
                                │ Internet Access
                                ▼
                        ┌───────────────┐
                        │   Internet    │
                        │ (via Windows) │
                        └───────────────┘
Internet → [Web Server:80] → [App Server:3000] → [Database:5432]
             Nginx Proxy       Express.js API       PostgreSQL

Security: Each tier only accepts connections from the previous tier.
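In UFW terms, the tier isolation above comes down to source-restricted allow rules. A sketch of the kind of rules the firewall role applies (illustrative only; the exact rules live in the role):

```
# On app-server: accept port 3000 only from the web server
sudo ufw allow from 10.0.2.12 to any port 3000 proto tcp

# On db-server: accept PostgreSQL only from the app server
sudo ufw allow from 10.0.2.13 to any port 5432 proto tcp
```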
Objective: Build the infrastructure foundation manually to understand every component

Tasks Completed:
- VirtualBox environment setup with NAT and Host-Only networks
- Created baseline template VM with dual network adapters
- Manual security hardening (SSH, firewall, fail2ban)
- Installed monitoring agent (node_exporter)
- Configured automatic security updates
- Cloned and configured 4 production VMs
- Established hostname resolution via /etc/hosts
- Verified connectivity and services
- Created snapshot: `Phase1-Complete-Baseline`

Time Invested: ~4 hours
Status: ✅ 100% Complete
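The hostname resolution step above can be sketched as an `/etc/hosts` fragment. The mappings below follow the inventory table (NAT addresses); this is an illustration, not the exact file:

```
# /etc/hosts entries distributed to each VM
10.0.2.11   control-node
10.0.2.12   web-server
10.0.2.13   app-server
10.0.2.14   db-server
```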
Objective: Convert all manual configurations into automated, repeatable Ansible playbooks
Tasks Completed:
- Installed Ansible 2.16+ on control-node
- Created complete project directory structure
- Generated SSH keys for Ansible automation
- Distributed SSH keys to all managed nodes
- Configured passwordless sudo on all managed nodes
- Created ansible.cfg with optimized settings
- Created inventory file with logical host groups
- Created group_vars/all.yml with global variables
- Tested Ansible connectivity (ping module)
- Initialized Git repository
- Pushed to GitHub repository
- Created 5 security roles:
  - `ssh_hardening` - SSH security configuration
  - `firewall` - UFW firewall rules
  - `fail2ban` - Intrusion prevention system
  - `auto_updates` - Unattended security patches
  - `node_exporter` - Prometheus metrics exporter
- Created base-hardening.yml playbook
- Successfully executed on all 3 managed nodes
- Tested and verified idempotency
- Created verify-config.yml verification playbook
- All security services running and verified
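The inventory groups referenced throughout this README (`managed_nodes`, `web_servers`, `app_servers`, `db_servers`) could be laid out as follows. This is a hypothetical sketch consistent with the commands shown later, not the project's actual `inventory/hosts.yml`:

```yaml
# inventory/hosts.yml - sketch; group names match those used in this README
all:
  children:
    control:
      hosts:
        control-node:
          ansible_host: 10.0.2.11
    managed_nodes:
      children:
        web_servers:
          hosts:
            web-server:
              ansible_host: 10.0.2.12
        app_servers:
          hosts:
            app-server:
              ansible_host: 10.0.2.13
        db_servers:
          hosts:
            db-server:
              ansible_host: 10.0.2.14
```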
- Web Server Deployment:
  - Created `nginx` role with reverse proxy configuration
  - Configured security headers
  - Created web-server.yml playbook
  - Deployed and verified Nginx
  - Opened firewall ports 80, 443
- Application Server Deployment:
  - Created `nodejs_app` role
  - Deployed Express.js sample application
  - Configured systemd service (myapp.service)
  - Created app-server.yml playbook
  - Service running and health checks passing
  - Restricted access to web server only
- Database Server Deployment:
  - Created `postgresql` role
  - Installed PostgreSQL 16
  - Created application database (appdb)
  - Created database user (appuser)
  - Configured network access from app server
  - Created db-server.yml playbook
  - Database accessible and verified
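The systemd service mentioned above might look like the following sketch. The unit name matches the README (myapp.service); the user, paths, and environment are assumptions for illustration:

```ini
# /etc/systemd/system/myapp.service - hypothetical sketch of the unit
# the nodejs_app role installs; paths and user are assumptions.
[Unit]
Description=Express.js sample application
After=network.target

[Service]
User=myapp
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node /opt/myapp/server.js
Environment=PORT=3000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```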
- Created comprehensive verification playbook
- Tested Web → App connectivity
- Tested App → Database connectivity
- Verified full request flow (end-to-end)
- All services passing health checks
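Restricting database access to the app server is typically a one-line `pg_hba.conf` rule. A sketch consistent with the hosts and credentials named above (the real rule is managed by the `postgresql` role and may differ):

```
# /etc/postgresql/16/main/pg_hba.conf - allow only app-server (10.0.2.13)
# to reach appdb as appuser; requires listen_addresses to include 10.0.2.14
host    appdb    appuser    10.0.2.13/32    scram-sha-256
```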
Deliverables Completed:
- 8 Ansible roles (reusable components)
- 6 Ansible playbooks (automation scripts)
- Complete 3-tier application stack
- Full security hardening
- Monitoring foundation
Time Invested: ~11 hours
Status: ✅ 100% Complete
Snapshot: `Phase2-Complete-Full-Automation` (ready to create)
Objective: Implement Prometheus and Grafana for infrastructure monitoring
Tasks Completed:
- Created Prometheus role (`roles/prometheus/`)
- Installed Prometheus 3.9.1 from GitHub releases
- Created Prometheus system user and directories
- Configured Prometheus to scrape all 4 node_exporters
- Set up systemd service for Prometheus
- Configured scraping targets:
- control-node: 10.0.2.11:9100
- web-server: 10.0.2.12:9100
- app-server: 10.0.2.13:9100
- db-server: 10.0.2.14:9100
- Configured firewall to allow port 9090 from host-only network
- Verified Prometheus is running and healthy
- Created Grafana role (`roles/grafana/`)
- Installed Grafana 12.3.2 from the official repository
- Configured Grafana to run on port 3001
- Set default admin password: admin123!
- Configured Prometheus as default data source
- Configured firewall to allow port 3001 from host-only network
- Created provisioning for automatic data source configuration
- Verified Grafana is running and accessible
- Created monitoring dashboards via Grafana API
- Imported and tested dashboard templates
- Created working dashboards with proven queries:
- "System Monitoring Dashboard" (comprehensive metrics)
- "SIMPLE TEST - RAW METRICS" (debug dashboard)
- "GUARANTEED WORKING - TABLE VIEW" (table format)
- "GUARANTEED WORKING - STAT VIEW" (stat panels)
- Tested all metrics are being collected and displayed
- Created `monitoring.yml` playbook for stack deployment
- Created `verify-monitoring.yml` for validation
- Created `open-monitoring-ports.yml` for firewall configuration
- Tested idempotency of all playbooks
- Documented access URLs and credentials
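The scrape configuration described above boils down to a short `prometheus.yml`. A sketch using the documented targets and the 15-second collection interval stated later in this README (job name is an assumption):

```yaml
# prometheus.yml - sketch of the scrape configuration
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 10.0.2.11:9100   # control-node
          - 10.0.2.12:9100   # web-server
          - 10.0.2.13:9100   # app-server
          - 10.0.2.14:9100   # db-server
```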
Deliverables Completed:
- ✅ Centralized monitoring with Prometheus + Grafana
- ✅ 2 new Ansible roles (prometheus, grafana)
- ✅ 3 new playbooks for monitoring stack
- ✅ 4+ operational dashboards
- ✅ Real-time metrics from all 4 servers
- ✅ Documentation and access guide
Access URLs:
- Prometheus: http://192.168.56.11:9090
- Grafana: http://192.168.56.11:3001
- Grafana Credentials: admin / admin123!
Metrics Collected:
- CPU usage and load averages
- Memory utilization
- Disk space and I/O
- Network traffic
- System uptime
- Running processes
Time Invested: ~6 hours
Status: ✅ 100% Complete
Snapshot: `Phase3-Complete-Monitoring-Stack` (ready to create)
Objective: Implement centralized log management with rsyslog or ELK stack

Tasks Completed:
- Created `rsyslog_server` role for control-node
- Configured rsyslog to receive logs on port 514 (UDP/TCP)
- Set up log file organization by hostname
- Configured firewall to allow syslog traffic from internal network
- Created log directory structure in /var/log/remote/
- Created `rsyslog_client` role for managed nodes
- Configured all managed nodes to forward logs to control-node
- Set up reliable log forwarding with queue management
- Tested log forwarding from all 3 servers
- Implemented log rotation policies
- Configured retention: daily logs, weekly archives
- Set up automatic compression of old logs
- Created logrotate configuration for remote logs
- Created `logging.yml` playbook for deployment
- Created `verify-logging.yml` for validation
- Tested log forwarding from all managed nodes
- Verified centralized log collection
- Confirmed log rotation is working
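The server and client sides described above can be sketched as two rsyslog drop-in files. File names match the ones shown later in this README (`50-remote.conf`, `50-forward.conf`); the directives are an illustration of the approach, not the project's exact configuration:

```
# Server side (control-node), /etc/rsyslog.d/50-remote.conf - sketch
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

# Sort incoming logs into /var/log/remote/<hostname>/syslog
template(name="RemoteLogs" type="string"
         string="/var/log/remote/%HOSTNAME%/syslog")
if $fromhost-ip != "127.0.0.1" then action(type="omfile" dynaFile="RemoteLogs")

# Client side (managed nodes), /etc/rsyslog.d/50-forward.conf - sketch
# Forward everything to control-node over TCP with an on-disk queue
*.* action(type="omfwd" target="10.0.2.11" port="514" protocol="tcp"
           queue.type="LinkedList" queue.filename="fwdq"
           queue.saveOnShutdown="on" action.resumeRetryCount="-1")
```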
Deliverables Completed:
- ✅ Centralized log server on control-node
- ✅ Log forwarding from all managed nodes
- ✅ 2 new Ansible roles (rsyslog_server, rsyslog_client)
- ✅ 2 new playbooks for logging infrastructure
- ✅ Automated log rotation and retention
- ✅ Organized log directory structure
Log Structure:
/var/log/remote/
├── web-server/
│   └── syslog
├── app-server/
│   └── syslog
└── db-server/
    └── syslog
Time Invested: ~2 hours
Status: ✅ 100% Complete
Objective: Implement automated backup system for critical data

Tasks Completed:
- Designed multi-tier backup retention strategy
- Defined backup types: database and configuration
- Established retention periods:
- Daily: 7 days
- Weekly: 28 days
- Monthly: 90 days
- Created `backup_postgresql` role
- Implemented PostgreSQL backup script with pg_dump
- Configured compression (gzip) for space efficiency
- Set up automated retention management
- Created cron job for daily execution (2:00 AM)
- Deployed to db-server
- Created `backup_configs` role
- Implemented backup script for system configurations:
- Nginx configurations
- Application code
- SSH configurations
- Firewall rules
- fail2ban settings
- rsyslog configurations
- Ansible infrastructure files
- Configured compression and retention
- Created cron job for daily execution (3:00 AM)
- Deployed to web-server and app-server
- Created `backup.yml` playbook
- Fixed YAML syntax errors in backup roles
- Deployed backup system to all servers
- Created backup directories on all nodes
- Verified backup scripts are executable
- Tested manual backup execution
- Confirmed cron jobs are scheduled
- Validated backup files are created with content
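The retention management mentioned above usually reduces to age-based pruning per tier. A minimal sketch of that logic, assuming the backups are compressed files under the documented directories (the real scripts in the backup roles do more, e.g. the pg_dump itself):

```shell
#!/usr/bin/env bash
# Sketch of the retention step shared by backup-database.sh / backup-configs.sh.
# Tier names and the 7/28/90-day policy come from this README; everything
# else (file patterns, variable names) is an assumption.
set -euo pipefail

# prune DIR DAYS: delete compressed backups in DIR older than DAYS days
prune() {
  local dir="$1" days="$2"
  if [ -d "$dir" ]; then
    find "$dir" -type f \( -name '*.gz' -o -name '*.tar.gz' \) \
      -mtime "+$days" -delete
  fi
}

BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/database}"
prune "$BACKUP_ROOT/daily"   7
prune "$BACKUP_ROOT/weekly"  28
prune "$BACKUP_ROOT/monthly" 90
```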
Deliverables Completed:
- ✅ Automated database backups (PostgreSQL on db-server)
- ✅ Automated configuration backups (web-server, app-server)
- ✅ 2 new Ansible roles (backup_postgresql, backup_configs)
- ✅ 1 new playbook for backup deployment
- ✅ Multi-tier retention policy (7/28/90 days)
- ✅ Scheduled cron jobs for automation
- ✅ Backup verification capability
Backup Architecture:
┌─────────────────────────────────────────────────────────┐
│                    Backup Strategy                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  db-server (10.0.2.14)                                  │
│  ├── /var/backups/database/                             │
│  │   ├── daily/    (7 days retention)                   │
│  │   ├── weekly/   (28 days retention)                  │
│  │   └── monthly/  (90 days retention)                  │
│  └── Cron: Daily at 2:00 AM                             │
│                                                         │
│  web-server (10.0.2.12)                                 │
│  ├── /var/backups/configs/                              │
│  │   ├── daily/    (7 days retention)                   │
│  │   ├── weekly/   (28 days retention)                  │
│  │   └── monthly/  (90 days retention)                  │
│  └── Cron: Daily at 3:00 AM                             │
│                                                         │
│  app-server (10.0.2.13)                                 │
│  ├── /var/backups/configs/                              │
│  │   ├── daily/    (7 days retention)                   │
│  │   ├── weekly/   (28 days retention)                  │
│  │   └── monthly/  (90 days retention)                  │
│  └── Cron: Daily at 3:00 AM                             │
│                                                         │
└─────────────────────────────────────────────────────────┘
Backup Components:
1. Database Backups (db-server):
   - Full PostgreSQL database dump (appdb)
   - Backup location: `/var/backups/database/`
   - Schedule: Daily at 2:00 AM
   - Manual execution: `sudo /usr/local/bin/backup-database.sh`
2. Configuration Backups (web-server, app-server):
   - System and application configurations
   - Backup location: `/var/backups/configs/`
   - Schedule: Daily at 3:00 AM
   - Manual execution: `sudo /usr/local/bin/backup-configs.sh`
Time Invested: ~2 hours
Status: ✅ 100% Complete
Phase 6: Disaster Recovery ✅ COMPLETE

Objective: Develop and test disaster recovery procedures

Tasks Completed:
- Step 6.1: Database Restore Tools
  - Created restore-database.sh script for PostgreSQL
  - Deployed to db-server (/usr/local/bin/)
  - Interactive confirmation and validation
  - Automatic backup decompression support
  - Created list-db-backups.sh utility
- Step 6.2: Configuration Restore Tools
  - Created restore-configs.sh script for system configs
  - Deployed to web-server and app-server
  - Restores Nginx, SSH, application configurations
  - Automatic service restart after restore
  - Created list-config-backups.sh utility
- Step 6.3: Infrastructure Rebuild Automation
  - Created rebuild-infrastructure.yml master playbook
  - Single command rebuilds entire infrastructure
  - Imports all deployment playbooks in sequence
  - Tested playbook syntax and structure
- Step 6.4: Recovery Procedures Documented
  - Defined Recovery Time Objectives (RTO):
    - Database: 2 hours
    - Web/App Servers: 30 minutes
    - Complete Infrastructure: 4 hours
  - Defined Recovery Point Objective (RPO): 24 hours (daily backups)
  - Created disaster recovery procedures
  - Documented restore commands and workflows

Deliverables Completed:
- ✅ Database restore script (restore-database.sh)
- ✅ Configuration restore script (restore-configs.sh)
- ✅ Backup listing utilities (list-db-backups.sh, list-config-backups.sh)
- ✅ 2 new playbooks (disaster-recovery.yml, rebuild-infrastructure.yml)
- ✅ RTO/RPO documentation
- ✅ Recovery procedures documented

Recovery Commands:

# List available backups
ssh sysadmin@192.168.56.14 "sudo /usr/local/bin/list-db-backups.sh"
ssh sysadmin@192.168.56.12 "sudo /usr/local/bin/list-config-backups.sh"

# Restore the database from a daily backup (on db-server)
ssh sysadmin@192.168.56.14
sudo /usr/local/bin/restore-database.sh /var/backups/database/daily/appdb_YYYY-MM-DD.sql.gz

# Restore configurations from a daily backup (on web-server)
ssh sysadmin@192.168.56.12
sudo /usr/local/bin/restore-configs.sh /var/backups/configs/daily/configs_YYYY-MM-DD.tar.gz

# Rebuild the entire infrastructure from the control node
cd ~/infrastructure
ansible-playbook playbooks/rebuild-infrastructure.yml

Time Invested: ~3 hours
Status: ✅ 100% Complete
Phase 1: ████████████████████████████████ 100% ✅ COMPLETE
Phase 2: ████████████████████████████████ 100% ✅ COMPLETE
Phase 3: ████████████████████████████████ 100% ✅ COMPLETE
Phase 4: ████████████████████████████████ 100% ✅ COMPLETE
Phase 5: ████████████████████████████████ 100% ✅ COMPLETE
Phase 6: ████████████████████████████████ 100% ✅ COMPLETE
───────────────────────────────────────────────────────────
Overall: ████████████████████████████████ 100% Complete
Infrastructure Automated:
- ✅ 4 VMs fully configured via Ansible
- ✅ 8 reusable Ansible roles created
- ✅ 6 functional playbooks developed
- ✅ Complete 3-tier application stack deployed
- ✅ Zero manual configuration required
- ✅ Full idempotency verified

Services Deployed:
- ✅ Nginx reverse proxy (web-server)
- ✅ Express.js application (app-server)
- ✅ PostgreSQL 16 database (db-server)
- ✅ Security hardening (all servers)
- ✅ Monitoring agents (all servers)

Testing Results:
- ✅ All playbooks execute successfully
- ✅ Idempotency confirmed (safe to re-run)
- ✅ End-to-end connectivity verified
- ✅ All services healthy and responsive
- ✅ Security measures active and tested

Monitoring Infrastructure Deployed:
- ✅ Prometheus 3.9.1 installed and configured on control-node
- ✅ Grafana 12.3.2 installed and configured on control-node
- ✅ All 4 servers being monitored (100% coverage)
- ✅ Real-time metrics collection every 15 seconds
- ✅ Dashboard visualization with multiple views
- ✅ Data source integration tested and working

New Ansible Components:
- ✅ `roles/prometheus/` - Complete Prometheus role
- ✅ `roles/grafana/` - Complete Grafana role
- ✅ `playbooks/monitoring.yml` - Monitoring stack deployment
- ✅ `playbooks/verify-monitoring.yml` - Monitoring validation
- ✅ `playbooks/open-monitoring-ports.yml` - Firewall configuration

Dashboards Created:
- ✅ "System Monitoring Dashboard" - Comprehensive metrics view
- ✅ "SIMPLE TEST - RAW METRICS" - Debug/verification dashboard
- ✅ "GUARANTEED WORKING - TABLE VIEW" - Tabular data display
- ✅ "GUARANTEED WORKING - STAT VIEW" - Stat panel dashboard

Testing Results:
- ✅ Prometheus scraping all 4 targets (all "UP")
- ✅ Grafana can query Prometheus successfully
- ✅ Dashboard panels showing real-time data
- ✅ All services healthy and responsive
- ✅ Firewall rules properly configured
- Phase 1: 4 hours ✅
- Phase 2: 11 hours ✅
- Phase 3: 6 hours ✅
- Phase 4: 2 hours ✅
- Phase 5: 2 hours ✅
- Phase 6: 3 hours ✅
- Total: 28 hours (all 6 phases complete)

Date: February 2026
Current Phase: Phase 6 - Complete ✅
- ✅ Key-based authentication only (password authentication disabled)
- ✅ Root login disabled via SSH
- ✅ Public key authentication configured for the sysadmin user
- ✅ MaxAuthTries: limited to 3 attempts
- ✅ Automated via Ansible (ssh_hardening role)
Configuration File: /etc/ssh/sshd_config
- ✅ Default Policy: deny incoming, allow outgoing
- ✅ Service-Specific Rules:
  - SSH (22/tcp) - Management access
  - HTTP (80/tcp) - Web server only
  - HTTPS (443/tcp) - Web server only
  - App (3000/tcp) - From web server only
  - PostgreSQL (5432/tcp) - From app server only
  - node_exporter (9100/tcp) - Internal network only
  - Syslog (514/udp, 514/tcp) - Internal network only
- ✅ Automated via Ansible (firewall role)
Check Status: sudo ufw status verbose
- ✅ Monitoring: SSH login attempts
- ✅ Max Retries: 3 failed attempts
- ✅ Ban Time: 3600 seconds (1 hour)
- ✅ Find Time: 600 seconds (10 minutes)
- ✅ Automatic IP banning after threshold exceeded
- ✅ Automated via Ansible (fail2ban role)
Configuration File: /etc/fail2ban/jail.local
Check Status: sudo fail2ban-client status sshd
- ✅ Service: unattended-upgrades
- ✅ Update Type: security updates only
- ✅ Auto-reboot: disabled (manual control)
- ✅ Old Kernel Cleanup: enabled
- ✅ Daily Update Check: automated
- ✅ Automated via Ansible (auto_updates role)
Configuration File: /etc/apt/apt.conf.d/50unattended-upgrades
- ✅ Agent: Prometheus node_exporter v1.8.2
- ✅ Metrics Port: 9100
- ✅ Metrics Collected:
  - CPU usage and load averages
  - Memory and swap utilization
  - Disk space and I/O
  - Network traffic and errors
  - System uptime and processes
- ✅ Automated via Ansible (node_exporter role)
Access Metrics: curl http://localhost:9100/metrics
- ✅ Web Tier: Internet-facing (ports 80, 443)
- ✅ App Tier: only accessible from the web server
- ✅ Database Tier: only accessible from the app server
- ✅ Management: SSH restricted via firewall rules
- Host OS: Windows 11 Pro
- Hypervisor: Oracle VirtualBox 7.x
- Guest OS: Linux Mint 22 Wilma (based on Ubuntu 24.04 LTS)
- Kernel: Linux 6.8.x
- Ansible: 2.16+ (automation platform)
- YAML: Configuration and playbook syntax
- Jinja2: Template engine for dynamic configurations
- Git: Version control
- GitHub: Remote repository
- OpenSSH: Secure remote access
- UFW: Firewall management (frontend for iptables)
- fail2ban: Intrusion prevention system
- unattended-upgrades: Automatic security patching
- Nginx: Reverse proxy and web server
- Node.js: JavaScript runtime (v18.x)
- Express.js: Web application framework
- PostgreSQL: Relational database (v16)
- Prometheus node_exporter: Metrics collection agent (v1.8.2)
- Prometheus: Time-series database and alerting (v3.9.1)
- Grafana: Visualization and dashboards (v12.3.2)
- rsyslog: Centralized log management
- logrotate: Log rotation and retention
- pg_dump: PostgreSQL backup utility
- cron: Job scheduling for automated backups
- Bash: Backup and maintenance scripts
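Jinja2 templating, listed above, is what lets the nginx role render per-host configuration. A hypothetical template sketch (the variable names `app_server_ip` and `app_port` are assumptions, not the role's actual variables):

```
# roles/nginx/templates/reverse-proxy.conf.j2 - illustrative sketch
server {
    listen 80;
    server_name {{ ansible_hostname }};

    # Security headers, as configured by the nginx role
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;

    location / {
        proxy_pass http://{{ app_server_ip }}:{{ app_port | default(3000) }};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```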
- Hardware: 16GB RAM minimum, 4+ CPU cores recommended
- Software:
- VirtualBox 7.x or later
- Windows 10/11 (or any OS supporting VirtualBox)
- SSH client (built into Windows 10+)
- Git (for version control)
# Access Ansible control node
ssh sysadmin@192.168.56.11
# Access web server
ssh sysadmin@192.168.56.12
# Access app server
ssh sysadmin@192.168.56.13
# Access database server
ssh sysadmin@192.168.56.14

# SSH into control node
ssh sysadmin@192.168.56.11
# Navigate to infrastructure directory
cd ~/infrastructure
# Test connectivity to all managed nodes
ansible managed_nodes -m ping
# Apply complete infrastructure automation
ansible-playbook playbooks/base-hardening.yml
# Deploy web server (Nginx)
ansible-playbook playbooks/web-server.yml
# Deploy application server (Node.js)
ansible-playbook playbooks/app-server.yml
# Deploy database server (PostgreSQL)
ansible-playbook playbooks/db-server.yml
# Deploy monitoring stack (Prometheus + Grafana)
ansible-playbook playbooks/monitoring.yml
# Deploy centralized logging
ansible-playbook playbooks/logging.yml
# Deploy backup system
ansible-playbook playbooks/backup.yml
# Verify all configurations and services
ansible-playbook playbooks/verify-config.yml
ansible-playbook playbooks/verify-all-services.yml
ansible-playbook playbooks/verify-monitoring.yml
ansible-playbook playbooks/verify-logging.yml
# Run in check mode (dry run - no changes)
ansible-playbook playbooks/base-hardening.yml --check
# Run with verbose output for troubleshooting
ansible-playbook playbooks/base-hardening.yml -vvv

# From Windows browser:
# Prometheus: http://192.168.56.11:9090
# Grafana: http://192.168.56.11:3001
#
# Grafana Credentials:
# Username: admin
# Password: admin123!

# From control-node, view centralized logs
# Web server logs
sudo tail -f /var/log/remote/web-server/syslog
# App server logs
sudo tail -f /var/log/remote/app-server/syslog
# Database server logs
sudo tail -f /var/log/remote/db-server/syslog
# View all logs
sudo ls -lh /var/log/remote/*/

# Check backup directories exist
ansible all -m shell -a "ls -lh /var/backups/" -b
# Check database backup files
ansible db_servers -m shell -a "ls -lh /var/backups/database/daily/" -b
# Check configuration backup files
ansible managed_nodes -m shell -a "ls -lh /var/backups/configs/daily/" -b
# Verify cron jobs are scheduled
ansible all -m shell -a "crontab -l 2>/dev/null | grep backup || echo 'No backup cron jobs'" -b
# Manually trigger a test backup
# Database backup
ansible db_servers -m shell -a "/usr/local/bin/backup-database.sh" -b
# Configuration backup
ansible web_servers -m shell -a "/usr/local/bin/backup-configs.sh" -b

# From Windows, test the web server
curl http://192.168.56.12
# From control-node, test app server health
ansible app_servers -m shell -a "curl -s http://localhost:3000/health"
# Test database connectivity
ansible app_servers -m shell -a 'PGPASSWORD=SecurePassword123\! psql -h 10.0.2.14 -U appuser -d appdb -c "SELECT version();"' -b
# Test end-to-end flow (Web โ App โ Database)
curl http://192.168.56.12/health

infrastructure/
├── ansible.cfg                    # Ansible configuration file
├── .gitignore                     # Git ignore rules (secrets, keys)
├── README.md                      # This comprehensive documentation
│
├── inventory/
│   └── hosts.yml                  # Server inventory with host groups
│
├── group_vars/
│   └── all.yml                    # Global variables for all hosts
│
├── playbooks/                     # Ansible playbooks (automation scripts)
│   ├── base-hardening.yml         # Security hardening for all servers
│   ├── web-server.yml             # Nginx reverse proxy deployment
│   ├── app-server.yml             # Node.js application deployment
│   ├── db-server.yml              # PostgreSQL database deployment
│   ├── verify-config.yml          # Individual service verification
│   ├── verify-all-services.yml    # End-to-end testing
│   ├── monitoring.yml             # Monitoring stack deployment
│   ├── verify-monitoring.yml      # Monitoring validation
│   ├── open-monitoring-ports.yml  # Firewall for monitoring
│   ├── logging.yml                # Centralized logging deployment
│   ├── verify-logging.yml         # Logging validation
│   └── backup.yml                 # Backup system deployment
│
├── roles/                         # Ansible roles (reusable components)
│   ├── ssh_hardening/             # SSH security configuration
│   ├── firewall/                  # UFW firewall configuration
│   ├── fail2ban/                  # Intrusion prevention
│   ├── auto_updates/              # Automatic security updates
│   ├── node_exporter/             # Monitoring agent
│   ├── nginx/                     # Web server and reverse proxy
│   ├── nodejs_app/                # Node.js application
│   ├── postgresql/                # PostgreSQL database
│   ├── prometheus/                # Prometheus monitoring
│   ├── grafana/                   # Grafana visualization
│   ├── rsyslog_server/            # Centralized log server
│   ├── rsyslog_client/            # Log forwarding client
│   ├── backup_postgresql/         # Database backup automation
│   └── backup_configs/            # Configuration backup automation
│
├── files/                         # Static files (future use)
└── scripts/
Click to expand Phase 1 commands
# Set static IP on NAT Network (enp0s3)
sudo nmcli connection modify "Wired connection 1" \
ipv4.addresses 10.0.2.10/24 \
ipv4.gateway 10.0.2.1 \
ipv4.dns "8.8.8.8,8.8.4.4" \
ipv4.method manual
# Apply changes
sudo nmcli connection down "Wired connection 1"
sudo nmcli connection up "Wired connection 1"
# Set static IP on Host-Only Network (enp0s8)
sudo nmcli connection add type ethernet ifname enp0s8 con-name host-only \
ipv4.addresses 192.168.56.10/24 \
ipv4.method manual
# Activate Host-Only connection
sudo nmcli connection up host-only
# Verify network configuration
ip addr show
ip route show

# Set hostname
sudo hostnamectl set-hostname baseline-template
# Edit /etc/hosts for proper resolution
sudo nano /etc/hosts
# Change 127.0.1.1 line to: 127.0.1.1 baseline-template
# Verify
hostname
hostnamectl

# Update package lists
sudo apt update
# Upgrade all packages
sudo apt upgrade -y
# Install essential tools
sudo apt install -y vim curl wget git net-tools ufw fail2ban openssh-server
# Reboot to apply updates
sudo reboot

# Install OpenSSH server (usually pre-installed)
sudo apt install -y openssh-server
# Enable and start SSH service
sudo systemctl enable ssh
sudo systemctl start ssh
# Create SSH directory and set permissions
mkdir -p ~/.ssh
chmod 700 ~/.ssh
# Add your public key to authorized_keys (paste your key)
nano ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Backup original SSH configuration
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup
# Edit SSH configuration
sudo nano /etc/ssh/sshd_config
# Set these values:
# PermitRootLogin no
# PasswordAuthentication no
# PubkeyAuthentication yes
# AuthorizedKeysFile .ssh/authorized_keys
# MaxAuthTries 3
# Test configuration syntax
sudo sshd -t
# Restart SSH service
sudo systemctl restart ssh
# Verify SSH status
sudo systemctl status ssh

# Set default policies
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow 22/tcp
# Enable firewall
sudo ufw enable
# Check status
sudo ufw status verbose
sudo ufw status numbered

# Install fail2ban
sudo apt install -y fail2ban
# Copy default configuration
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
# Edit configuration
sudo nano /etc/fail2ban/jail.local
# Configure [sshd] section:
# enabled = true
# port = 22
# maxretry = 3
# bantime = 3600
# findtime = 600
# Enable and start fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
# Check status
sudo systemctl status fail2ban
sudo fail2ban-client status
sudo fail2ban-client status sshd

# Install unattended-upgrades
sudo apt install -y unattended-upgrades apt-listchanges
# Configure unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades
# Edit configuration file
sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
# Ensure security updates are enabled:
# "${distro_id}:${distro_codename}-security";
# Enable and start service
sudo systemctl enable unattended-upgrades
sudo systemctl start unattended-upgrades
# Check status
sudo systemctl status unattended-upgrades

# Download node_exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
# Extract
tar xvfz node_exporter-1.8.2.linux-amd64.tar.gz
# Move binary to system path
sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
# Create system user
sudo useradd --no-create-home --shell /bin/false node_exporter
# Set ownership
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
# Create systemd service
sudo nano /etc/systemd/system/node_exporter.service
# Paste service configuration
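# The service configuration to paste might look like the following sketch.
# The user and binary path match the commands in this section; everything
# else (extra flags, hardening options) is omitted and may differ.
#
# [Unit]
# Description=Prometheus Node Exporter
# After=network.target
#
# [Service]
# User=node_exporter
# Group=node_exporter
# Type=simple
# ExecStart=/usr/local/bin/node_exporter
#
# [Install]
# WantedBy=multi-user.target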
# Reload systemd, enable and start service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
# Verify status
sudo systemctl status node_exporter
# Allow node_exporter through firewall (internal network only)
sudo ufw allow from 10.0.2.0/24 to any port 9100 proto tcp
# Test metrics endpoint
curl http://localhost:9100/metrics | head -20

Click to expand Phase 2 commands
# SSH into control-node from Windows
ssh sysadmin@192.168.56.11
# Update system
sudo apt update && sudo apt upgrade -y
# Install Ansible
sudo apt install -y ansible
# Verify installation
ansible --version

# Create main project directory
mkdir -p ~/infrastructure
cd ~/infrastructure
# Create subdirectories
mkdir -p inventory playbooks roles group_vars host_vars files templates
# Create initial files
touch ansible.cfg
touch inventory/hosts.yml
touch group_vars/all.yml

# Generate SSH key pair for Ansible (on control-node)
ssh-keygen -t ed25519 -C "ansible-control"
# Press ENTER for all prompts
# View the public key
cat ~/.ssh/id_ed25519.pub
# Copy SSH key to each managed node
ssh-copy-id sysadmin@web-server
ssh-copy-id sysadmin@app-server
ssh-copy-id sysadmin@db-server
# Test passwordless SSH
ssh sysadmin@web-server "hostname"
ssh sysadmin@app-server "hostname"
ssh sysadmin@db-server "hostname"

# Ping all managed nodes
ansible managed_nodes -m ping
# Check hostname
ansible managed_nodes -m command -a "hostname"
# Check uptime
ansible managed_nodes -m command -a "uptime"
# Test sudo access
ansible managed_nodes -m command -a "sudo whoami"

# Deploy web server (Nginx)
ansible-playbook playbooks/web-server.yml
# Deploy application server (Node.js)
ansible-playbook playbooks/app-server.yml
# Deploy database server (PostgreSQL)
ansible-playbook playbooks/db-server.yml
# Verify all services
ansible-playbook playbooks/verify-all-services.yml

Click to expand Phase 3 commands
# Deploy Prometheus and Grafana
ansible-playbook playbooks/monitoring.yml
# Verify the monitoring stack
ansible-playbook playbooks/verify-monitoring.yml

# Check Prometheus service
sudo systemctl status prometheus
# Check Prometheus targets
curl http://localhost:9090/api/v1/targets | python3 -m json.tool
# Test Prometheus queries
curl "http://localhost:9090/api/v1/query?query=up" | python3 -m json.tool

# Check Grafana service
sudo systemctl status grafana-server
# Test Grafana API
curl http://localhost:3001/api/health | python3 -m json.tool

Click to expand Phase 4 commands
# SSH into control-node
ssh sysadmin@192.168.56.11
cd ~/infrastructure
# Deploy centralized logging
ansible-playbook playbooks/logging.yml
# Verify logging setup
ansible-playbook playbooks/verify-logging.yml

# Check rsyslog service
sudo systemctl status rsyslog
# View rsyslog configuration
sudo cat /etc/rsyslog.d/50-remote.conf
# Check if rsyslog is listening on port 514
sudo netstat -tulpn | grep rsyslog
sudo ss -tulpn | grep 514
# View centralized logs
sudo ls -lh /var/log/remote/
# View logs from specific server
sudo tail -f /var/log/remote/web-server/syslog
sudo tail -f /var/log/remote/app-server/syslog
sudo tail -f /var/log/remote/db-server/syslog

# Check rsyslog service on all nodes
ansible managed_nodes -m systemd -a "name=rsyslog state=started enabled=yes" -b
# View rsyslog client configuration
ansible managed_nodes -m shell -a "cat /etc/rsyslog.d/50-forward.conf" -b
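The client-side 50-forward.conf is typically a one-liner pointing at the control-node's NAT address (a sketch; a single @ means UDP, @@ would mean TCP):

```
# /etc/rsyslog.d/50-forward.conf -- illustrative sketch
*.* @10.0.2.11:514
```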
# Test log forwarding
ansible web_servers -m shell -a "logger 'Test log message from web-server'" -b
# Verify test message appeared on control-node
sudo grep "Test log message" /var/log/remote/web-server/syslog

# Check firewall allows syslog
ansible control -m shell -a "sudo ufw status | grep 514" -b
# Restart rsyslog on all nodes
ansible all -m systemd -a "name=rsyslog state=restarted" -b
# Check rsyslog errors
ansible all -m shell -a "sudo journalctl -u rsyslog -n 20" -b
# Test connectivity from client to server
ansible managed_nodes -m shell -a "nc -zv 10.0.2.11 514" -b

Click to expand Phase 5 commands
# SSH into control-node
ssh sysadmin@192.168.56.11
cd ~/infrastructure
# Deploy backup automation
ansible-playbook playbooks/backup.yml

# 1. Verify backup directories exist
ansible all -m shell -a "ls -lh /var/backups/" -b
# 2. Check actual backup files were created
ansible db_servers -m shell -a "ls -lh /var/backups/database/daily/" -b
ansible managed_nodes -m shell -a "ls -lh /var/backups/configs/daily/" -b
# 3. Verify cron jobs are scheduled
ansible all -m shell -a "crontab -l 2>/dev/null | grep backup || echo 'No backup cron jobs'" -b
# 4. Check backup file sizes to confirm they have content
ansible db_servers -m shell -a "du -sh /var/backups/database/daily/* 2>/dev/null | head -3" -b

# Manually trigger database backup
ansible db_servers -m shell -a "/usr/local/bin/backup-database.sh" -b
# Manually trigger configuration backup
ansible web_servers -m shell -a "/usr/local/bin/backup-configs.sh" -b
ansible app_servers -m shell -a "/usr/local/bin/backup-configs.sh" -b
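backup-configs.sh itself isn't reproduced in this README; its core is typically a dated tarball dropped into the daily tier. A runnable sketch (the source paths, destination, and backup-YYYYMMDD naming are assumptions, with sandbox-safe defaults):

```shell
#!/usr/bin/env bash
# Sketch of the config-backup core: a dated tarball dropped into a tier
# directory. Paths, defaults, and the backup-YYYYMMDD naming are
# assumptions, not the project's actual script.
set -euo pipefail

SRC_DIRS=${SRC_DIRS:-/etc/os-release}     # project would use e.g. "/etc/nginx /etc/ssh"
DEST_DIR=${DEST_DIR:-/tmp/configs/daily}  # project: /var/backups/configs/daily
STAMP=$(date +%Y%m%d)

mkdir -p "$DEST_DIR"
# shellcheck disable=SC2086  # SRC_DIRS is intentionally word-split
tar -czf "$DEST_DIR/backup-$STAMP.tar.gz" $SRC_DIRS 2>/dev/null
echo "wrote $DEST_DIR/backup-$STAMP.tar.gz"
```

The weekly/monthly tiers can then be populated by copying the latest daily archive on the matching cron schedule.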
# View backup logs
ansible db_servers -m shell -a "tail -20 /var/backups/logs/backup.log" -b
ansible web_servers -m shell -a "tail -20 /var/backups/logs/backup.log" -b

# Database backups on db-server
ssh sysadmin@192.168.56.14
sudo ls -lh /var/backups/database/daily/
sudo ls -lh /var/backups/database/weekly/
sudo ls -lh /var/backups/database/monthly/
# Configuration backups on web-server
ssh sysadmin@192.168.56.12
sudo ls -lh /var/backups/configs/daily/
sudo ls -lh /var/backups/configs/weekly/
sudo ls -lh /var/backups/configs/monthly/
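The daily/weekly/monthly layout implies a prune step in each backup script (cron fires them nightly: 2:00 AM for the database, 3:00 AM for configs, per the status table below). A minimal retention sketch, with the helper name and 7-day threshold as assumed values:

```shell
#!/usr/bin/env bash
# Retention sketch: remove archives in a backup tier older than N days.
# The helper and its 7-day example are assumptions, not the project's
# actual backup-database.sh contents.
set -euo pipefail

prune_tier() {   # usage: prune_tier <dir> <days>
  find "$1" -maxdepth 1 -type f -name '*.gz' -mtime +"$2" -print -delete
}

# In the project this would run at the end of each backup script, e.g.:
# prune_tier /var/backups/database/daily 7
```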
# Check cron schedule
crontab -l | grep backup

# To restore database backup (example for Phase 6):
# gunzip -c /var/backups/database/daily/backup-YYYYMMDD.sql.gz | sudo -u postgres psql appdb
# To restore configuration files (example for Phase 6):
# sudo tar -xzf /var/backups/configs/daily/backup-YYYYMMDD.tar.gz -C /

Click to expand troubleshooting commands
# Check IP addresses
ip addr show
# Check routing table
ip route show
# Test connectivity
ping -c 4 8.8.8.8
ping -c 4 google.com
ping -c 4 web-server
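Pinging nodes by short name assumes each VM resolves its peers locally, e.g. via /etc/hosts entries on the host-only network (the app-server address is assumed; the others appear elsewhere in this README):

```
# /etc/hosts -- host-only network entries, illustrative
192.168.56.11  control-node
192.168.56.12  web-server
192.168.56.13  app-server   # address assumed
192.168.56.14  db-server
```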
# Check open ports
sudo netstat -tulpn
sudo ss -tulpn
# Test specific port
nc -zv hostname port

# Check SSH service status
sudo systemctl status ssh
# View SSH logs
sudo journalctl -u ssh -n 50
sudo tail -f /var/log/auth.log
# Test SSH configuration
sudo sshd -t
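Run `sudo sshd -t` before restarting SSH after any config change; the key-only policy noted in the status table implies a hardening fragment along these lines (a sketch with assumed values, not the project's exact file):

```
# /etc/ssh/sshd_config -- hardening fragment, illustrative
PasswordAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
MaxAuthTries 3
```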
# Debug SSH connection
ssh -vvv sysadmin@hostname

# Check UFW status
sudo ufw status verbose
sudo ufw status numbered
# View UFW logs
sudo tail -f /var/log/ufw.log

# Check service status
sudo systemctl status SERVICE_NAME
# Start/Stop/Restart service
sudo systemctl start SERVICE_NAME
sudo systemctl stop SERVICE_NAME
sudo systemctl restart SERVICE_NAME
# Enable/Disable on boot
sudo systemctl enable SERVICE_NAME
sudo systemctl disable SERVICE_NAME
# View service logs
sudo journalctl -u SERVICE_NAME -n 50
sudo journalctl -u SERVICE_NAME -f

This project showcases a comprehensive set of skills valued in DevOps, Cloud Engineering, and System Administration roles:
Linux System Administration:
- Server installation and configuration
- Network configuration and troubleshooting
- User and permission management
- Service management with systemd
- Package management (apt)
- Log analysis and troubleshooting
Security & Hardening:
- SSH key-based authentication
- Firewall configuration (UFW/iptables)
- Intrusion detection and prevention
- Security patch management
- Principle of least privilege
- Network segmentation
- Security auditing
Infrastructure as Code (IaC):
- Ansible playbook development
- Role-based architecture
- Idempotent configuration
- Template management (Jinja2)
- Variable management
- Inventory organization
- Multi-tier application deployment
Automation & Scripting:
- Bash scripting
- Ansible automation
- Configuration management
- Automated deployment
- Service orchestration
- Cron job scheduling
Application Deployment:
- Reverse proxy configuration (Nginx)
- Application server setup (Node.js/Express)
- Database deployment (PostgreSQL)
- Service integration
- Health check implementation
Monitoring & Observability:
- Metrics collection (node_exporter)
- Service monitoring (Prometheus)
- Performance tracking and visualization (Grafana)
- Dashboard creation and customization
- Time-series data analysis
- Infrastructure observability
- Real-time monitoring implementation
Logging & Auditing:
- Centralized log management (rsyslog)
- Log forwarding configuration
- Log rotation and retention policies
- Log analysis and troubleshooting
Backup & Recovery:
- Automated backup strategies
- Database backup automation (pg_dump)
- Configuration backup procedures
- Multi-tier retention policies
- Backup verification and testing
- Disaster recovery planning
Version Control:
- Git workflow
- Repository management
- Commit best practices
- Documentation maintenance
- Change tracking
Networking:
- TCP/IP fundamentals
- DNS configuration
- Firewall rules
- Port management
- Network troubleshooting
- Multi-network setup (NAT + Host-Only)
Documentation:
- Clear, comprehensive README
- Command references
- Architecture diagrams
- Troubleshooting guides
- Progressive documentation
Problem Solving:
- Systematic debugging
- Root cause analysis
- Solution documentation
- Iterative improvement
Project Management:
- Phase-based approach
- Progress tracking
- Time estimation
- Milestone achievement
- Scope management
Professional Development:
- Self-directed learning
- Following best practices
- Continuous improvement
- Knowledge sharing
Potential additions to expand the project:
- Database restore procedures
- Configuration restore testing
- Full infrastructure rebuild automation
- RTO/RPO documentation
- Disaster recovery playbooks
- High Availability (HA) setup with keepalived
- Load balancing with HAProxy or Nginx
- Containerization with Docker
- Container orchestration with Kubernetes (K3s)
- Service mesh implementation (Istio/Linkerd)
- VPN setup (OpenVPN/WireGuard)
- Certificate management with Let's Encrypt
- Web Application Firewall (ModSecurity)
- Security scanning (Lynis, OpenVAS)
- Compliance automation (CIS benchmarks)
- Vulnerability scanning
- APM (Application Performance Monitoring)
- Distributed tracing (Jaeger)
- Custom metrics and exporters
- PagerDuty/Slack integration
- SLA monitoring
- Log aggregation (ELK stack)
- Jenkins/GitLab CI setup
- Automated testing
- Blue-green deployments
- Canary releases
- Rollback procedures
- Pipeline automation
- Terraform for AWS/Azure/GCP
- Cloud-native monitoring
- Auto-scaling groups
- Managed database services
- Cloud cost optimization
This is a personal portfolio project, but feedback and suggestions are always welcome!
- Issues: Open an issue on GitHub for bugs or suggestions
- Discussions: Start a discussion for questions or ideas
- Pull Requests: Not accepting PRs since this is a learning project, but I appreciate the interest!
If you're building a similar project, here are helpful resources:
Ansible:
- Ansible Documentation
- Ansible Galaxy - Pre-built roles
- Jeff Geerling's Ansible for DevOps
Linux:
- Linux Journey
- The Linux Command Line (free book)
- OverTheWire Bandit - Security practice
DevOps:
Prometheus & Grafana:
Logging & Backup:
This project is open source and available under the MIT License.
Feel free to use this project as a template or reference for your own learning!
Skander Ba
- GitHub: @Skanderba8
- Email: baskander5@gmail.com
- LinkedIn: Skander Ben Abdallah
- Portfolio: Website
- GitHub Issues: For bugs or technical questions
- GitHub Discussions: For general questions and ideas
- Email: For collaboration or opportunities
Special thanks to:
- The Ansible community for excellent documentation
- The Linux community for amazing tools and support
- The Prometheus and Grafana teams for outstanding monitoring tools
- The rsyslog and PostgreSQL communities for robust logging and database solutions
- Everyone who provides feedback and suggestions
- Started: January 30, 2026
- Phase 1 Complete: January 31, 2026
- Phase 2 Complete: February 3, 2026
- Phase 3 Complete: February 3, 2026
- Phase 4 Complete: February 7, 2026
- Phase 5 Complete: February 9, 2026
- Phase 6 Complete: February 12, 2026
- Current Status: Phase 6 complete, project finished
- Total Commits: Check GitHub for latest count
- Lines of Ansible Code: ~2,000+
- Documentation Pages: 1 (comprehensive README)
- Services Deployed: 3-tier application stack + Monitoring + Logging + Backups
- Ansible Roles: 14 (security + services + monitoring + logging + backup)
- Playbooks: 11 (deployment + verification)
- ✅ 2026-01-30: Project initiated, Phase 1 planning complete
- ✅ 2026-01-31: Phase 1 complete - All VMs configured manually
- ✅ 2026-02-02: Ansible installed, SSH keys distributed, passwordless sudo configured
- ✅ 2026-02-02: Git repository initialized and pushed to GitHub
- ✅ 2026-02-02: Created all base security hardening roles
- ✅ 2026-02-02: Base-hardening playbook tested and verified
- ✅ 2026-02-03: Web server (Nginx) deployed successfully
- ✅ 2026-02-03: Application server (Node.js) deployed successfully
- ✅ 2026-02-03: Database server (PostgreSQL) deployed successfully
- ✅ 2026-02-03: Phase 2 COMPLETE - Full automation verified
- ✅ 2026-02-03: Prometheus deployed and configured
- ✅ 2026-02-03: Grafana deployed and configured
- ✅ 2026-02-03: Monitoring dashboards created and tested
- ✅ 2026-02-03: Phase 3 COMPLETE - Centralized monitoring operational
- ✅ 2026-02-07: rsyslog server configured on control-node
- ✅ 2026-02-07: Log forwarding from all managed nodes working
- ✅ 2026-02-07: Phase 4 COMPLETE - Centralized logging operational
- ✅ 2026-02-07: Database backup automation deployed
- ✅ 2026-02-07: Configuration backup automation deployed
- ✅ 2026-02-07: Multi-tier retention policy implemented
- ✅ 2026-02-07: Phase 5 COMPLETE - Automated backups operational
- ✅ 2026-02-12: Phase 6 COMPLETE - Project finished
Last Verified: February 7, 2026
| Component | Status | Health Check |
|---|---|---|
| Web Server (Nginx) | 🟢 Running | ✅ HTTP responding |
| App Server (Node.js) | 🟢 Running | ✅ Health endpoint OK |
| Database (PostgreSQL) | 🟢 Running | ✅ Accepting connections |
| SSH Security | 🟢 Active | ✅ Key-only auth |
| Firewall (UFW) | 🟢 Active | ✅ Rules enforced |
| fail2ban | 🟢 Active | ✅ Monitoring SSH |
| Auto Updates | 🟢 Configured | ✅ Security patches enabled |
| Node Exporter | 🟢 Running | ✅ Metrics available (all 4 servers) |
| Prometheus | 🟢 Running | ✅ All 4 targets UP |
| Grafana | 🟢 Running | ✅ Dashboards operational |
| Centralized Logging | 🟢 Running | ✅ All nodes forwarding logs |
| Database Backups | 🟢 Scheduled | ✅ Cron job active (2:00 AM) |
| Config Backups | 🟢 Scheduled | ✅ Cron jobs active (3:00 AM) |
| End-to-End Connectivity | 🟢 Verified | ✅ Web→App→DB working |
| Full Stack | 🟢 Verified | ✅ Complete observability & backup |
Last Updated: February 12, 2026
README Version: 5.0
Status: Living Document - Updated as project progresses
If you found this project helpful or interesting, please consider giving it a ⭐ on GitHub!