
A production-style Linux infrastructure project demonstrating system administration, security hardening, and Infrastructure as Code (IaC) practices using Ansible automation.





🎯 Project Overview

This project showcases the complete lifecycle of building, securing, and automating a multi-server Linux environment from scratch. It demonstrates real-world DevOps and system administration practices used in production environments.

What This Project Demonstrates

  • Infrastructure as Code (IaC): Everything is reproducible and version-controlled
  • Security First: Multi-layered security approach with automated hardening
  • Automation: Manual tasks converted to reusable Ansible playbooks
  • Service Deployment: Full 3-tier web application stack (Nginx → Node.js → PostgreSQL)
  • Monitoring Ready: Metrics collection infrastructure with node_exporter
  • Professional Documentation: Clear, comprehensive, and maintainable

Project Goals

  1. ✅ Build a multi-server Linux environment with proper networking
  2. ✅ Implement security best practices (SSH hardening, firewalls, intrusion prevention)
  3. ✅ Automate everything with Ansible for repeatability
  4. ✅ Deploy production-ready services (web, application, database tiers)
  5. ✅ Implement centralized monitoring and alerting
  6. ✅ Create automated backup and disaster recovery procedures
  7. ✅ Test failure scenarios and validate recovery processes
  8. ✅ Document everything for knowledge transfer

Estimated Effort: ~30-40 hours (6 phases)
Actual Time Spent: ~28 hours


🏗️ Infrastructure Architecture

Environment Specifications

  • Hypervisor: VirtualBox 7.x on Windows 11 host
  • Operating System: Linux Mint 22 (based on Ubuntu 24.04 LTS)
  • Network: Dual-adapter setup (NAT + Host-Only)
  • Automation Platform: Ansible 2.16+
  • Version Control: Git / GitHub

Server Topology

| Hostname | Role | NAT IP | Host-Only IP | vCPU | RAM | Disk | Status |
|---|---|---|---|---|---|---|---|
| baseline-template | Golden Image | 10.0.2.10 | 192.168.56.10 | 2 | 2GB | 25GB | 🔴 Powered Off |
| control-node | Ansible Controller | 10.0.2.11 | 192.168.56.11 | 2 | 2GB | 25GB | 🟢 Running |
| web-server | Nginx Reverse Proxy | 10.0.2.12 | 192.168.56.12 | 2 | 2GB | 25GB | 🟢 Running |
| app-server | Node.js Application | 10.0.2.13 | 192.168.56.13 | 2 | 2GB | 25GB | 🟢 Running |
| db-server | PostgreSQL Database | 10.0.2.14 | 192.168.56.14 | 2 | 4GB | 50GB | 🟢 Running |

Network Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     Windows 11 Host Machine                       │
│                  Your Workstation (SSH Client)                    │
│                   192.168.56.1 (Host-Only Gateway)               │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            │ SSH Access via Host-Only Network
                            │ (Management & Development)
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        │                   │                   │
   ┌────▼─────┐      ┌──────▼──────┐     ┌─────▼──────┐
   │ control  │      │ web-server  │     │ app-server │
   │  -node   │◄────►│   (nginx)   │◄───►│  (node.js) │
   │          │      │             │     │            │
   │.56.11    │      │  .56.12     │     │  .56.13    │
   └────┬─────┘      └──────┬──────┘     └─────┬──────┘
        │                   │                   │
        │                   │                   │
        └───────────────────┼───────────────────┘
                            │
                            │
                      ┌─────▼──────┐
                      │ db-server  │
                      │(postgresql)│
                      │            │
                      │  .56.14    │
                      └────────────┘

╔══════════════════════════════════════════════════════════════════╗
║       NAT Network (InfraNet - 10.0.2.0/24)                      ║
║   VM-to-VM Communication & Internet Access                       ║
║                                                                  ║
║   control-node: 10.0.2.11    web-server:  10.0.2.12            ║
║   app-server:   10.0.2.13    db-server:   10.0.2.14            ║
╚══════════════════════════════════════════════════════════════════╝
                            │
                            │ Internet Access
                            ▼
                    ┌───────────────┐
                    │   Internet    │
                    │ (via Windows) │
                    └───────────────┘

Application Flow

Internet → [Web Server:80] → [App Server:3000] → [Database:5432]
            Nginx Proxy       Express.js API      PostgreSQL
            
Security: Each tier only accepts connections from the previous tier

📈 Project Phases

Phase 1: Manual Base Configuration ✅ COMPLETE

Objective: Build the infrastructure foundation manually to understand every component

Tasks Completed:

  • VirtualBox environment setup with NAT and Host-Only networks
  • Created baseline template VM with dual network adapters
  • Manual security hardening (SSH, firewall, fail2ban)
  • Installed monitoring agent (node_exporter)
  • Configured automatic security updates
  • Cloned and configured 4 production VMs
  • Established hostname resolution via /etc/hosts
  • Verified connectivity and services
  • Created snapshot: Phase1-Complete-Baseline

Time Invested: ~4 hours
Status: ✅ 100% Complete


Phase 2: Automation with Ansible ✅ COMPLETE

Objective: Convert all manual configurations into automated, repeatable Ansible playbooks

Tasks Completed:

Step 2.1: Ansible Control Node Setup ✅

  • Installed Ansible 2.16+ on control-node
  • Created complete project directory structure
  • Generated SSH keys for Ansible automation
  • Distributed SSH keys to all managed nodes
  • Configured passwordless sudo on all managed nodes
  • Created ansible.cfg with optimized settings
  • Created inventory file with logical host groups
  • Created group_vars/all.yml with global variables
  • Tested Ansible connectivity (ping module)
  • Initialized Git repository
  • Pushed to GitHub repository
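The "optimized settings" in ansible.cfg might look roughly like this; every value below is an assumption about the real file, shown only to illustrate the shape of the configuration:

```ini
# Hypothetical ansible.cfg for the setup described above
[defaults]
inventory         = inventory/hosts.yml
remote_user       = sysadmin
private_key_file  = ~/.ssh/id_ed25519
host_key_checking = False
forks             = 10

[privilege_escalation]
become        = True
become_method = sudo
```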

Step 2.2: Base Security Hardening Automation ✅

  • Created 5 security roles:
    • ssh_hardening - SSH security configuration
    • firewall - UFW firewall rules
    • fail2ban - Intrusion prevention system
    • auto_updates - Unattended security patches
    • node_exporter - Prometheus metrics exporter
  • Created base-hardening.yml playbook
  • Successfully executed on all 3 managed nodes
  • Tested and verified idempotency
  • Created verify-config.yml verification playbook
  • All security services running and verified
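A sketch of what playbooks/base-hardening.yml likely contains: the role list matches the five roles above, while the play name, host group, and everything else are assumptions.

```yaml
---
- name: Base security hardening for all managed nodes
  hosts: managed_nodes
  become: true
  roles:
    - ssh_hardening
    - firewall
    - fail2ban
    - auto_updates
    - node_exporter
```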

Step 2.3: Service-Specific Playbooks ✅

  • Web Server Deployment:

    • Created nginx role with reverse proxy configuration
    • Configured security headers
    • Created web-server.yml playbook
    • Deployed and verified Nginx
    • Opened firewall ports 80, 443
  • Application Server Deployment:

    • Created nodejs_app role
    • Deployed Express.js sample application
    • Configured systemd service (myapp.service)
    • Created app-server.yml playbook
    • Service running and health checks passing
    • Restricted access to web server only
  • Database Server Deployment:

    • Created postgresql role
    • Installed PostgreSQL 16
    • Created application database (appdb)
    • Created database user (appuser)
    • Configured network access from app server
    • Created db-server.yml playbook
    • Database accessible and verified
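Network access from the app server is typically granted in pg_hba.conf; a minimal sketch, assuming scram-sha-256 authentication (the file path and auth method are assumptions; the address is from the topology table):

```
# /etc/postgresql/16/main/pg_hba.conf — allow appuser to appdb from app-server only
host    appdb    appuser    10.0.2.13/32    scram-sha-256
```

For this to work, `listen_addresses` in postgresql.conf would also need to cover the NAT interface rather than only localhost.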

Step 2.4: End-to-End Testing ✅

  • Created comprehensive verification playbook
  • Tested Web → App connectivity
  • Tested App → Database connectivity
  • Verified full request flow (end-to-end)
  • All services passing health checks

Deliverables Completed:

  • 8 Ansible roles (reusable components)
  • 6 Ansible playbooks (automation scripts)
  • Complete 3-tier application stack
  • Full security hardening
  • Monitoring foundation

Time Invested: ~11 hours
Status: ✅ 100% Complete
Snapshot: Phase2-Complete-Full-Automation (ready to create)


Phase 3: Centralized Monitoring ✅ COMPLETE

Objective: Implement Prometheus and Grafana for infrastructure monitoring

Tasks Completed:

Step 3.1: Prometheus Deployment ✅

  • Created Prometheus role (roles/prometheus/)
  • Installed Prometheus 3.9.1 from GitHub releases
  • Created Prometheus system user and directories
  • Configured Prometheus to scrape all 4 node_exporters
  • Set up systemd service for Prometheus
  • Configured scraping targets:
    • control-node: 10.0.2.11:9100
    • web-server: 10.0.2.12:9100
    • app-server: 10.0.2.13:9100
    • db-server: 10.0.2.14:9100
  • Configured firewall to allow port 9090 from host-only network
  • Verified Prometheus is running and healthy
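The scrape configuration described above would look roughly like this in prometheus.yml (job name and file path are assumptions; the targets and the 15-second interval are taken from this README):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "10.0.2.11:9100"   # control-node
          - "10.0.2.12:9100"   # web-server
          - "10.0.2.13:9100"   # app-server
          - "10.0.2.14:9100"   # db-server
```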

Step 3.2: Grafana Deployment ✅

  • Created Grafana role (roles/grafana/)
  • Installed Grafana 12.3.2 from official repository
  • Configured Grafana to run on port 3001
  • Set default admin password: admin123!
  • Configured Prometheus as default data source
  • Configured firewall to allow port 3001 from host-only network
  • Created provisioning for automatic data source configuration
  • Verified Grafana is running and accessible
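Automatic data source provisioning is done with a small YAML file; a hedged sketch (the path and field values are assumptions):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (hypothetical)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```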

Step 3.3: Dashboard Implementation ✅

  • Created monitoring dashboards via Grafana API
  • Imported and tested dashboard templates
  • Created working dashboards with proven queries:
    • "System Monitoring Dashboard" (comprehensive metrics)
    • "SIMPLE TEST - RAW METRICS" (debug dashboard)
    • "GUARANTEED WORKING - TABLE VIEW" (table format)
    • "GUARANTEED WORKING - STAT VIEW" (stat panels)
  • Tested all metrics are being collected and displayed

Step 3.4: Playbook Development ✅

  • Created monitoring.yml playbook for stack deployment
  • Created verify-monitoring.yml for validation
  • Created open-monitoring-ports.yml for firewall configuration
  • Tested idempotency of all playbooks
  • Documented access URLs and credentials

Deliverables Completed:

  • ✅ Centralized monitoring with Prometheus + Grafana
  • ✅ 2 new Ansible roles (prometheus, grafana)
  • ✅ 3 new playbooks for monitoring stack
  • ✅ 4+ operational dashboards
  • ✅ Real-time metrics from all 4 servers
  • ✅ Documentation and access guide

Access URLs:

  • Prometheus: http://192.168.56.11:9090
  • Grafana: http://192.168.56.11:3001

Metrics Collected:

  • CPU usage and load averages
  • Memory utilization
  • Disk space and I/O
  • Network traffic
  • System uptime
  • Running processes

Time Invested: ~6 hours
Status: ✅ 100% Complete
Snapshot: Phase3-Complete-Monitoring-Stack (ready to create)


Phase 4: Centralized Logging ✅ COMPLETE

Objective: Implement centralized log management with rsyslog

Tasks Completed:

Step 4.1: Log Server Setup ✅

  • Created rsyslog_server role for control-node
  • Configured rsyslog to receive logs on port 514 (UDP/TCP)
  • Set up log file organization by hostname
  • Configured firewall to allow syslog traffic from internal network
  • Created log directory structure in /var/log/remote/
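A server-side rsyslog drop-in along these lines would implement the receive-and-sort-by-hostname behavior described above (the file name and template details are assumptions):

```
# /etc/rsyslog.d/10-remote-server.conf (hypothetical)
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

# Write each sender's messages to /var/log/remote/<hostname>/syslog
template(name="RemoteLogs" type="string"
         string="/var/log/remote/%HOSTNAME%/syslog")
if $fromhost-ip != "127.0.0.1" then {
    action(type="omfile" dynaFile="RemoteLogs")
    stop
}
```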

Step 4.2: Log Client Configuration ✅

  • Created rsyslog_client role for managed nodes
  • Configured all managed nodes to forward logs to control-node
  • Set up reliable log forwarding with queue management
  • Tested log forwarding from all 3 servers
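On the client side, "reliable forwarding with queue management" maps to an omfwd action with a disk-assisted queue; a sketch with assumed file name and queue settings:

```
# /etc/rsyslog.d/50-forward.conf (hypothetical) — forward everything to the
# control-node; the queue buffers messages if the log server is unreachable
*.* action(type="omfwd" target="10.0.2.11" port="514" protocol="tcp"
           queue.type="LinkedList" queue.filename="fwd_queue"
           queue.saveOnShutdown="on" action.resumeRetryCount="-1")
```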

Step 4.3: Log Management ✅

  • Implemented log rotation policies
  • Configured retention: daily logs, weekly archives
  • Set up automatic compression of old logs
  • Created logrotate configuration for remote logs
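The logrotate policy for the remote logs might read as follows; directives are assumptions consistent with the daily-rotation-plus-compression scheme above:

```
# /etc/logrotate.d/remote-logs (hypothetical)
/var/log/remote/*/syslog {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```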

Step 4.4: Testing and Verification ✅

  • Created logging.yml playbook for deployment
  • Created verify-logging.yml for validation
  • Tested log forwarding from all managed nodes
  • Verified centralized log collection
  • Confirmed log rotation is working

Deliverables Completed:

  • ✅ Centralized log server on control-node
  • ✅ Log forwarding from all managed nodes
  • ✅ 2 new Ansible roles (rsyslog_server, rsyslog_client)
  • ✅ 2 new playbooks for logging infrastructure
  • ✅ Automated log rotation and retention
  • ✅ Organized log directory structure

Log Structure:

/var/log/remote/
├── web-server/
│   └── syslog
├── app-server/
│   └── syslog
└── db-server/
    └── syslog

Time Invested: ~2 hours
Status: ✅ 100% Complete


Phase 5: Backup Automation ✅ COMPLETE

Objective: Implement an automated backup system for critical data

Tasks Completed:

Step 5.1: Backup Strategy Design ✅

  • Designed multi-tier backup retention strategy
  • Defined backup types: database and configuration
  • Established retention periods:
    • Daily: 7 days
    • Weekly: 28 days
    • Monthly: 90 days

Step 5.2: Database Backup Implementation ✅

  • Created backup_postgresql role
  • Implemented PostgreSQL backup script with pg_dump
  • Configured compression (gzip) for space efficiency
  • Set up automated retention management
  • Created cron job for daily execution (2:00 AM)
  • Deployed to db-server
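A hypothetical sketch of the backup script the backup_postgresql role deploys; the real script's paths, dump options, and tier-promotion rules may differ:

```shell
#!/usr/bin/env bash
set -euo pipefail

BACKUP_ROOT="/var/backups/database"
STAMP="$(date +%F)"
DUMP="$BACKUP_ROOT/daily/appdb_${STAMP}.sql.gz"

mkdir -p "$BACKUP_ROOT"/{daily,weekly,monthly}

# Dump and compress the application database
sudo -u postgres pg_dump appdb | gzip > "$DUMP"

# Promote Sunday dumps to weekly and first-of-month dumps to monthly (assumed rule)
if [ "$(date +%u)" = "7" ];  then cp "$DUMP" "$BACKUP_ROOT/weekly/";  fi
if [ "$(date +%d)" = "01" ]; then cp "$DUMP" "$BACKUP_ROOT/monthly/"; fi

# Enforce the 7/28/90-day retention tiers
find "$BACKUP_ROOT/daily"   -name '*.sql.gz' -mtime +7  -delete
find "$BACKUP_ROOT/weekly"  -name '*.sql.gz' -mtime +28 -delete
find "$BACKUP_ROOT/monthly" -name '*.sql.gz' -mtime +90 -delete
```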

Step 5.3: Configuration Backup Implementation ✅

  • Created backup_configs role
  • Implemented backup script for system configurations:
    • Nginx configurations
    • Application code
    • SSH configurations
    • Firewall rules
    • fail2ban settings
    • rsyslog configurations
    • Ansible infrastructure files
  • Configured compression and retention
  • Created cron job for daily execution (3:00 AM)
  • Deployed to web-server and app-server

Step 5.4: Backup Deployment and Testing ✅

  • Created backup.yml playbook
  • Fixed YAML syntax errors in backup roles
  • Deployed backup system to all servers
  • Created backup directories on all nodes
  • Verified backup scripts are executable
  • Tested manual backup execution
  • Confirmed cron jobs are scheduled
  • Validated backup files are created with content

Deliverables Completed:

  • ✅ Automated database backups (PostgreSQL on db-server)
  • ✅ Automated configuration backups (web-server, app-server)
  • ✅ 2 new Ansible roles (backup_postgresql, backup_configs)
  • ✅ 1 new playbook for backup deployment
  • ✅ Multi-tier retention policy (7/28/90 days)
  • ✅ Scheduled cron jobs for automation
  • ✅ Backup verification capability

Backup Architecture:

┌─────────────────────────────────────────────────────────┐
│                    Backup Strategy                       │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  db-server (10.0.2.14)                                  │
│  ├── /var/backups/database/                             │
│  │   ├── daily/    (7 days retention)                   │
│  │   ├── weekly/   (28 days retention)                  │
│  │   └── monthly/  (90 days retention)                  │
│  └── Cron: Daily at 2:00 AM                             │
│                                                          │
│  web-server (10.0.2.12)                                 │
│  ├── /var/backups/configs/                              │
│  │   ├── daily/    (7 days retention)                   │
│  │   ├── weekly/   (28 days retention)                  │
│  │   └── monthly/  (90 days retention)                  │
│  └── Cron: Daily at 3:00 AM                             │
│                                                          │
│  app-server (10.0.2.13)                                 │
│  ├── /var/backups/configs/                              │
│  │   ├── daily/    (7 days retention)                   │
│  │   ├── weekly/   (28 days retention)                  │
│  │   └── monthly/  (90 days retention)                  │
│  └── Cron: Daily at 3:00 AM                             │
│                                                          │
└─────────────────────────────────────────────────────────┘

Backup Components:

1. Database Backups (db-server):

  • Full PostgreSQL database dump (appdb)
  • Backup location: /var/backups/database/
  • Schedule: Daily at 2:00 AM
  • Manual execution: sudo /usr/local/bin/backup-database.sh

2. Configuration Backups (web-server, app-server):

  • System and application configurations
  • Backup location: /var/backups/configs/
  • Schedule: Daily at 3:00 AM
  • Manual execution: sudo /usr/local/bin/backup-configs.sh

Time Invested: ~2 hours
Status: ✅ 100% Complete


Phase 6: Disaster Recovery ✅ COMPLETE

Objective: Develop and test disaster recovery procedures

Tasks Completed:

Step 6.1: Database Restore Tools ✅

  • Created restore-database.sh script for PostgreSQL
  • Deployed to db-server (/usr/local/bin/)
  • Interactive confirmation and validation
  • Automatic backup decompression support
  • Created list-db-backups.sh utility

Step 6.2: Configuration Restore Tools ✅

  • Created restore-configs.sh script for system configs
  • Deployed to web-server and app-server
  • Restores Nginx, SSH, and application configurations
  • Automatic service restart after restore
  • Created list-config-backups.sh utility

Step 6.3: Infrastructure Rebuild Automation ✅

  • Created rebuild-infrastructure.yml master playbook
  • Single command rebuilds the entire infrastructure
  • Imports all deployment playbooks in sequence
  • Tested playbook syntax and structure

Step 6.4: Recovery Procedures Documented ✅

  • Defined Recovery Time Objectives (RTO):
    • Database: 2 hours
    • Web/App Servers: 30 minutes
    • Complete Infrastructure: 4 hours
  • Defined Recovery Point Objective (RPO): 24 hours (daily backups)
  • Created disaster recovery procedures
  • Documented restore commands and workflows

Deliverables Completed:

  • ✅ Database restore script (restore-database.sh)
  • ✅ Configuration restore script (restore-configs.sh)
  • ✅ Backup listing utilities (list-db-backups.sh, list-config-backups.sh)
  • ✅ 2 new playbooks (disaster-recovery.yml, rebuild-infrastructure.yml)
  • ✅ RTO/RPO documentation
  • ✅ Recovery procedures documented

Recovery Commands:

# List available backups
ssh sysadmin@192.168.56.14 "sudo /usr/local/bin/list-db-backups.sh"
ssh sysadmin@192.168.56.12 "sudo /usr/local/bin/list-config-backups.sh"

# Restore database (DESTRUCTIVE)
ssh sysadmin@192.168.56.14 \
  sudo /usr/local/bin/restore-database.sh /var/backups/database/daily/appdb_YYYY-MM-DD.sql.gz

# Restore configurations (DESTRUCTIVE)
ssh sysadmin@192.168.56.12 \
  sudo /usr/local/bin/restore-configs.sh /var/backups/configs/daily/configs_YYYY-MM-DD.tar.gz

# Complete infrastructure rebuild
cd ~/infrastructure
ansible-playbook playbooks/rebuild-infrastructure.yml

Time Invested: ~3 hours
Status: ✅ 100% Complete

📊 Current Progress

Overall Project Status

Phase 1: ████████████████████████████████ 100% ✅ COMPLETE
Phase 2: ████████████████████████████████ 100% ✅ COMPLETE
Phase 3: ████████████████████████████████ 100% ✅ COMPLETE
Phase 4: ████████████████████████████████ 100% ✅ COMPLETE
Phase 5: ████████████████████████████████ 100% ✅ COMPLETE
Phase 6: ████████████████████████████████ 100% ✅ COMPLETE
═══════════════════════════════════════════════════════════
Overall: ████████████████████████████████  100% Complete

Phase 2 Achievement Summary

Infrastructure Automated:

  • ✅ 4 VMs fully configured via Ansible
  • ✅ 8 reusable Ansible roles created
  • ✅ 6 functional playbooks developed
  • ✅ Complete 3-tier application stack deployed
  • ✅ Zero manual configuration required
  • ✅ Full idempotency verified

Services Deployed:

  • ✅ Nginx reverse proxy (web-server)
  • ✅ Express.js application (app-server)
  • ✅ PostgreSQL 16 database (db-server)
  • ✅ Security hardening (all servers)
  • ✅ Monitoring agents (all servers)

Testing Results:

  • ✅ All playbooks execute successfully
  • ✅ Idempotency confirmed (safe to re-run)
  • ✅ End-to-end connectivity verified
  • ✅ All services healthy and responsive
  • ✅ Security measures active and tested

Phase 3 Achievement Summary

Monitoring Infrastructure Deployed:

  • ✅ Prometheus 3.9.1 installed and configured on control-node
  • ✅ Grafana 12.3.2 installed and configured on control-node
  • ✅ All 4 servers being monitored (100% coverage)
  • ✅ Real-time metrics collection every 15 seconds
  • ✅ Dashboard visualization with multiple views
  • ✅ Data source integration tested and working

New Ansible Components:

  • roles/prometheus/ - Complete Prometheus role
  • roles/grafana/ - Complete Grafana role
  • playbooks/monitoring.yml - Monitoring stack deployment
  • playbooks/verify-monitoring.yml - Monitoring validation
  • playbooks/open-monitoring-ports.yml - Firewall configuration

Dashboards Created:

  • ✅ "System Monitoring Dashboard" - Comprehensive metrics view
  • ✅ "SIMPLE TEST - RAW METRICS" - Debug/verification dashboard
  • ✅ "GUARANTEED WORKING - TABLE VIEW" - Tabular data display
  • ✅ "GUARANTEED WORKING - STAT VIEW" - Stat panel dashboard

Testing Results:

  • ✅ Prometheus scraping all 4 targets (all "UP")
  • ✅ Grafana can query Prometheus successfully
  • ✅ Dashboard panels showing real-time data
  • ✅ All services healthy and responsive
  • ✅ Firewall rules properly configured

Time Investment

  • Phase 1: 4 hours ✅
  • Phase 2: 11 hours ✅
  • Phase 3: 6 hours ✅
  • Phase 4: 2 hours ✅
  • Phase 5: 2 hours ✅
  • Phase 6: 3 hours ✅
  • Total: ~28 hours

Last Updated

Date: February 2026
Current Phase: Phase 6 - Complete ✅


🔐 Security Implementations

Multi-Layered Security Approach

1. SSH Hardening ✅

  • Key-based authentication only (password authentication disabled)
  • Root login disabled via SSH
  • Public key authentication configured for sysadmin user
  • MaxAuthTries: Limited to 3 attempts
  • Automated via Ansible (ssh_hardening role)

Configuration File: /etc/ssh/sshd_config

2. Firewall (UFW) ✅

  • Default Policy: Deny incoming, Allow outgoing
  • Service-Specific Rules:
    • SSH (22/tcp) - Management access
    • HTTP (80/tcp) - Web server only
    • HTTPS (443/tcp) - Web server only
    • App (3000/tcp) - From web server only
    • PostgreSQL (5432/tcp) - From app server only
    • node_exporter (9100/tcp) - Internal network only
    • Syslog (514/udp, 514/tcp) - Internal network only
  • Automated via Ansible (firewall role)

Check Status: sudo ufw status verbose

3. Intrusion Prevention (fail2ban) ✅

  • Monitoring: SSH login attempts
  • Max Retries: 3 failed attempts
  • Ban Time: 3600 seconds (1 hour)
  • Find Time: 600 seconds (10 minutes)
  • Automatic IP banning after threshold exceeded
  • Automated via Ansible (fail2ban role)

Configuration File: /etc/fail2ban/jail.local
Check Status: sudo fail2ban-client status sshd
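Put together, the jail described above corresponds to a `[sshd]` section like this in jail.local (values taken from this README; other defaults omitted):

```ini
[sshd]
enabled  = true
port     = 22
maxretry = 3
bantime  = 3600
findtime = 600
```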

4. Automatic Security Updates ✅

  • Service: unattended-upgrades
  • Update Type: Security updates only
  • Auto-reboot: Disabled (manual control)
  • Old Kernel Cleanup: Enabled
  • Daily Update Check: Automated
  • Automated via Ansible (auto_updates role)

Configuration File: /etc/apt/apt.conf.d/50unattended-upgrades

5. System Monitoring ✅

  • Agent: Prometheus node_exporter v1.8.2
  • Metrics Port: 9100
  • Metrics Collected:
    • CPU usage and load averages
    • Memory and swap utilization
    • Disk space and I/O
    • Network traffic and errors
    • System uptime and processes
  • Automated via Ansible (node_exporter role)

Access Metrics: curl http://localhost:9100/metrics

6. Network Segmentation ✅

  • Web Tier: Internet-facing (ports 80, 443)
  • App Tier: Only accessible from web server
  • Database Tier: Only accessible from app server
  • Management: SSH restricted via firewall rules

🛠️ Technologies Used

Operating Systems & Virtualization

  • Host OS: Windows 11 Pro
  • Hypervisor: Oracle VirtualBox 7.x
  • Guest OS: Linux Mint 22 Wilma (based on Ubuntu 24.04 LTS)
  • Kernel: Linux 6.8.x

Automation & Configuration Management

  • Ansible: 2.16+ (automation platform)
  • YAML: Configuration and playbook syntax
  • Jinja2: Template engine for dynamic configurations
  • Git: Version control
  • GitHub: Remote repository

Security Tools

  • OpenSSH: Secure remote access
  • UFW: Firewall management (frontend for iptables)
  • fail2ban: Intrusion prevention system
  • unattended-upgrades: Automatic security patching

Application Stack

  • Nginx: Reverse proxy and web server
  • Node.js: JavaScript runtime (v18.x)
  • Express.js: Web application framework
  • PostgreSQL: Relational database (v16)

Monitoring & Observability

  • Prometheus node_exporter: Metrics collection agent (v1.8.2)
  • Prometheus: Time-series database and alerting (v3.9.1)
  • Grafana: Visualization and dashboards (v12.3.2)

Logging & Backup

  • rsyslog: Centralized log management
  • logrotate: Log rotation and retention
  • pg_dump: PostgreSQL backup utility
  • cron: Job scheduling for automated backups
  • Bash: Backup and maintenance scripts

🚀 Quick Start

Prerequisites

  • Hardware: 16GB RAM minimum, 4+ CPU cores recommended
  • Software:
    • VirtualBox 7.x or later
    • Windows 10/11 (or any OS supporting VirtualBox)
    • SSH client (built into Windows 10+)
    • Git (for version control)

Access the Infrastructure

SSH from Windows (PowerShell)

# Access Ansible control node
ssh sysadmin@192.168.56.11

# Access web server
ssh sysadmin@192.168.56.12

# Access app server
ssh sysadmin@192.168.56.13

# Access database server
ssh sysadmin@192.168.56.14

Run Ansible Playbooks (from control-node)

# SSH into control node
ssh sysadmin@192.168.56.11

# Navigate to infrastructure directory
cd ~/infrastructure

# Test connectivity to all managed nodes
ansible managed_nodes -m ping

# Apply complete infrastructure automation
ansible-playbook playbooks/base-hardening.yml

# Deploy web server (Nginx)
ansible-playbook playbooks/web-server.yml

# Deploy application server (Node.js)
ansible-playbook playbooks/app-server.yml

# Deploy database server (PostgreSQL)
ansible-playbook playbooks/db-server.yml

# Deploy monitoring stack (Prometheus + Grafana)
ansible-playbook playbooks/monitoring.yml

# Deploy centralized logging
ansible-playbook playbooks/logging.yml

# Deploy backup system
ansible-playbook playbooks/backup.yml

# Verify all configurations and services
ansible-playbook playbooks/verify-config.yml
ansible-playbook playbooks/verify-all-services.yml
ansible-playbook playbooks/verify-monitoring.yml
ansible-playbook playbooks/verify-logging.yml

# Run in check mode (dry run - no changes)
ansible-playbook playbooks/base-hardening.yml --check

# Run with verbose output for troubleshooting
ansible-playbook playbooks/base-hardening.yml -vvv

Access Monitoring Dashboards

# From Windows browser:
# Prometheus: http://192.168.56.11:9090
# Grafana:    http://192.168.56.11:3001
#
# Grafana Credentials:
#   Username: admin
#   Password: admin123!

Check Centralized Logs

# From control-node, view centralized logs
# Web server logs
sudo tail -f /var/log/remote/web-server/syslog

# App server logs
sudo tail -f /var/log/remote/app-server/syslog

# Database server logs
sudo tail -f /var/log/remote/db-server/syslog

# View all logs
sudo ls -lh /var/log/remote/*/

Verify Backups

# Check backup directories exist
ansible all -m shell -a "ls -lh /var/backups/" -b

# Check database backup files
ansible db_servers -m shell -a "ls -lh /var/backups/database/daily/" -b

# Check configuration backup files
ansible managed_nodes -m shell -a "ls -lh /var/backups/configs/daily/" -b

# Verify cron jobs are scheduled
ansible all -m shell -a "crontab -l 2>/dev/null | grep backup || echo 'No backup cron jobs'" -b

# Manually trigger a test backup
# Database backup
ansible db_servers -m shell -a "/usr/local/bin/backup-database.sh" -b

# Configuration backup
ansible web_servers -m shell -a "/usr/local/bin/backup-configs.sh" -b

Test the Application Stack

# From Windows, test the web server
curl http://192.168.56.12

# From control-node, test app server health
ansible app_servers -m shell -a "curl -s http://localhost:3000/health"

# Test database connectivity
ansible app_servers -m shell -a 'PGPASSWORD=SecurePassword123\! psql -h 10.0.2.14 -U appuser -d appdb -c "SELECT version();"' -b

# Test end-to-end flow (Web → App → Database)
curl http://192.168.56.12/health

Project Structure

infrastructure/
│
├── ansible.cfg                      # Ansible configuration file
├── .gitignore                       # Git ignore rules (secrets, keys)
├── README.md                        # This comprehensive documentation
│
├── inventory/
│   └── hosts.yml                    # Server inventory with host groups
│
├── group_vars/
│   └── all.yml                      # Global variables for all hosts
│
├── playbooks/                       # Ansible playbooks (automation scripts)
│   ├── base-hardening.yml           # Security hardening for all servers
│   ├── web-server.yml               # Nginx reverse proxy deployment
│   ├── app-server.yml               # Node.js application deployment
│   ├── db-server.yml                # PostgreSQL database deployment
│   ├── verify-config.yml            # Individual service verification
│   ├── verify-all-services.yml      # End-to-end testing
│   ├── monitoring.yml               # Monitoring stack deployment
│   ├── verify-monitoring.yml        # Monitoring validation
│   ├── open-monitoring-ports.yml    # Firewall for monitoring
│   ├── logging.yml                  # 📋 Centralized logging deployment
│   ├── verify-logging.yml           # 📋 Logging validation
│   └── backup.yml                   # 💾 Backup system deployment
│
├── roles/                           # Ansible roles (reusable components)
│   ├── ssh_hardening/               # SSH security configuration
│   ├── firewall/                    # UFW firewall configuration
│   ├── fail2ban/                    # Intrusion prevention
│   ├── auto_updates/                # Automatic security updates
│   ├── node_exporter/               # Monitoring agent
│   ├── nginx/                       # Web server and reverse proxy
│   ├── nodejs_app/                  # Node.js application
│   ├── postgresql/                  # PostgreSQL database
│   ├── prometheus/                  # Prometheus monitoring
│   ├── grafana/                     # Grafana visualization
│   ├── rsyslog_server/              # 📋 Centralized log server
│   ├── rsyslog_client/              # 📋 Log forwarding client
│   ├── backup_postgresql/           # 💾 Database backup automation
│   └── backup_configs/              # 💾 Configuration backup automation
│
└── files/                           # Static files (future use)
    └── scripts/

📚 Command Reference

Phase 1: Manual Base Configuration Commands


Initial Network Configuration (baseline-template)

# Set static IP on NAT Network (enp0s3)
sudo nmcli connection modify "Wired connection 1" \
  ipv4.addresses 10.0.2.10/24 \
  ipv4.gateway 10.0.2.1 \
  ipv4.dns "8.8.8.8,8.8.4.4" \
  ipv4.method manual

# Apply changes
sudo nmcli connection down "Wired connection 1"
sudo nmcli connection up "Wired connection 1"

# Set static IP on Host-Only Network (enp0s8)
sudo nmcli connection add type ethernet ifname enp0s8 con-name host-only \
  ipv4.addresses 192.168.56.10/24 \
  ipv4.method manual

# Activate Host-Only connection
sudo nmcli connection up host-only

# Verify network configuration
ip addr show
ip route show

Hostname Configuration

# Set hostname
sudo hostnamectl set-hostname baseline-template

# Edit /etc/hosts for proper resolution
sudo nano /etc/hosts
# Change 127.0.1.1 line to: 127.0.1.1  baseline-template

# Verify
hostname
hostnamectl

System Updates

# Update package lists
sudo apt update

# Upgrade all packages
sudo apt upgrade -y

# Install essential tools
sudo apt install -y vim curl wget git net-tools ufw fail2ban openssh-server

# Reboot to apply updates
sudo reboot

SSH Configuration

# Install OpenSSH server (usually pre-installed)
sudo apt install -y openssh-server

# Enable and start SSH service
sudo systemctl enable ssh
sudo systemctl start ssh

# Create SSH directory and set permissions
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Add your public key to authorized_keys (paste your key)
nano ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Backup original SSH configuration
sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup

# Edit SSH configuration
sudo nano /etc/ssh/sshd_config
# Set these values:
#   PermitRootLogin no
#   PasswordAuthentication no
#   PubkeyAuthentication yes
#   AuthorizedKeysFile .ssh/authorized_keys
#   MaxAuthTries 3

# Test configuration syntax
sudo sshd -t

# Restart SSH service
sudo systemctl restart ssh

# Verify SSH status
sudo systemctl status ssh

Firewall (UFW) Configuration

# Set default policies
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow 22/tcp

# Enable firewall
sudo ufw enable

# Check status
sudo ufw status verbose
sudo ufw status numbered

fail2ban Installation and Configuration

# Install fail2ban
sudo apt install -y fail2ban

# Copy default configuration
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local

# Edit configuration
sudo nano /etc/fail2ban/jail.local
# Configure [sshd] section:
#   enabled = true
#   port = 22
#   maxretry = 3
#   bantime = 3600
#   findtime = 600

# Enable and start fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban

# Check status
sudo systemctl status fail2ban
sudo fail2ban-client status
sudo fail2ban-client status sshd
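
Written out, the values from the comments above form this `[sshd]` section in `/etc/fail2ban/jail.local` (a sketch of the relevant fragment, not the full file):

```ini
[sshd]
enabled  = true
port     = 22
maxretry = 3
bantime  = 3600
findtime = 600
```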

Automatic Security Updates

# Install unattended-upgrades
sudo apt install -y unattended-upgrades apt-listchanges

# Configure unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades

# Edit configuration file
sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
# Ensure security updates are enabled:
#   "${distro_id}:${distro_codename}-security";

# Enable and start service
sudo systemctl enable unattended-upgrades
sudo systemctl start unattended-upgrades

# Check status
sudo systemctl status unattended-upgrades
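
For reference, `dpkg-reconfigure` typically writes `/etc/apt/apt.conf.d/20auto-upgrades` with these two directives that enable the daily update and upgrade runs (standard Debian/Ubuntu content, shown here as a sketch):

```
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```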

node_exporter Installation

# Download node_exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz

# Extract
tar xvfz node_exporter-1.8.2.linux-amd64.tar.gz

# Move binary to system path
sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/

# Create system user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Set ownership
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

# Create systemd service
sudo nano /etc/systemd/system/node_exporter.service
# Paste service configuration

# Reload systemd, enable and start service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

# Verify status
sudo systemctl status node_exporter

# Allow node_exporter through firewall (internal network only)
sudo ufw allow from 10.0.2.0/24 to any port 9100 proto tcp

# Test metrics endpoint
curl http://localhost:9100/metrics | head -20
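
The service file pasted in the step above is not reproduced in this README; a typical node_exporter unit (a sketch, assuming the binary path and `node_exporter` user created earlier) looks like:

```ini
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
```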

Phase 2: Ansible Automation Commands


Initial Setup on Control-Node

# SSH into control-node from Windows
ssh sysadmin@192.168.56.11

# Update system
sudo apt update && sudo apt upgrade -y

# Install Ansible
sudo apt install -y ansible

# Verify installation
ansible --version

Project Structure Creation

# Create main project directory
mkdir -p ~/infrastructure
cd ~/infrastructure

# Create subdirectories
mkdir -p inventory playbooks roles group_vars host_vars files templates

# Create initial files
touch ansible.cfg
touch inventory/hosts.yml
touch group_vars/all.yml
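
The contents of `inventory/hosts.yml` are not shown in this README. A minimal inventory consistent with the group names used in the ad-hoc commands later (the app-server address is an assumption; the other IPs appear elsewhere in this document) might be:

```yaml
# inventory/hosts.yml -- hypothetical sketch, not the project's actual file
all:
  children:
    control:
      hosts:
        control-node:
          ansible_host: 192.168.56.11
    managed_nodes:
      children:
        web_servers:
          hosts:
            web-server:
              ansible_host: 192.168.56.12
        app_servers:
          hosts:
            app-server:
              ansible_host: 192.168.56.13   # assumed, not stated in the README
        db_servers:
          hosts:
            db-server:
              ansible_host: 192.168.56.14
```

`ansible.cfg` would then need at least `inventory = inventory/hosts.yml` and `remote_user = sysadmin` under `[defaults]` for the later ad-hoc commands to work as written.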

SSH Key Setup for Ansible

# Generate SSH key pair for Ansible (on control-node)
ssh-keygen -t ed25519 -C "ansible-control"
# Press ENTER at each prompt to accept the defaults (no passphrase)

# View the public key
cat ~/.ssh/id_ed25519.pub

# Copy SSH key to each managed node
ssh-copy-id sysadmin@web-server
ssh-copy-id sysadmin@app-server
ssh-copy-id sysadmin@db-server

# Test passwordless SSH
ssh sysadmin@web-server "hostname"
ssh sysadmin@app-server "hostname"
ssh sysadmin@db-server "hostname"

Test Ansible Connectivity

# Ping all managed nodes
ansible managed_nodes -m ping

# Check hostname
ansible managed_nodes -m command -a "hostname"

# Check uptime
ansible managed_nodes -m command -a "uptime"

# Test sudo access
ansible managed_nodes -m command -a "sudo whoami"

Deploy Service-Specific Playbooks

# Deploy web server (Nginx)
ansible-playbook playbooks/web-server.yml

# Deploy application server (Node.js)
ansible-playbook playbooks/app-server.yml

# Deploy database server (PostgreSQL)
ansible-playbook playbooks/db-server.yml

# Verify all services
ansible-playbook playbooks/verify-all-services.yml

Phase 3: Monitoring Implementation Commands


Deploy Monitoring Stack

# Deploy Prometheus and Grafana
ansible-playbook playbooks/monitoring.yml

# Verify the monitoring stack
ansible-playbook playbooks/verify-monitoring.yml

Check Prometheus Status

# Check Prometheus service
sudo systemctl status prometheus

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets | python3 -m json.tool

# Test Prometheus queries
curl "http://localhost:9090/api/v1/query?query=up" | python3 -m json.tool

Check Grafana Status

# Check Grafana service
sudo systemctl status grafana-server

# Test Grafana API
curl http://localhost:3001/api/health | python3 -m json.tool

Phase 4: Centralized Logging Commands


Deploy Logging Infrastructure

# SSH into control-node
ssh sysadmin@192.168.56.11
cd ~/infrastructure

# Deploy centralized logging
ansible-playbook playbooks/logging.yml

# Verify logging setup
ansible-playbook playbooks/verify-logging.yml

Check rsyslog Server (on control-node)

# Check rsyslog service
sudo systemctl status rsyslog

# View rsyslog configuration
sudo cat /etc/rsyslog.d/50-remote.conf

# Check if rsyslog is listening on port 514
sudo netstat -tulpn | grep rsyslog
sudo ss -tulpn | grep 514

# View centralized logs
sudo ls -lh /var/log/remote/

# View logs from specific server
sudo tail -f /var/log/remote/web-server/syslog
sudo tail -f /var/log/remote/app-server/syslog
sudo tail -f /var/log/remote/db-server/syslog
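
The `50-remote.conf` displayed by the command above is not reproduced here; a server-side configuration matching the behavior described (listening on port 514 and writing per-host files under `/var/log/remote/`) might look like this sketch:

```
# /etc/rsyslog.d/50-remote.conf -- sketch, not the project's actual file
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

template(name="RemoteLogs" type="string"
         string="/var/log/remote/%HOSTNAME%/syslog")
if $fromhost-ip != "127.0.0.1" then {
    action(type="omfile" dynaFile="RemoteLogs")
    stop
}
```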

Check rsyslog Client (on managed nodes)

# Check rsyslog service on all nodes
# Ensure rsyslog is started and enabled on all nodes (this enforces the state, not just checks it)
ansible managed_nodes -m systemd -a "name=rsyslog state=started enabled=yes" -b

# View rsyslog client configuration
ansible managed_nodes -m shell -a "cat /etc/rsyslog.d/50-forward.conf" -b

# Test log forwarding
ansible web_servers -m shell -a "logger 'Test log message from web-server'" -b

# Verify test message appeared on control-node
sudo grep "Test log message" /var/log/remote/web-server/syslog
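
On the client side, `50-forward.conf` can be as small as a single forwarding rule. A sketch consistent with the control-node's NAT-network address used in the connectivity test below (`@@` forwards over TCP; a single `@` would use UDP):

```
# /etc/rsyslog.d/50-forward.conf -- sketch; 10.0.2.11 is the control-node
*.* @@10.0.2.11:514
```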

Troubleshoot Logging Issues

# Check the log server's firewall allows syslog (port 514); -b already escalates
ansible control -m shell -a "ufw status | grep 514" -b

# Restart rsyslog on all nodes
ansible all -m systemd -a "name=rsyslog state=restarted" -b

# Check rsyslog errors
ansible all -m shell -a "journalctl -u rsyslog -n 20" -b

# Test connectivity from client to server
ansible managed_nodes -m shell -a "nc -zv 10.0.2.11 514" -b

Phase 5: Backup Automation Commands


Deploy Backup System

# SSH into control-node
ssh sysadmin@192.168.56.11
cd ~/infrastructure

# Deploy backup automation
ansible-playbook playbooks/backup.yml

Verify Backup Configuration

# 1. Verify backup directories exist
ansible all -m shell -a "ls -lh /var/backups/" -b

# 2. Check actual backup files were created
ansible db_servers -m shell -a "ls -lh /var/backups/database/daily/" -b
ansible managed_nodes -m shell -a "ls -lh /var/backups/configs/daily/" -b

# 3. Verify cron jobs are scheduled
ansible all -m shell -a "crontab -l 2>/dev/null | grep backup || echo 'No backup cron jobs'" -b

# 4. Check backup file sizes to confirm they have content
ansible db_servers -m shell -a "du -sh /var/backups/database/daily/* 2>/dev/null | head -3" -b

Manual Backup Execution

# Manually trigger database backup
ansible db_servers -m shell -a "/usr/local/bin/backup-database.sh" -b

# Manually trigger configuration backup
ansible web_servers -m shell -a "/usr/local/bin/backup-configs.sh" -b
ansible app_servers -m shell -a "/usr/local/bin/backup-configs.sh" -b

# View backup logs
ansible db_servers -m shell -a "tail -20 /var/backups/logs/backup.log" -b
ansible web_servers -m shell -a "tail -20 /var/backups/logs/backup.log" -b

Check Backup Status

# Database backups on db-server
ssh sysadmin@192.168.56.14
sudo ls -lh /var/backups/database/daily/
sudo ls -lh /var/backups/database/weekly/
sudo ls -lh /var/backups/database/monthly/

# Configuration backups on web-server
ssh sysadmin@192.168.56.12
sudo ls -lh /var/backups/configs/daily/
sudo ls -lh /var/backups/configs/weekly/
sudo ls -lh /var/backups/configs/monthly/

# Check cron schedule
crontab -l | grep backup
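
Given the 2:00 AM / 3:00 AM schedule noted in the infrastructure health table, the installed cron entries presumably resemble the following (a sketch, not the actual crontab):

```
# m h dom mon dow  command
0 2 * * *  /usr/local/bin/backup-database.sh   # db-server only
0 3 * * *  /usr/local/bin/backup-configs.sh    # web and app servers
```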

Test Backup Restoration (Phase 6)

# To restore a database backup (example for Phase 6):
# Note: pg_restore only reads custom-format dumps; a gzipped plain-SQL dump is restored via psql:
# gunzip -c /var/backups/database/daily/backup-YYYYMMDD.sql.gz | sudo -u postgres psql appdb

# To restore configuration files (example for Phase 6):
# sudo tar -xzf /var/backups/configs/daily/backup-YYYYMMDD.tar.gz -C /

Common Troubleshooting Commands


Network Issues

# Check IP addresses
ip addr show

# Check routing table
ip route show

# Test connectivity
ping -c 4 8.8.8.8
ping -c 4 google.com
ping -c 4 web-server

# Check open ports
sudo netstat -tulpn
sudo ss -tulpn

# Test specific port
nc -zv hostname port

SSH Issues

# Check SSH service status
sudo systemctl status ssh

# View SSH logs
sudo journalctl -u ssh -n 50
sudo tail -f /var/log/auth.log

# Test SSH configuration
sudo sshd -t

# Debug SSH connection
ssh -vvv sysadmin@hostname

Firewall Issues

# Check UFW status
sudo ufw status verbose
sudo ufw status numbered

# View UFW logs
sudo tail -f /var/log/ufw.log

Service Management

# Check service status
sudo systemctl status SERVICE_NAME

# Start/Stop/Restart service
sudo systemctl start SERVICE_NAME
sudo systemctl stop SERVICE_NAME
sudo systemctl restart SERVICE_NAME

# Enable/Disable on boot
sudo systemctl enable SERVICE_NAME
sudo systemctl disable SERVICE_NAME

# View service logs
sudo journalctl -u SERVICE_NAME -n 50
sudo journalctl -u SERVICE_NAME -f

🎓 Skills Demonstrated

This project showcases a comprehensive set of skills valued in DevOps, Cloud Engineering, and System Administration roles:

Technical Skills

Linux System Administration:

  • Server installation and configuration
  • Network configuration and troubleshooting
  • User and permission management
  • Service management with systemd
  • Package management (apt)
  • Log analysis and troubleshooting

Security & Hardening:

  • SSH key-based authentication
  • Firewall configuration (UFW/iptables)
  • Intrusion detection and prevention
  • Security patch management
  • Principle of least privilege
  • Network segmentation
  • Security auditing

Infrastructure as Code (IaC):

  • Ansible playbook development
  • Role-based architecture
  • Idempotent configuration
  • Template management (Jinja2)
  • Variable management
  • Inventory organization
  • Multi-tier application deployment

Automation & Scripting:

  • Bash scripting
  • Ansible automation
  • Configuration management
  • Automated deployment
  • Service orchestration
  • Cron job scheduling

Application Deployment:

  • Reverse proxy configuration (Nginx)
  • Application server setup (Node.js/Express)
  • Database deployment (PostgreSQL)
  • Service integration
  • Health check implementation

Monitoring & Observability:

  • Metrics collection (node_exporter)
  • Service monitoring (Prometheus)
  • Performance tracking and visualization (Grafana)
  • Dashboard creation and customization
  • Time-series data analysis
  • Infrastructure observability
  • Real-time monitoring implementation

Logging & Auditing:

  • Centralized log management (rsyslog)
  • Log forwarding configuration
  • Log rotation and retention policies
  • Log analysis and troubleshooting

Backup & Recovery:

  • Automated backup strategies
  • Database backup automation (pg_dump)
  • Configuration backup procedures
  • Multi-tier retention policies
  • Backup verification and testing
  • Disaster recovery planning

Version Control:

  • Git workflow
  • Repository management
  • Commit best practices
  • Documentation maintenance
  • Change tracking

Networking:

  • TCP/IP fundamentals
  • DNS configuration
  • Firewall rules
  • Port management
  • Network troubleshooting
  • Multi-network setup (NAT + Host-Only)

Soft Skills

Documentation:

  • Clear, comprehensive README
  • Command references
  • Architecture diagrams
  • Troubleshooting guides
  • Progressive documentation

Problem Solving:

  • Systematic debugging
  • Root cause analysis
  • Solution documentation
  • Iterative improvement

Project Management:

  • Phase-based approach
  • Progress tracking
  • Time estimation
  • Milestone achievement
  • Scope management

Professional Development:

  • Self-directed learning
  • Following best practices
  • Continuous improvement
  • Knowledge sharing

🔮 Future Enhancements

Potential additions to expand the project:

Immediate Next Steps (Phase 6)

  • Database restore procedures
  • Configuration restore testing
  • Full infrastructure rebuild automation
  • RTO/RPO documentation
  • Disaster recovery playbooks

Infrastructure Improvements

  • High Availability (HA) setup with keepalived
  • Load balancing with HAProxy or Nginx
  • Containerization with Docker
  • Container orchestration with Kubernetes (K3s)
  • Service mesh implementation (Istio/Linkerd)

Security Enhancements

  • VPN setup (OpenVPN/WireGuard)
  • Certificate management with Let's Encrypt
  • Web Application Firewall (ModSecurity)
  • Security scanning (Lynis, OpenVAS)
  • Compliance automation (CIS benchmarks)
  • Vulnerability scanning

Monitoring & Alerting

  • APM (Application Performance Monitoring)
  • Distributed tracing (Jaeger)
  • Custom metrics and exporters
  • PagerDuty/Slack integration
  • SLA monitoring
  • Log aggregation (ELK stack)

CI/CD Pipeline

  • Jenkins/GitLab CI setup
  • Automated testing
  • Blue-green deployments
  • Canary releases
  • Rollback procedures
  • Pipeline automation

Cloud Migration

  • Terraform for AWS/Azure/GCP
  • Cloud-native monitoring
  • Auto-scaling groups
  • Managed database services
  • Cloud cost optimization

🤝 Contributing

This is a personal portfolio project, but feedback and suggestions are always welcome!

How to Provide Feedback

  1. Issues: Open an issue on GitHub for bugs or suggestions
  2. Discussions: Start a discussion for questions or ideas
  3. Pull Requests: Not accepting PRs since this is a learning project, but I appreciate the interest!

Learning Resources

If you're building a similar project, here are helpful resources:

Ansible:

Linux:

DevOps:

Prometheus & Grafana:

Logging & Backup:


📄 License

This project is open source and available under the MIT License.

Feel free to use this project as a template or reference for your own learning!


👤 Author

Skander Ba


📞 Contact & Feedback

Questions About This Project?

  • GitHub Issues: For bugs or technical questions
  • GitHub Discussions: For general questions and ideas
  • Email: For collaboration or opportunities

Acknowledgments

Special thanks to:

  • The Ansible community for excellent documentation
  • The Linux community for amazing tools and support
  • The Prometheus and Grafana teams for outstanding monitoring tools
  • The rsyslog and PostgreSQL communities for robust logging and database solutions
  • Everyone who provides feedback and suggestions

📈 Project Stats

  • Started: January 30, 2026
  • Phase 1 Complete: January 31, 2026
  • Phase 2 Complete: February 3, 2026
  • Phase 3 Complete: February 3, 2026
  • Phase 4 Complete: February 7, 2026
  • Phase 5 Complete: February 9, 2026
  • Phase 6 Complete: February 12, 2026
  • Current Status: Phase 6 Complete - Project complete
  • Total Commits: Check GitHub for latest count
  • Lines of Ansible Code: ~2,000+
  • Documentation Pages: 1 (comprehensive README)
  • Services Deployed: 3-tier application stack + Monitoring + Logging + Backups
  • Ansible Roles: 14 (security + services + monitoring + logging + backup)
  • Playbooks: 11 (deployment + verification)

🏆 Milestones Achieved

  • 2026-01-30: Project initiated, Phase 1 planning complete
  • 2026-01-31: Phase 1 complete - All VMs configured manually
  • 2026-02-02: Ansible installed, SSH keys distributed, passwordless sudo configured
  • 2026-02-02: Git repository initialized and pushed to GitHub
  • 2026-02-02: Created all base security hardening roles
  • 2026-02-02: Base-hardening playbook tested and verified
  • 2026-02-03: Web server (Nginx) deployed successfully
  • 2026-02-03: Application server (Node.js) deployed successfully
  • 2026-02-03: Database server (PostgreSQL) deployed successfully
  • 2026-02-03: Phase 2 COMPLETE - Full automation verified
  • 2026-02-03: Prometheus deployed and configured
  • 2026-02-03: Grafana deployed and configured
  • 2026-02-03: Monitoring dashboards created and tested
  • 2026-02-03: Phase 3 COMPLETE - Centralized monitoring operational
  • 2026-02-07: rsyslog server configured on control-node
  • 2026-02-07: Log forwarding from all managed nodes working
  • 2026-02-07: Phase 4 COMPLETE - Centralized logging operational
  • 2026-02-07: Database backup automation deployed
  • 2026-02-07: Configuration backup automation deployed
  • 2026-02-07: Multi-tier retention policy implemented
  • 2026-02-07: Phase 5 COMPLETE - Automated backups operational
  • 2026-02-12: Phase 6 COMPLETE - Project complete

📊 Infrastructure Health Status

Last Verified: February 7, 2026

| Component | Status | Health Check |
|-----------|--------|--------------|
| Web Server (Nginx) | 🟢 Running | ✓ HTTP responding |
| App Server (Node.js) | 🟢 Running | ✓ Health endpoint OK |
| Database (PostgreSQL) | 🟢 Running | ✓ Accepting connections |
| SSH Security | 🟢 Active | ✓ Key-only auth |
| Firewall (UFW) | 🟢 Active | ✓ Rules enforced |
| fail2ban | 🟢 Active | ✓ Monitoring SSH |
| Auto Updates | 🟢 Configured | ✓ Security patches enabled |
| Node Exporter | 🟢 Running | ✓ Metrics available (all 4 servers) |
| Prometheus | 🟢 Running | ✓ All 4 targets UP |
| Grafana | 🟢 Running | ✓ Dashboards operational |
| Centralized Logging | 🟢 Running | ✓ All nodes forwarding logs |
| Database Backups | 🟢 Scheduled | ✓ Cron job active (2:00 AM) |
| Config Backups | 🟢 Scheduled | ✓ Cron jobs active (3:00 AM) |
| End-to-End Connectivity | 🟢 Verified | ✓ Web→App→DB working |
| Full Stack | 🟢 Verified | ✓ Complete observability & backup |

Last Updated: February 12, 2026
README Version: 5.0
Status: Living Document - Updated as project progresses

If you found this project helpful or interesting, please consider giving it a ⭐ on GitHub!

⬆ Back to Top