Production-style monitoring platform for HTTP services, inspired by systems like Datadog, New Relic, and UptimeRobot.
This project demonstrates how to design and implement a real-world monitoring system with background workers, alerting semantics, historical metrics, and a modern dashboard, with a focus on observability, reliability, and testability.
- JWT-based authentication
- Monitor CRUD with strict ownership enforcement
- Secure, multi-user architecture
- Background scheduler (async, non-blocking)
- Periodic HTTP health checks
- Response time & availability tracking
- Time-series check run storage
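
A single health check along the lines described above could be sketched as follows. This is a minimal illustration, not the code in `httpCheck.ts`: the `CheckResult` shape, the `runHttpCheck` name, the 5-second default timeout, and the rule that any status from 200 to 399 counts as available are all assumptions. The fetch function is injectable so the logic can be exercised without network access.

```typescript
// Hypothetical result shape for one check run (field names are assumptions).
interface CheckResult {
  ok: boolean;
  statusCode: number | null;
  responseTimeMs: number;
  error: string | null;
}

// Minimal subset of the fetch signature that the check relies on.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal }
) => Promise<{ status: number }>;

async function runHttpCheck(
  url: string,
  timeoutMs = 5000,
  fetchImpl: FetchLike = fetch // global fetch is available in Node.js >= 18
): Promise<CheckResult> {
  const controller = new AbortController();
  // Abort the request if it takes longer than the timeout.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return {
      // Assumption: 2xx and 3xx responses count as "available".
      ok: res.status >= 200 && res.status < 400,
      statusCode: res.status,
      responseTimeMs: Date.now() - startedAt,
      error: null,
    };
  } catch (err) {
    // Network errors and timeouts are recorded as failed checks.
    return {
      ok: false,
      statusCode: null,
      responseTimeMs: Date.now() - startedAt,
      error: err instanceof Error ? err.message : String(err),
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning a result object instead of throwing keeps every outcome, including timeouts, storable as a check run.
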
- DOWN alerts after consecutive failures
- RECOVERY alerts only after confirmed DOWN
- Guaranteed semantics:
  - No duplicate DOWN alerts
  - RECOVERY only after an actual outage
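
The alerting semantics above can be expressed as a small pure function. The sketch below is illustrative rather than the actual contents of `alertRules.ts`: the `MonitorAlertState` shape, the `evaluateCheck` name, and the default threshold of 3 consecutive failures are assumptions.

```typescript
type AlertType = "DOWN" | "RECOVERY";

// Hypothetical per-monitor alert state (names are assumptions).
interface MonitorAlertState {
  consecutiveFailures: number;
  isDown: boolean; // true once a DOWN alert has been emitted
}

interface AlertDecision {
  state: MonitorAlertState;
  alert: AlertType | null;
}

// Pure function: takes the previous state and the latest check outcome,
// returns the next state and the alert to emit (if any).
function evaluateCheck(
  state: MonitorAlertState,
  checkOk: boolean,
  failureThreshold = 3 // assumed default, not taken from the project
): AlertDecision {
  if (checkOk) {
    return {
      state: { consecutiveFailures: 0, isDown: false },
      // RECOVERY only if a DOWN alert was previously emitted.
      alert: state.isDown ? "RECOVERY" : null,
    };
  }
  const consecutiveFailures = state.consecutiveFailures + 1;
  // DOWN fires exactly once, when the threshold is first reached.
  const crossedThreshold =
    !state.isDown && consecutiveFailures >= failureThreshold;
  return {
    state: { consecutiveFailures, isDown: state.isDown || crossedThreshold },
    alert: crossedThreshold ? "DOWN" : null,
  };
}
```

Because the function has no I/O, a sequence of check outcomes can be replayed in a unit test to confirm that a long outage yields one DOWN alert and one RECOVERY alert.
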
- Monitors list & details
- Check history: GET /monitors/:id/checks
- Alerts: GET /alerts
- Summary statistics: GET /monitors/:id/summary?windowHours=24
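
As a rough illustration of what a summary over `windowHours` might compute, the sketch below aggregates check runs in memory. The `CheckRun` and `MonitorSummary` field names and the `summarize` function are assumptions; the real endpoint may well compute this with a database aggregation instead.

```typescript
// Hypothetical shape of a stored check run (field names are assumptions).
interface CheckRun {
  checkedAt: Date;
  ok: boolean;
  responseTimeMs: number;
}

interface MonitorSummary {
  windowHours: number;
  totalChecks: number;
  uptimePercent: number | null; // null when the window has no data
  avgResponseTimeMs: number | null;
}

function summarize(
  runs: CheckRun[],
  windowHours: number,
  now: Date = new Date()
): MonitorSummary {
  // Keep only runs that fall inside the requested time window.
  const cutoff = now.getTime() - windowHours * 60 * 60 * 1000;
  const inWindow = runs.filter((r) => r.checkedAt.getTime() >= cutoff);
  const totalChecks = inWindow.length;
  if (totalChecks === 0) {
    return { windowHours, totalChecks, uptimePercent: null, avgResponseTimeMs: null };
  }
  const upCount = inWindow.filter((r) => r.ok).length;
  const totalLatency = inWindow.reduce((sum, r) => sum + r.responseTimeMs, 0);
  return {
    windowHours,
    totalChecks,
    uptimePercent: (upCount / totalChecks) * 100,
    avgResponseTimeMs: totalLatency / totalChecks,
  };
}
```
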
- Authentication flow (login / register)
- Protected routes
- Monitors overview
- Monitor details:
  - Uptime summary
  - Latency & availability charts
  - Check history table
- Alerts page with polling and filtering
apps/
├── api
│   ├── modules
│   │   ├── auth
│   │   ├── monitors
│   │   ├── checkruns
│   │   └── alerts
│   ├── engine
│   │   ├── monitoringEngine.ts
│   │   ├── httpCheck.ts
│   │   └── alertRules.ts
│   ├── middleware
│   └── config
└── web
    ├── pages
    ├── ui
    └── api
Backend
- Node.js + TypeScript
- Express
- MongoDB (time-series style collections)
- Background monitoring engine (in-process workers)
Frontend
- React + Vite
- Tailwind CSS
- React Query
- Recharts
Infrastructure
- Docker & Docker Compose
- MongoDB container
- CI-ready setup
- Node.js ≥ 18
- Docker Desktop (or Docker Engine)
From the repository root:
docker compose up --build
Backend API:
http://localhost:4000
Health check:
GET http://localhost:4000/health
In a new terminal:
cd apps/web
npm install
npm run dev
Frontend:
http://localhost:5173
Useful endpoints to demonstrate alerting and recovery:
- Stable uptime: https://www.google.com
- Real API: https://api.github.com
- Status failure: https://httpstat.us/500
- Timeout: https://httpstat.us/200?sleep=10000
- Guaranteed DOWN: http://127.0.0.1:1
- Alerting logic extracted into pure functions for deterministic testing
- Monitoring engine decoupled from request lifecycle
- Ownership enforced at query level (no cross-user data leakage)
- Time-series data modeled explicitly (check runs)
- Dockerized backend for reproducible execution
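
Query-level ownership enforcement means the owner is part of the database filter itself, so another user's document is never loaded in the first place. The sketch below illustrates the idea with an in-memory stand-in for a collection lookup; the `MonitorDoc` shape, the `ownerId` field, and both helper functions are hypothetical, not the project's actual schema.

```typescript
// Hypothetical monitor document (field names are assumptions).
interface MonitorDoc {
  _id: string;
  ownerId: string;
  name: string;
  url: string;
}

// Build a filter that always includes the authenticated user's id.
function ownedMonitorFilter(
  monitorId: string,
  userId: string
): Pick<MonitorDoc, "_id" | "ownerId"> {
  return { _id: monitorId, ownerId: userId };
}

// In-memory stand-in for a database findOne(filter) call.
function findOne(
  docs: MonitorDoc[],
  filter: Partial<MonitorDoc>
): MonitorDoc | null {
  const keys = Object.keys(filter) as (keyof MonitorDoc)[];
  return docs.find((doc) => keys.every((k) => doc[k] === filter[k])) ?? null;
}

const monitors: MonitorDoc[] = [
  { _id: "m1", ownerId: "alice", name: "GitHub API", url: "https://api.github.com" },
];

// The owner gets the document; any other user gets null, which an API
// handler would typically translate into a 404 response.
const forOwner = findOne(monitors, ownedMonitorFilter("m1", "alice"));
const forOther = findOne(monitors, ownedMonitorFilter("m1", "bob"));
```

Compared with loading the document first and checking the owner afterwards, this approach leaves no code path where a forgotten check exposes another user's data.
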
- Unit tests for alerting rules
- Integration tests for API endpoints
- In-memory MongoDB for deterministic tests
- Frontend tested via component & query-level testing
MIT License © Ali Romia
Ali Romia - Software Engineer
GitHub: https://github.com/Aliromia21
LinkedIn: https://www.linkedin.com/in/aliromia/