This document describes the architecture of the Todo App, a full-stack application with Azure infrastructure, monitoring, and AI-powered end-to-end testing driven by GitHub Copilot custom agents.
┌─────────────────────────────────────────────────────────────────┐
│ GitHub Actions │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Frontend │ │ Backend │ │ Infrastructure │ │
│ │ Deploy │ │ Deploy │ │ Deploy + Drift │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Azure Cloud Platform │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Azure Static Web Apps (Frontend) │ │
│ │ React + Vite + TypeScript │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ HTTPS │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Azure App Service (Backend API) │ │
│ │ Node.js + Express + TypeScript + Prisma │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ PostgreSQL │ │ Azure Redis │ │ Application │ │
│ │ Flexible Server│ │ Cache │ │ Insights │ │
│ │ │ │ │ │ │ │
│ │ - Todo Data │ │ - Session Cache│ │ - Telemetry │ │
│ │ - Metadata │ │ - Todo Cache │ │ - Logs │ │
│ │ - Tags │ │ - Rate Limit │ │ - Metrics │ │
│ └────────────────┘ └─────────────────┘ │ - Traces │ │
│ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Log Analytics Workspace │ │
│ │ │ │
│ │ - Kusto Query Language (KQL) queries │ │
│ │ - Custom dashboards │ │
│ │ - Alert rule evaluation │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Azure Monitor Alerts │ │
│ │ │ │
│ │ - CPU > 80% │ │
│ │ - Memory > 85% │ │
│ │ - HTTP 5xx > 10/min │ │
│ │ - Response Time > 2s │ │
│ │ - Database Connection Failures │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Monitor Action Group │ │
│ │ │ │
│ │ - Email notifications │ │
│ │ - Creates GitHub Issues (via Azure Monitor) │ │
│ │ - Triggers automated remediation │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Azure Key Vault │ │
│ │ │ │
│ │ - Database connection strings │ │
│ │ - Redis connection strings │ │
│ │ - Application Insights keys │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Technology Stack:
- React 18 with TypeScript
- Vite for build tooling
- TanStack Query for data fetching and caching
- Tailwind CSS for styling
- Axios for HTTP requests
Responsibilities:
- Render todo list UI
- Handle user interactions (create, update, delete, toggle)
- Filter todos by status and priority
- Display real-time updates
Key Features:
- Responsive design with dark theme
- Optimistic UI updates
- Toast notifications for user feedback
- Priority color coding (High=red, Medium=yellow, Low=green)
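The status/priority filtering and the color coding listed above can be sketched as two pure functions. This is a minimal illustration under assumptions, not the app's actual code: `filterTodos`, `priorityColor`, and the `StatusFilter` type are hypothetical names, and the `HIGH`/`MEDIUM`/`LOW` values are inferred from the color-coding bullet.

```typescript
// Hypothetical types; the Priority values are inferred from the color-coding bullet.
type Priority = "HIGH" | "MEDIUM" | "LOW";
type StatusFilter = "all" | "completed" | "pending";

interface Todo {
  id: string;
  title: string;
  completed: boolean;
  priority: Priority;
}

// Mapping stated in the feature list: High=red, Medium=yellow, Low=green.
const PRIORITY_COLORS: Record<Priority, string> = {
  HIGH: "red",
  MEDIUM: "yellow",
  LOW: "green",
};

function priorityColor(priority: Priority): string {
  return PRIORITY_COLORS[priority];
}

// Keep only todos that match the selected status and, if given, the selected priority.
function filterTodos(todos: Todo[], status: StatusFilter, priority?: Priority): Todo[] {
  return todos.filter((todo) => {
    const statusOk = status === "all" || (status === "completed") === todo.completed;
    const priorityOk = priority === undefined || todo.priority === priority;
    return statusOk && priorityOk;
  });
}
```

Keeping both functions free of React state makes them easy to unit test and to reuse from a TanStack Query `select` callback or a component.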
Pages and Routes:
- `/dashboard` - Stats cards, recent tasks, priority breakdown, quick links
- `/todos` - Task list with CRUD via modal, search, status/priority filters
- `/projects` - Project cards with status badges, create project modal
- `/projects/:id` - Project details with tasks, team members, priority breakdown
- `/users` - Team member list with search and role filter
- `/users/:id` - User profile with assigned tasks, performance metrics, account info
Technology Stack:
- Node.js 20 LTS
- Express 4.18
- TypeScript 5.3
- Prisma ORM 5.7
- Winston for logging
- Application Insights SDK
Responsibilities:
- RESTful API for todo operations
- Health check endpoints
- Request/response logging
- Error handling and monitoring
- Rate limiting
API Endpoints:
Todo Management:
- `GET /api/todos` - List all todos
- `POST /api/todos` - Create new todo
- `GET /api/todos/:id` - Get single todo
- `PATCH /api/todos/:id` - Update todo
- `DELETE /api/todos/:id` - Delete todo
- `PATCH /api/todos/:id/toggle` - Toggle completion status
Health Checks:
- `GET /api/health` - Basic health check
- `GET /api/health/detailed` - Detailed health with dependencies
- `GET /api/health/memory` - Memory usage statistics
- `GET /api/health/cpu` - CPU usage information
- `GET /api/health/ready` - Kubernetes readiness probe
- `GET /api/health/live` - Kubernetes liveness probe
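A detailed health endpoint usually reports each dependency separately and derives an overall status from them. The sketch below shows one way to do that aggregation. The `DependencyStatus` shape, the `summarizeHealth` helper, and the status strings are assumptions for illustration, not the API's actual response format.

```typescript
type Status = "healthy" | "unhealthy";

// Hypothetical per-dependency result, e.g. for "database" or "redis".
interface DependencyStatus {
  name: string;
  status: Status;
  latencyMs?: number;
}

interface HealthReport {
  status: Status;
  httpStatus: number; // 200 when every dependency is healthy, 503 otherwise
  dependencies: DependencyStatus[];
}

// Overall health is the conjunction of all dependency checks.
function summarizeHealth(dependencies: DependencyStatus[]): HealthReport {
  const allHealthy = dependencies.every((d) => d.status === "healthy");
  return {
    status: allHealthy ? "healthy" : "unhealthy",
    httpStatus: allHealthy ? 200 : 503,
    dependencies,
  };
}
```

In an Express handler the report could be returned with `res.status(report.httpStatus).json(report)`, so that probes and monitors can rely on the status code alone.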
Configuration:
- PostgreSQL 16
- SKU: B_Standard_B1ms (Burstable, 1 vCore, 2GB RAM)
- Storage: 32 GB
- Backup retention: 7 days
Schema:
model Todo {
  id          String        @id @default(uuid())
  title       String
  description String?
  completed   Boolean       @default(false)
  priority    Priority      @default(MEDIUM)
  createdAt   DateTime      @default(now())
  updatedAt   DateTime      @updatedAt
  tags        Tag[]
  metadata    TodoMetadata?
}
model Tag {
  id     String @id @default(uuid())
  name   String @unique
  color  String
  todoId String
  todo   Todo   @relation(fields: [todoId], references: [id])
}
model TodoMetadata {
  id            String  @id @default(uuid())
  todoId        String  @unique
  category      String?
  estimatedTime Int?
  actualTime    Int?
  notes         String?
  todo          Todo    @relation(fields: [todoId], references: [id])
}
Intentional Issues:
- Missing indexes on `title` and `description` columns (Scenario 3)
- N+1 query patterns in API endpoints (Scenario 2)
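To make the N+1 pattern concrete, the sketch below counts queries against an in-memory stand-in for the database. `FakeDb` and both list functions are hypothetical and exist only for illustration. The real endpoints use Prisma, and this does not reproduce their code.

```typescript
interface Todo { id: string; title: string }
interface Tag { id: string; name: string; todoId: string }

// In-memory stand-in for a database that counts how many queries are issued.
class FakeDb {
  queryCount = 0;
  private todos: Todo[];
  private tags: Tag[];

  constructor(todos: Todo[], tags: Tag[]) {
    this.todos = todos;
    this.tags = tags;
  }

  findTodos(): Todo[] {
    this.queryCount++;
    return [...this.todos];
  }

  findTagsByTodoId(todoId: string): Tag[] {
    this.queryCount++;
    return this.tags.filter((t) => t.todoId === todoId);
  }

  findTagsByTodoIds(todoIds: string[]): Tag[] {
    this.queryCount++;
    const wanted = new Set(todoIds);
    return this.tags.filter((t) => wanted.has(t.todoId));
  }
}

// N+1: one query for the list, then one more query per todo.
function listWithTagsNPlusOne(db: FakeDb) {
  return db.findTodos().map((todo) => ({ ...todo, tags: db.findTagsByTodoId(todo.id) }));
}

// Batched: one query for the list, one for all tags, joined in memory.
function listWithTagsBatched(db: FakeDb) {
  const todos = db.findTodos();
  const tagsByTodo = new Map<string, Tag[]>();
  for (const tag of db.findTagsByTodoIds(todos.map((t) => t.id))) {
    const bucket = tagsByTodo.get(tag.todoId) ?? [];
    bucket.push(tag);
    tagsByTodo.set(tag.todoId, bucket);
  }
  return todos.map((todo) => ({ ...todo, tags: tagsByTodo.get(todo.id) ?? [] }));
}
```

With N todos the first variant issues N+1 queries and the second always issues two. In Prisma, the batched form typically corresponds to loading the relation in the same call, for example `findMany({ include: { tags: true } })`.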
Configuration:
- SKU: Basic C0 (250 MB)
- TLS 1.2 required
- Authentication enabled
Cache Strategy:
- Cache todo lists with 5-minute TTL
- Invalidate cache on create/update/delete operations
- Use Redis as session store
- Cache frequently accessed todos
Cache Keys:
- `todos:all` - All todos list
- `todos:completed` - Completed todos
- `todos:pending` - Pending todos
- `todo:{id}` - Individual todo
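The cache strategy and key layout above amount to a cache-aside read path plus invalidation on writes. The sketch below models that with an in-memory stand-in for Redis. `TtlCache`, `getAllTodos`, and `invalidateTodo` are hypothetical names, and passing `now` explicitly is a choice made here only to keep the example deterministic.

```typescript
interface Todo { id: string; title: string; completed: boolean }

// Minimal in-memory stand-in for Redis with per-key expiry.
class TtlCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, now: number): string | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlMs: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  del(...keys: string[]): void {
    for (const key of keys) this.entries.delete(key);
  }
}

const LIST_TTL_MS = 5 * 60 * 1000; // 5-minute TTL from the cache strategy
const LIST_KEYS = ["todos:all", "todos:completed", "todos:pending"];

// Cache-aside read: serve from cache when present, otherwise load and store.
function getAllTodos(cache: TtlCache, loadFromDb: () => Todo[], now: number): Todo[] {
  const cached = cache.get("todos:all", now);
  if (cached !== undefined) return JSON.parse(cached) as Todo[];
  const todos = loadFromDb();
  cache.set("todos:all", JSON.stringify(todos), LIST_TTL_MS, now);
  return todos;
}

// After any create/update/delete, drop the item key and every list that may contain it.
function invalidateTodo(cache: TtlCache, id: string): void {
  cache.del(`todo:${id}`, ...LIST_KEYS);
}
```

Invalidating all three list keys on every write is the conservative choice: a write path that clears only `todo:{id}` leaves stale list entries until the TTL expires.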
Intentional Issues:
- Cache invalidation bug in update endpoint (Scenario 8)
- Connection pool exhaustion scenario (Scenario 4)
Telemetry Collection:
- HTTP requests and responses
- Custom events for business operations
- Exception tracking with stack traces
- Performance metrics (CPU, memory, response time)
- Dependency tracking (database, Redis, external APIs)
Custom Metrics:
- `todos_created` - Counter for new todos
- `todos_completed` - Counter for completed todos
- `cache_hit_rate` - Cache effectiveness
- `api_response_time` - Response time distribution
Kusto Queries:
// High error rate detection
requests
| where timestamp > ago(5m)
| summarize
total = count(),
errors = countif(toint(resultCode) >= 500)
| extend error_rate = (errors * 100.0) / total
| where error_rate > 5
// Slow queries
dependencies
| where type == "SQL"
| where duration > 2000
| summarize count() by operation_Name, bin(timestamp, 5m)
// CPU usage trend
performanceCounters
| where name == "% Processor Time"
| summarize avg(value) by bin(timestamp, 1m)
CPU Alert:
- Metric: CpuPercentage
- Threshold: > 80%
- Window: 5 minutes
- Frequency: 1 minute
- Severity: 2 (Warning)
Memory Alert:
- Metric: MemoryPercentage
- Threshold: > 85%
- Window: 5 minutes
- Frequency: 1 minute
- Severity: 2 (Warning)
HTTP Error Alert:
- Metric: Http5xx
- Threshold: > 10 per minute
- Window: 5 minutes
- Frequency: 1 minute
- Severity: 1 (Error)
Response Time Alert:
- Metric: ResponseTime
- Threshold: > 2 seconds
- Window: 5 minutes
- Frequency: 1 minute
- Severity: 2 (Warning)
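Each rule above compares a metric against a threshold over a 5-minute window. The sketch below models that evaluation for a single tick. It is an illustration only: Azure Monitor performs this evaluation itself, and the `aggregation` field is an assumption because this document does not state which aggregation each rule uses.

```typescript
type Aggregation = "average" | "total";

interface MetricSample {
  timestampMs: number;
  value: number;
}

interface AlertRule {
  metric: string;
  threshold: number;
  windowMs: number;
  aggregation: Aggregation; // assumption: not specified in the rule definitions
  severity: number;
}

// Returns true when the aggregated value inside the look-back window exceeds the threshold.
function isFiring(rule: AlertRule, samples: MetricSample[], nowMs: number): boolean {
  const inWindow = samples.filter(
    (s) => s.timestampMs > nowMs - rule.windowMs && s.timestampMs <= nowMs,
  );
  if (inWindow.length === 0) return false;
  const total = inWindow.reduce((sum, s) => sum + s.value, 0);
  const aggregated = rule.aggregation === "total" ? total : total / inWindow.length;
  return aggregated > rule.threshold;
}

// The CPU alert from the list, expressed as data.
const cpuAlert: AlertRule = {
  metric: "CpuPercentage",
  threshold: 80,
  windowMs: 5 * 60 * 1000,
  aggregation: "average",
  severity: 2,
};
```

With a 1-minute evaluation frequency, a function like this would run once per minute over the trailing five minutes of samples.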
Resources Managed:
- Resource Group
- PostgreSQL Flexible Server
- Azure Redis Cache
- App Service Plan
- Linux Web App (Backend)
- Static Web App (Frontend)
- Application Insights
- Log Analytics Workspace
- Key Vault
- Monitor Action Group
- Metric Alerts
- Diagnostic Settings
State Management:
- Remote state stored in Azure Storage (optional)
- State locking with blob lease
- Sensitive outputs marked appropriately
Drift Detection:
- Scheduled daily runs via GitHub Actions
- Detects manual changes in Azure Portal
- Creates GitHub issues for drift alerts
- Scenario 10 demonstration capability
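One common way to detect drift in a scheduled job is `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below maps that exit code to an action. Whether this project's workflow uses that flag is an assumption, and `interpretPlanExitCode` and the issue title are hypothetical.

```typescript
type DriftResult =
  | { kind: "no-drift" }
  | { kind: "drift"; issueTitle: string }
  | { kind: "error"; message: string };

// Interpret the exit code of `terraform plan -detailed-exitcode`.
function interpretPlanExitCode(exitCode: number, environment: string): DriftResult {
  switch (exitCode) {
    case 0:
      // Plan succeeded and the diff is empty: live infrastructure matches the code.
      return { kind: "no-drift" };
    case 2:
      // Plan succeeded with a non-empty diff: something changed outside Terraform.
      return { kind: "drift", issueTitle: `Terraform drift detected in ${environment}` };
    default:
      // Exit code 1 (or anything unexpected) means the plan itself failed.
      return { kind: "error", message: `terraform plan failed with exit code ${exitCode}` };
  }
}
```

A workflow step could open a GitHub issue only for the `drift` result and fail the job for `error`, which keeps a broken plan from being mistaken for a clean one.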
Frontend Pipeline:
- Checkout code
- Install dependencies
- Run linter (continue on error for demo)
- Build with Vite
- Deploy to Azure Static Web Apps
Backend Pipeline:
- Checkout code
- Install dependencies
- Generate Prisma Client
- Run linter (continue on error for demo)
- Build TypeScript
- Build Docker image
- Push to Azure Container Registry
- Deploy to App Service
- Run database migrations
- Health check verification
Infrastructure Pipeline:
- Terraform format check
- Terraform init
- Terraform validate
- Terraform plan (post to PR)
- Manual approval for apply (production)
- Terraform apply
- Drift detection (scheduled)
The project includes a comprehensive end-to-end testing framework using Playwright with AI-powered test generation through GitHub Copilot custom agents.
┌─────────────────────────────────────────────────────────────┐
│ Copilot Custom Agents │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Explorer │ │ Planner │ │ Implementer │ │
│ │ (MCP Nav) │─▶│ (Test Plan) │─▶│ (Code Gen) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Playwright Test Runner │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌──────────────┐ │
│ │ Page Object │ │ Test Specs │ │ Fixtures │ │
│ │ Models │ │ (*.spec.ts) │ │ (Mock Data) │ │
│ └───────────────┘ └───────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Frontend (Vite Dev Server :5173) │
│ API Mocks via page.route() │
└─────────────────────────────────────────────────────────────┘
e2e/
  playwright.config.ts    # Test runner configuration
  tsconfig.json           # TypeScript configuration
  package.json            # Dependencies (@playwright/test)
  pages/                  # Page Object Models
    layout.page.ts        # Navigation, header, mobile menu
    dashboard.page.ts     # Stats cards, links, priority breakdown
    todos.page.ts         # Task list, filters, search, create modal
    projects.page.ts      # Project list, create modal
    users.page.ts         # User list, search, role filter
  tests/                  # Test specifications
    navigation.spec.ts    # Routing, menu, active states, redirects
    dashboard.spec.ts     # Stats rendering, card links
    todos.spec.ts         # CRUD, filtering, search, toggle
    projects.spec.ts      # CRUD, detail view, members
    users.spec.ts         # List, search, detail view
  fixtures/               # Shared resources
    mock-data.ts          # Mock API responses for todos, projects, users
API Mocking Strategy:
- All E2E tests use `page.route()` to intercept API calls
- Mock responses are defined in shared fixture files
- Tests never depend on a running backend — only the Vite dev server
- Different mock data for testing states: empty, populated, error
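The mocking strategy above can be sketched as a handler factory that serves a different response per test state. To keep the example self-contained, `RouteLike` is a structural subset of Playwright's `Route` covering only the `fulfill()` options used here. `todosHandler`, `MockState`, the fixture data, and the assumption that the endpoint returns a bare JSON array are all hypothetical.

```typescript
// Structural subset of Playwright's Route: only the fulfill() options used here.
interface RouteLike {
  fulfill(options: { status?: number; contentType?: string; body?: string }): Promise<void>;
}

type MockState = "empty" | "populated" | "error";

interface MockTodo {
  id: string;
  title: string;
  completed: boolean;
  priority: "HIGH" | "MEDIUM" | "LOW";
}

// Hypothetical fixture data; the project's real fixtures live in fixtures/mock-data.ts.
const populatedTodos: MockTodo[] = [
  { id: "1", title: "Write tests", completed: false, priority: "HIGH" },
  { id: "2", title: "Review PR", completed: true, priority: "LOW" },
];

// Build a route handler that answers the todo list request for the given state.
function todosHandler(state: MockState): (route: RouteLike) => Promise<void> {
  return (route) => {
    if (state === "error") {
      return route.fulfill({
        status: 500,
        contentType: "application/json",
        body: JSON.stringify({ error: "Internal Server Error" }),
      });
    }
    const todos = state === "empty" ? [] : populatedTodos;
    return route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify(todos),
    });
  };
}

// In a spec, assuming the list is served under /api/todos:
//   await page.route("**/api/todos", todosHandler("populated"));
```

Because the handler depends only on the `fulfill()` shape, it can be unit tested with a fake route and passed unchanged to `page.route()` in a real spec.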
Selector Strategy (priority order):
- `getByRole()` - buttons, links, headings, comboboxes
- `getByText()` - visible text content
- `getByPlaceholder()` - form input placeholders
- `getByLabel()` - labeled form fields
- `locator('[data-testid="..."]')` - fallback only
Multi-Viewport Testing:
- Desktop Chrome (default viewport)
- Mobile iPhone 13 (tests responsive layout, mobile menu)
The test generation process is automated through 4 specialized agents:
- Explorer (`playwright-explorer.agent.md`)
  - Uses Playwright MCP to navigate the live application
  - Takes snapshots of each page
  - Documents interactive elements with accessible selectors
  - Outputs `.testagent/exploration.md`
- Planner (`playwright-planner.agent.md`)
  - Reads exploration findings and source code
  - Organizes tests into incremental phases
  - Defines Page Objects, mock data, and test cases
  - Outputs `.testagent/e2e-plan.md`
- Implementer (`playwright-implementer.agent.md`)
  - Implements one phase at a time
  - Creates Page Object Models in `e2e/pages/`
  - Writes test specs with API mocks in `e2e/tests/`
  - Runs tests and fixes failures (up to 3 retries)
- Tester (Orchestrator) (`playwright-tester.agent.md`)
  - Coordinates the full pipeline: Explorer → Planner → Implementer
  - Ensures dev server is running
  - Validates all tests pass before reporting
# From the frontend directory
npm run test:e2e # Run all tests (headless)
npm run test:e2e:ui # Interactive UI mode
npm run test:e2e:headed # Visible browser window
# From the e2e directory
cd e2e
npx playwright test # All tests
npx playwright test tests/todos.spec.ts # Specific spec
npx playwright test --project=mobile # Mobile viewport only
npx playwright test --list # List all tests
npx playwright show-report # View HTML report
- PostgreSQL allows all IP addresses (0.0.0.0/0)
- CORS allows all origins (*)
- Error handler exposes sensitive data in dev mode
- Soft delete purge enabled on Key Vault
- No network isolation or private endpoints
Network Security:
- Use Virtual Network integration
- Deploy with private endpoints
- Implement Network Security Groups (NSGs)
- Enable Azure Firewall
Authentication & Authorization:
- Implement Azure AD authentication
- Use Managed Identity for service-to-service auth
- Rotate secrets regularly
- Use Azure RBAC for access control
Data Protection:
- Enable encryption at rest and in transit
- Implement data retention policies
- Use Azure Backup for disaster recovery
- Enable geo-replication for critical data
Monitoring & Compliance:
- Enable Azure Security Center
- Implement Azure Policy for governance
- Use Azure Sentinel for SIEM
- Regular security audits
- Single instance App Service (no auto-scale)
- Basic Redis Cache (250 MB, no clustering)
- Burstable PostgreSQL (1 vCore)
Horizontal Scaling:
- Auto-scale App Service based on CPU/Memory
- Use Azure Front Door for global distribution
- Implement read replicas for PostgreSQL
- Use Redis cluster mode for cache
Vertical Scaling:
- Upgrade App Service to Premium tier
- Increase PostgreSQL to General Purpose tier
- Upgrade Redis to Premium tier with persistence
Performance Optimization:
- Implement CDN for static assets
- Use connection pooling
- Implement query result caching
- Optimize database indexes
- Use compression for API responses
Backup Strategy:
- PostgreSQL automated backups (7-day retention)
- Point-in-time restore capability
- Infrastructure as Code for environment recreation
- Application Insights data retention (30 days)
Recovery Procedures:
- Restore database from backup
- Deploy infrastructure from Terraform
- Deploy application from latest Docker image
- Verify health checks
- Update DNS if needed
RPO/RTO Targets:
- Recovery Point Objective (RPO): 24 hours
- Recovery Time Objective (RTO): 4 hours
Current Monthly Estimate (Dev Environment):
- App Service Basic (B1): ~$13
- PostgreSQL Flexible Server (B1ms): ~$12
- Redis Cache (Basic C0): ~$17
- Static Web App (Free): $0
- Application Insights: ~$5
- Log Analytics: ~$2
- Total: ~$49/month
Cost Saving Recommendations:
- Use Azure Dev/Test pricing
- Shut down non-production environments after hours
- Use Azure Reservations for production
- Implement resource tagging for cost tracking
- Review Log Analytics retention policies
Key Metrics to Display:
- Application Health
  - Request rate (requests/min)
  - Error rate (%)
  - Average response time (ms)
  - Active users
- Infrastructure Health
  - CPU usage (%)
  - Memory usage (%)
  - Database connections
  - Redis cache hit rate
- Business Metrics
  - Todos created (count)
  - Todos completed (count)
  - Alert frequency