This repository defines the infrastructure for the ARS (Automated Response Security) platform: an observable, secure microservices stack running on Kubernetes. All manifests and overlays live in the `infrastructure/` directory; helper scripts live in `scripts/`.
- Kubernetes: The core orchestration platform, supporting scalable, resilient deployments.
- Observability Stack: Includes Prometheus (metrics), Loki (logs), and Grafana (dashboards).
- Security Monitoring: Falco for real-time security monitoring and threat detection.
- Supporting Services: MinIO (object storage), Redis (cache/message broker), and more.
- App Pod: The main application, `oceanhealing-dev`, is instrumented for observability and security.
The infrastructure uses Kustomize for declarative, environment-specific deployments:
```text
infrastructure/
├── base/                  # Base manifests and common configurations
├── components/            # Modular service definitions
│   ├── nginx-exporter/    # Web server metrics collection
│   ├── promtail/          # Log aggregation and forwarding
│   └── ...                # Other components
├── security/              # Security monitoring stack
│   ├── falco-daemonset.yaml
│   ├── falco-config.yaml
│   └── falco_rules.yaml
├── overlays/              # Environment-specific configurations
│   ├── dev/               # Development environment
│   └── prod/              # Production environment
└── monitoring/            # Observability stack
```
oceanhealing-dev (app pod):
- The main application container (React/Node.js).
- Exposes health endpoints and a metrics endpoint for Prometheus to scrape.
- All observability and security tools are designed to monitor this pod.
Prometheus:
- Collects and stores time-series metrics from the app pod and cluster components.
- Scrapes `/metrics` endpoints and tracks CPU, memory, HTTP requests, and similar signals.
- Essential for alerting, SLOs, and performance analysis.
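The `/metrics` endpoints Prometheus scrapes serve plain text in the Prometheus exposition format, one sample per line as `name{labels} value [timestamp]`. For quick manual checks, a minimal shell sketch like the one below can pull a single sample out of that text. The function name `metric_value` is hypothetical, and it deliberately ignores edge cases such as `}` inside label values.

```bash
# metric_value: print the value of the first sample named $1 found in
# Prometheus text exposition format on stdin. Illustrative sketch only:
# it ignores edge cases such as "}" inside label values.
metric_value() {
  awk -v name="$1" '
    /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blank lines
    match($0, /^[a-zA-Z_:][a-zA-Z0-9_:]*/) {
      series = substr($0, 1, RLENGTH)   # metric name, before any label set
      if (series != name) next
      rest = substr($0, RLENGTH + 1)
      sub(/^[{][^}]*[}]/, "", rest)     # drop the optional label set
      split(rest, f, " ")               # value, then optional timestamp
      print f[1]
      exit
    }
  '
}
```

For example, with a port-forward to the app (`kubectl port-forward deploy/oceanhealing-dev 3000:3000`, assuming a Deployment of that name serving metrics on port 3000), `curl -s http://localhost:3000/metrics | metric_value http_requests_total` prints the current counter value, provided the app exports a metric with that (hypothetical) name.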
Loki:
- Aggregates logs from all pods (including `oceanhealing-dev`) shipped by Promtail.
- Enables fast, label-based log queries and correlation with metrics in Grafana.
- Critical for debugging and root-cause analysis.
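Loki queries start from a LogQL stream selector such as `{app="oceanhealing-dev"}`, which picks log streams by label. The small helper below is a hypothetical sketch that builds a selector from `key=value` arguments; which labels actually exist (for example `app` or `namespace`) depends on the Promtail relabeling configuration, so treat the label names as assumptions.

```bash
# logql_selector: build a LogQL stream selector from key=value arguments.
# Hypothetical helper: values are wrapped in double quotes but not escaped.
logql_selector() {
  local sel="" pair key val
  for pair in "$@"; do
    key=${pair%%=*}            # text before the first "="
    val=${pair#*=}             # text after the first "="
    sel="${sel:+$sel,}${key}=\"${val}\""
  done
  printf '{%s}\n' "$sel"
}
```

With `logcli` pointed at Loki (via `LOKI_ADDR`), `logcli query "$(logql_selector app=oceanhealing-dev) |= \"error\""` would return app log lines containing `error`.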
Grafana:
- Visualization layer for both metrics (Prometheus) and logs (Loki).
- Provides dashboards, alerts, and a unified observability experience.
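Grafana can load both data sources from a provisioning file at startup, which keeps the metrics-and-logs pairing declarative. The function below is a sketch that prints such a file. The URLs `http://prometheus:9090` and `http://loki:3100` are hypothetical in-cluster service addresses (9090 and 3100 are the two projects' default ports), so adjust them to the actual service names.

```bash
# grafana_datasources: print a Grafana datasource provisioning file that
# wires Prometheus (metrics) and Loki (logs) into one Grafana instance.
# The default URLs are hypothetical in-cluster service addresses.
grafana_datasources() {
  local prom_url=${1:-http://prometheus:9090}
  local loki_url=${2:-http://loki:3100}
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${prom_url}
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: ${loki_url}
EOF
}
```

One way to use it: `grafana_datasources > datasources.yaml`, then `kubectl create configmap grafana-datasources --from-file=datasources.yaml`, and mount that ConfigMap under `/etc/grafana/provisioning/datasources/`, the provisioning path used by the official Grafana image.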
MinIO:
- S3-compatible object storage, used for:
  - Storing app uploads, backups, and artifacts.
  - (Optional) Backing long-term metrics storage (e.g., Thanos) and cluster backups (e.g., Velero).
- Cloud-native; supports high availability when run in distributed mode.
Redis:
- In-memory cache and message broker.
- Used for session storage, queueing, and caching to improve app performance.
Falco:
- Real-time runtime security monitoring.
- Detects suspicious behavior, file access, and container anomalies.
- Sends alerts to the security stack or external SIEMs.
- Note: Requires kernel module access for full functionality (see the Falco deployment notes below).
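To illustrate what Falco rules express, the sketch below writes one hypothetical rule that flags a shell started inside an `oceanhealing-dev` pod. It assumes the stock `spawned_process` and `container` macros from Falco's default rule set are loaded and that Kubernetes metadata such as `k8s.pod.name` is populated; the rule name, pod-name prefix, and output file are illustrative.

```bash
# write_app_shell_rule: write a hypothetical Falco rule that flags shells
# spawned inside oceanhealing-dev pods. Relies on the spawned_process and
# container macros shipped with Falco's default rule set.
write_app_shell_rule() {
  local out=${1:-app_shell_rule.yaml}
  cat > "$out" <<'EOF'
- rule: Shell spawned in app container
  desc: A shell process started inside an oceanhealing-dev pod
  condition: >
    spawned_process and container
    and k8s.pod.name startswith "oceanhealing-dev"
    and proc.name in (bash, sh)
  output: >
    Shell in app pod (user=%user.name pod=%k8s.pod.name
    cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
EOF
}
```

The resulting rule could be appended to `infrastructure/security/falco_rules.yaml`, or kept in a separate file listed among the rules files in `falco-config.yaml`.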
Cluster features:
- Network Policies: Secure pod-to-pod communication and restrict traffic based on namespace, label, or port.
- RBAC: Fine-grained access control for all services and users.
- Secrets Management: Secure storage and automated generation of sensitive data.
- Ingress: Configurable ingress controller for secure, external access to services.
- Autoscaling: Horizontal Pod Autoscalers (HPA) for dynamic scaling based on resource usage.
- Backup & Restore: PersistentVolumeClaims (PVCs) for data, with documented backup/restore procedures.
- Resource Validation: Admission controllers and resource validators to enforce resource limits and best practices.
- Security Scanning: Automated secret scanning with gitleaks and `.gitignore` rules to prevent secret leakage.
- Disaster Recovery: Step-by-step cluster restore checklist and best practices for rapid recovery.
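As a concrete instance of restricting traffic by namespace, label, and port, the function below prints a hypothetical NetworkPolicy that admits traffic to the app pods on their metrics port only from the `monitoring` namespace. The defaults (namespace `dev`, pod label `app: oceanhealing-dev`, port `3000`) are assumptions, and the namespace match relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set on every namespace.

```bash
# allow_metrics_ingress: print a NetworkPolicy that lets only pods in the
# monitoring namespace reach the app pods on their metrics port.
# Namespace, label, and port values are hypothetical defaults.
allow_metrics_ingress() {
  local ns=${1:-dev}
  local app=${2:-oceanhealing-dev}
  local port=${3:-3000}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-from-monitoring
  namespace: ${ns}
spec:
  podSelector:
    matchLabels:
      app: ${app}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: ${port}
EOF
}
```

Apply it with `allow_metrics_ingress dev oceanhealing-dev 3000 | kubectl apply -f -`. Two caveats: NetworkPolicies only take effect with a CNI plugin that enforces them, and once any Ingress policy selects a pod, all ingress not allowed by some policy is dropped, so the app would also need a policy admitting traffic from its ingress controller.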
Cortex:
- Automated analysis and response engine for security incidents.
- Integrates with TheHive to process observables and run analyzers.
- Scalable and API-driven, enabling automated enrichment and response workflows.
TheHive:
- Open-source Security Incident Response Platform (SIRP).
- Manages cases, alerts, and investigations.
- Integrates with Cortex for automated analysis and with Elasticsearch for fast search.
Elasticsearch:
- Distributed search and analytics engine.
- Stores and indexes security events, observables, and case data for TheHive.
- Enables fast, full-text search and analytics across security data.
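Elasticsearch reports cluster state at `GET /_cluster/health`, and the `status` field of that response (`green`, `yellow`, or `red`) is a quick first check of the search backend TheHive depends on. The helper below is a hypothetical sketch that extracts the field with `sed` so it needs no `jq`; it assumes the field appears as `"status" : "<value>"`, as in the standard compact or pretty-printed response.

```bash
# es_health_status: read an Elasticsearch _cluster/health JSON response on
# stdin and print its "status" field (green, yellow, or red).
# Uses sed instead of jq to avoid extra dependencies.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}
```

For example, after `kubectl port-forward svc/elasticsearch 9200:9200` (the service name is an assumption), `curl -s http://localhost:9200/_cluster/health | es_health_status` prints the status; a cluster with security enabled would also need HTTPS and credentials. On a single-node cluster `yellow` is common, because replica shards have no second node to be allocated to.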
```bash
# Deploy base infrastructure
kubectl apply -k infrastructure/

# Deploy development environment
kubectl apply -k infrastructure/overlays/dev/

# Deploy production environment
kubectl apply -k infrastructure/overlays/prod/
```

```bash
# Deploy and access services
./scripts/deploy-and-access.sh

# Port forwarding utilities
./scripts/port-forward.sh
./scripts/port-forward-all.sh
```

- Development: Uses overlays for local development with relaxed security policies
- Production: Enforces strict security policies and resource limits
- Custom: Create new overlays for specific environments
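Creating a custom overlay mostly means adding a directory under `infrastructure/overlays/` whose `kustomization.yaml` builds on the shared base. The `new_overlay` helper below is a hypothetical sketch of that step: the `../../base` path assumes the directory layout shown earlier, and setting `namespace:` to the overlay name is an illustrative choice rather than a convention taken from this repository.

```bash
# new_overlay: scaffold <root>/overlays/<name>/kustomization.yaml that builds
# on the shared base. The ../../base path follows the layout in this README;
# using the overlay name as the namespace is a hypothetical default.
new_overlay() {
  local name=${1:-} root=${2:-infrastructure}
  if [ -z "$name" ]; then
    echo "usage: new_overlay <name> [root]" >&2
    return 1
  fi
  local dir="$root/overlays/$name"
  if [ -e "$dir/kustomization.yaml" ]; then
    echo "overlay $name already exists" >&2
    return 1
  fi
  mkdir -p "$dir"
  cat > "$dir/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ${name}
resources:
  - ../../base
EOF
}
```

After `new_overlay staging` (a hypothetical environment name), `kubectl kustomize infrastructure/overlays/staging/` renders the result for review before `kubectl apply -k infrastructure/overlays/staging/`; environment-specific patches can then be added to the new file.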
Falco deployment status:
- ✅ Configuration: Properly configured with the container plugin and security rules
- ✅ Deployment: Kustomize-based deployment with proper resource management
- ❌ Kernel Module: Limited by containerized environment (Docker Desktop)

Environment requirements:
- Full Functionality: Requires a VM environment (Multipass, Vagrant) or bare metal
- Containerized: Limited to container metadata monitoring only
- Production: Consider host-level Falco installation for complete security monitoring

Recommendations:
- Use a VM-based Kubernetes cluster for full Falco functionality
- Consider alternative security monitoring tools for containerized environments
- Implement host-level security monitoring for production deployments
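Before choosing between a VM-based cluster and a host-level installation, it can help to check which Falco driver a node could support. Falco's modern eBPF probe relies on kernel BTF type information, normally exposed at `/sys/kernel/btf/vmlinux`. The heuristic below is a hypothetical helper, not an official Falco check: it suggests `modern_ebpf` when that file is readable and falls back to `kmod` otherwise. The path argument exists only so the check can be tested.

```bash
# falco_driver_hint: suggest a Falco driver based on whether the kernel
# exposes BTF type information, which the modern eBPF probe relies on.
# Heuristic sketch only; the BTF path can be overridden for testing.
falco_driver_hint() {
  local btf=${1:-/sys/kernel/btf/vmlinux}
  if [ -r "$btf" ]; then
    echo modern_ebpf
  else
    echo kmod
  fi
}
```

Run it on the node itself (for Docker Desktop, inside the VM that hosts the cluster) and carry the result into the driver setting in `falco-config.yaml` (`engine.kind` in recent Falco releases). A readable BTF file is necessary but not sufficient: the probe also needs a sufficiently recent kernel, and `kmod` still needs a loadable module.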
Data flow:
- Metrics: App pod exposes metrics → Prometheus scrapes and stores → Grafana visualizes.
- Logs: App pod logs collected by Promtail → Loki stores and indexes → Grafana queries logs.
- Security: Falco monitors all pods and nodes for threats (when kernel access is available).
- Storage: MinIO provides persistent object storage for app and infra needs.
Extending the stack:
- Create a component directory in `infrastructure/components/`.
- Add a `kustomization.yaml` for the component.
- Include it in the base kustomization, or add it through environment-specific overlays.
- Use Kustomize patches for environment-specific settings.
- Create new overlays for custom environments.
- Maintain consistent labeling and resource management.
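The first two component steps can be scripted. The `new_component` function below is a hypothetical sketch: it creates `infrastructure/components/<name>/`, relative to the current directory, with a plain `Kustomization` listing the manifest files you pass in. It does not create those manifests, and it leaves registration in the base kustomization to you.

```bash
# new_component: scaffold infrastructure/components/<name>/ with a
# kustomization.yaml listing the manifest files passed as arguments.
# Hypothetical helper; it does not create the manifests or register the
# component in base.
new_component() {
  local name=${1:-} f
  if [ -z "$name" ]; then
    echo "usage: new_component <name> [manifest.yaml ...]" >&2
    return 1
  fi
  shift
  local dir="infrastructure/components/$name"
  mkdir -p "$dir"
  {
    echo "apiVersion: kustomize.config.k8s.io/v1beta1"
    echo "kind: Kustomization"
    echo "resources:"
    for f in "$@"; do
      echo "  - $f"
    done
  } > "$dir/kustomization.yaml"
}
```

For example, `new_component redis-exporter deployment.yaml service.yaml` (hypothetical names), followed by adding `../components/redis-exporter` to the `resources:` list of `infrastructure/base/kustomization.yaml`, either by hand or with `kustomize edit add resource ../components/redis-exporter` run from that directory.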
Guidelines:
- All manifests are designed for clarity, security, and extensibility
- Use Kustomize for declarative, environment-specific deployments
- Maintain proper resource limits and security policies
- Document environment requirements and limitations
If you have questions about the stack, observability, or how to extend the cluster, open an issue or contact the maintainers.