This document outlines the security model, threat analysis, and best practices for CI Insight.
CI Insight is designed with a defense-in-depth approach:
- Privacy-first: Redact secrets, minimize data retention
- Secure by default: Safe defaults, explicit opt-in for risky features
- Transparent: Open source, auditable code, clear logging
- Graceful degradation: Security failures are logged, not silent
- CI/CD Logs: May contain secrets, API keys, credentials
- Application Data: Failure patterns, repository names, commit SHAs
- System Access: API endpoints, webhook receivers
- External Attackers: Attempting to access data or disrupt service
- Malicious Insiders: Users with legitimate access abusing privileges
- Accidental Exposure: Misconfiguration leading to data leaks
| Vector | Threat | Mitigation |
|---|---|---|
| Spoofed Webhooks | Attacker sends fake CI events | Webhook signature validation |
| Secret Leakage | Secrets in logs exposed via API/UI | Log redaction before storage |
| Unauthorized Access | External access to internal API | CORS policy, optional authentication |
| SQL Injection | Malicious input in API queries | SQLAlchemy ORM, parameterized queries |
| XSS | Malicious content in failure messages | React auto-escaping; CSP headers (planned) |
| AI Provider Leak | Secrets sent to OpenAI | Redaction before AI processing |
| SSRF | Webhook callbacks to internal services | Validate webhook sources |
Purpose: Prevent secrets from being stored or exposed
Implementation (app/services/analysis/redactor.py):
- Regex-based pattern matching
- Runs before database storage
- Runs before sending to AI providers
Patterns Detected:
- API keys: `api_key=...`, `apikey=...`
- Tokens: `token=...`, `bearer ...`
- GitHub tokens: `ghp_...`, `gho_...`, etc.
- AWS keys: `AKIA...`
- Passwords: `password=...`, `passwd=...`
- Database URLs: `postgres://user:pass@...`
- Private keys: `-----BEGIN PRIVATE KEY-----`
- JWT tokens: `eyJ...`
- Email addresses: `user@example.com`
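
A minimal sketch of regex-based redaction over a few of the pattern families listed above. The rule set, the `[REDACTED]` placeholder, and the `redact` function name are illustrative assumptions; the actual rules live in app/services/analysis/redactor.py and may differ.

```python
import re

# Illustrative rules only (pattern, replacement); not the project's actual rule set.
RULES = [
    # key=value style secrets: keep the key name, replace the value
    (re.compile(r"\b(api_?key|token|password|passwd)(\s*[=:]\s*)\S+", re.IGNORECASE),
     r"\1\2[REDACTED]"),
    # GitHub tokens such as ghp_... and gho_...
    (re.compile(r"\bgh[a-z]_[A-Za-z0-9]{20,}"), "[REDACTED]"),
    # AWS access key IDs
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED]"),
    # Credentials embedded in PostgreSQL URLs: keep scheme and host, drop user:pass
    (re.compile(r"(postgres(?:ql)?://)[^:@/\s]+:[^@\s]+@"), r"\1[REDACTED]@"),
]


def redact(text: str) -> str:
    """Apply every rule in order and return the redacted text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Running the same function both before database storage and before any AI call keeps the two code paths from drifting apart.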
Configuration:

    ENABLE_LOG_REDACTION=true   # Recommended: always enabled
    STORE_RAW_LOGS=false        # Recommended: never store raw logs

Limitations:
- Not 100% foolproof (custom secret formats may slip through)
- Trade-off: Aggressive redaction may over-redact useful info
- Recommendation: Review and enhance patterns for your use case
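
One way to act on this recommendation is to add a pattern for an organisation-specific secret format together with a regression test. The `acme_live_` prefix, the `CUSTOM_PATTERNS` list, and `redact_custom` are hypothetical names for illustration and are not part of CI Insight.

```python
import re

# Hypothetical in-house token format: "acme_live_" followed by 24+ alphanumerics.
CUSTOM_PATTERNS = [re.compile(r"\bacme_live_[A-Za-z0-9]{24,}\b")]


def redact_custom(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every custom pattern with the placeholder."""
    for pattern in CUSTOM_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def test_custom_token_is_redacted() -> None:
    # Regression test: the leaked format must no longer survive redaction.
    leaked = "deploy failed: auth acme_live_" + "A1b2C3d4" * 3
    cleaned = redact_custom(leaked)
    assert "acme_live_" not in cleaned
    assert cleaned == "deploy failed: auth [REDACTED]"
```

Anchoring the pattern on a fixed prefix and a minimum length keeps it from over-redacting ordinary log text.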
Testing:

    pytest app/tests/test_redactor.py

Purpose: Verify webhooks are from legitimate sources
Algorithm: HMAC-SHA256
Configuration:

    GITHUB_WEBHOOK_SECRET=your-secret-here
    ENABLE_WEBHOOK_VALIDATION=true

GitHub Setup:
- Repository → Settings → Webhooks → Add webhook
- Set Secret field
- CI Insight validates the `X-Hub-Signature-256` header
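
GitHub computes this header as the HMAC-SHA256 hex digest of the raw request body, keyed with the webhook secret and prefixed with `sha256=`. A small sketch of producing that value, for example to build a signed request in a local test; the `sign_payload` helper and the sample secret and body are assumptions for illustration.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload_body: bytes) -> str:
    """Return the value expected in the X-Hub-Signature-256 header."""
    digest = hmac.new(secret, msg=payload_body, digestmod=hashlib.sha256).hexdigest()
    return f"sha256={digest}"


secret = b"your-secret-here"  # placeholder, matches the sample configuration
body = b'{"action": "completed"}'
header = sign_payload(secret, body)

# Any change to the body yields a different signature, so a tampered
# payload cannot reuse the original header.
tampered = sign_payload(secret, body + b" ")
assert not hmac.compare_digest(header, tampered)
```

The signature covers the exact bytes received, so validation has to run on the raw body before any JSON parsing or re-serialisation.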
Implementation (app/core/security.py:verify_github_signature):
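
    import hashlib
    import hmac

    def verify_github_signature(payload_body: bytes, signature_header: str) -> bool:
        # `secret` is assumed to be the GITHUB_WEBHOOK_SECRET value as bytes.
        mac = hmac.new(secret, msg=payload_body, digestmod=hashlib.sha256)
        expected = mac.hexdigest()
        # Constant-time comparison avoids leaking the digest through timing.
        return hmac.compare_digest(signature_header, f"sha256={expected}")

Algorithm: Token-based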
Configuration:

    JENKINS_WEBHOOK_SECRET=your-token-here
    ENABLE_WEBHOOK_VALIDATION=true

Jenkins Setup:
- Configure → Notification Endpoint
- Add `?token=your-token-here` to the URL, or send the token in a header
- CI Insight validates token equality
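
Token equality is best checked in constant time, so that response timing does not reveal how much of a guessed token matched. A minimal sketch using `hmac.compare_digest`; the `verify_jenkins_token` name and its signature are assumptions, not the project's actual code.

```python
import hmac
from typing import Optional


def verify_jenkins_token(provided: Optional[str], expected: str) -> bool:
    """Compare a webhook token against the configured secret in constant time."""
    if not provided or not expected:
        # Reject missing tokens and refuse to validate against an empty secret.
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Rejecting an empty configured secret matters: otherwise an unset `JENKINS_WEBHOOK_SECRET` could match an empty token.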
Dev Mode:

    ENABLE_WEBHOOK_VALIDATION=false  # Disables validation for testing

Warning: Only use dev mode in trusted environments!
Purpose: Restrict which origins can access the API
Default Configuration:

    CORS_ORIGINS=http://localhost:3000,http://localhost,http://frontend

Production Setup:

    CORS_ORIGINS=https://ci-insight.example.com

Implementation (app/main.py):

    app.add_middleware(
        CORSMiddleware,
        allow_origins=settings.cors_origins_list,
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

Mitigation: SQLAlchemy ORM with parameterized queries
Safe ✅:

    db.query(Failure).filter(Failure.category == category).all()

Unsafe ❌ (not used in codebase):

    db.execute(f"SELECT * FROM failures WHERE category = '{category}'")

Testing: API integration tests verify query safety
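
The difference between the safe and unsafe forms can be shown with a self-contained sketch. It uses the standard-library `sqlite3` module and an invented `failures` table rather than the project's SQLAlchemy models, so it illustrates parameter binding in general, not CI Insight's own queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE failures (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO failures (category) VALUES (?)",
    [("timeout",), ("flaky_test",)],
)

malicious = "timeout' OR '1'='1"

# Bound parameter: the whole string is compared as a single literal value.
safe_rows = conn.execute(
    "SELECT id FROM failures WHERE category = ?", (malicious,)
).fetchall()

# String interpolation: the quote ends the literal and the OR clause
# becomes part of the SQL, matching every row.
unsafe_rows = conn.execute(
    f"SELECT id FROM failures WHERE category = '{malicious}'"
).fetchall()

print(len(safe_rows), len(unsafe_rows))
```

With the bound parameter no category equals the malicious string, so nothing is returned; the interpolated query returns both rows.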
Mitigation:
- React's automatic escaping
- No `dangerouslySetInnerHTML` used
- Future: Add Content-Security-Policy headers
Example (React auto-escapes):

    <div>{failure.error_message}</div> // ✅ Safe: auto-escaped

Risk: Sending secrets to external AI services
Mitigations:
- Redaction First: Logs redacted before AI processing
- Provider Choice: User controls where data goes
  - `none`: No AI, no external data
  - `openai`: Data sent to OpenAI (user choice)
  - `local`: Fully offline, no external calls
Configuration:

    AI_PROVIDER=local   # Default: offline TF-IDF
    OPENAI_API_KEY=     # Only needed if AI_PROVIDER=openai

Best Practice: Use `local` or `none` for sensitive environments
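
A startup check can make the provider choice explicit and fail closed on typos or incomplete settings. The sketch below reads the two variables named above; the `resolve_ai_provider` and `sends_data_externally` helpers are hypothetical and not taken from the codebase.

```python
import os
from typing import Mapping

ALLOWED_PROVIDERS = {"none", "local", "openai"}


def resolve_ai_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured provider, rejecting unknown or incomplete settings."""
    # Fall back to the documented default (local) when the variable is unset.
    provider = env.get("AI_PROVIDER", "local").strip().lower()
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unsupported AI_PROVIDER: {provider!r}")
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        raise ValueError("AI_PROVIDER=openai requires OPENAI_API_KEY")
    return provider


def sends_data_externally(provider: str) -> bool:
    """Only the openai provider transmits log content to a third party."""
    return provider == "openai"
```

A deployment handling sensitive logs could refuse to start when `sends_data_externally(resolve_ai_provider())` is true.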
Current State: Basic in-app limit (60 req/min)
Production Recommendation:
- Deploy behind reverse proxy (Nginx, Traefik)
- Implement rate limiting at proxy level
- Example Nginx config:

      limit_req_zone $binary_remote_addr zone=webhook:10m rate=10r/s;

      location /api/ingest/ {
          limit_req zone=webhook burst=20;
      }

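
The in-app limit mentioned under Current State is not shown in this document. As an illustration of how a cap of 60 requests per minute can work, here is a sliding-window sketch; the `SlidingWindowLimiter` class, its parameters, and the per-client key are assumptions, not CI Insight's actual implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(
        self,
        limit: int = 60,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

State here is held per process, so with several workers or instances each one enforces its own quota; that is one reason to keep the proxy-level limit as the primary control.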
SQLite (default):
- File-based, no network exposure
- Suitable for single-instance deployments
- Stored in Docker volume (isolated)
PostgreSQL (recommended for production):
- Network-isolated (Docker network)
- Strong password required
- Use connection pooling
- Regular backups
Configuration:

    # PostgreSQL
    DATABASE_URL=postgresql://ci_insight:strong_password@db:5432/ci_insight
    # Ensure DB credentials are in .env, not committed

- Set strong `GITHUB_WEBHOOK_SECRET` and `JENKINS_WEBHOOK_SECRET`
- Enable webhook validation: `ENABLE_WEBHOOK_VALIDATION=true`
- Configure CORS to specific origins (not `*`)
- Use PostgreSQL instead of SQLite
- Enable log redaction: `ENABLE_LOG_REDACTION=true`
- Disable raw log storage: `STORE_RAW_LOGS=false`
- Review AI provider choice: `AI_PROVIDER=local` for sensitive data
- Set `DEBUG=false`
- Use environment variables (not hardcoded secrets)
- Run behind HTTPS (reverse proxy with SSL)
- Deploy behind reverse proxy (Nginx/Traefik/Cloudflare)
- Implement rate limiting at proxy level
- Set up firewall rules (only expose ports 80/443)
- Use private Docker network
- Enable container health checks
- Set up log aggregation (ELK, Splunk, etc.)
- Configure alerting for errors
- Regular database backups
- Patch OS and dependencies regularly
- Monitor failed authentication attempts
- Alert on unusual webhook volumes
- Track API error rates
- Monitor disk space (logs, database)
- Set up uptime monitoring
- Review logs for security events
- Immediate: Rotate compromised credentials
- Assess: Check logs for unauthorized access
- Remediate: Fix redaction patterns if needed
- Review: Audit all stored data for other secrets
- Test: Verify new patterns catch the leaked format
- Block: Update firewall/CORS rules
- Audit: Review all API calls from suspicious IPs
- Rotate: Change webhook secrets
- Investigate: Check for data exfiltration
- Patch: Fix vulnerability if found
- Disconnect: Take database offline
- Assess: Determine scope of breach
- Restore: From backup if corrupted
- Harden: Review and fix access controls
- Notify: Stakeholders if PII/sensitive data affected
For Production Users:
- Report security issues to: security@example.com
- Provide: Description, reproduction steps, impact
- We'll respond within 48 hours
- Responsible disclosure appreciated
For Developers:
- Review OWASP Top 10: https://owasp.org/www-project-top-ten/
- Run security scanner: `bandit -r app/`
- Keep dependencies updated: `pip-audit`
- Log redaction helps minimize PII
- Provide data export/deletion endpoints
- Document data retention policy
- Add consent management if needed
- Structured logging for audit trails
- Access controls (add authentication)
- Encryption at rest and in transit
- Regular security reviews
Remember: Security is a process, not a product. Regularly review and update these controls as threats evolve.