Cluster recovery procedures, incident reports, quorum management, and high-availability configurations.
Cluster recovery and restoration documentation:
- Complete cluster recovery guides
- Final status reports after incidents
- Recovery validation procedures
Detailed incident analysis and lessons learned:
- Cluster data loss post-mortem
- Quorum issue investigations
- Knet failure analysis
- Root cause analysis reports
Cluster Recovery
- Quorum restoration procedures
- Node recovery processes
- Data recovery strategies
- Cluster validation steps
High Availability
- Quorum configuration
- Corosync/Knet troubleshooting
- Cluster communication issues
- Split-brain prevention
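As a rough illustration of why quorum matters for split-brain prevention, the sketch below computes the majority threshold for a cluster. It assumes one vote per node and the plain majority rule (more than half of the expected votes), with no QDevice and no special votequorum options enabled. The function names are hypothetical and are not part of any Proxmox or Corosync tooling.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if total_votes < 1:
        raise ValueError("total_votes must be at least 1")
    return total_votes // 2 + 1


def is_quorate(total_votes: int, votes_online: int) -> bool:
    """True if the online votes reach the majority threshold."""
    return votes_online >= quorum_threshold(total_votes)


def tolerated_failures(total_votes: int) -> int:
    """How many votes can be lost while the remainder stays quorate."""
    return total_votes - quorum_threshold(total_votes)


if __name__ == "__main__":
    nodes = 3  # assumption: one vote per node in a three-node cluster
    print("threshold:", quorum_threshold(nodes))
    print("tolerated failures:", tolerated_failures(nodes))
    print("quorate with 1 node online:", is_quorate(nodes, 1))
```

Under these assumptions a three-node cluster stays quorate with one node down, while a single surviving node does not, so an isolated node cannot continue on its own and diverge from the other two.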
docs/
├── recovery/ # Recovery procedures and status reports
└── incidents/ # Post-mortems and incident analysis
- Disaster recovery reference
- Incident response procedures
- Lessons learned documentation
- Cluster health maintenance
This repository contains critical disaster recovery procedures. Review them regularly and make sure every team member is familiar with the recovery processes.
Last Updated: January 23, 2026
Cluster: thelightville (3 nodes: pve, pve2, pve3)