Commit 2ce34e4

A Walker authored and committed
docs: Add comprehensive documentation
- Add README.md with badges and quick start
- Add CONTRIBUTING.md with guidelines
- Add CHANGELOG.md with version history
- Add TROUBLESHOOTING.md with common issues
- Add API.md with script API reference
- Add ARCHITECTURE.md with system design
- Add LICENSE (MIT)
1 parent 6a4b672 commit 2ce34e4

7 files changed: 27 additions, 0 deletions


API.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# API Documentation

Complete API reference for model-fallback-skill scripts.

## Table of Contents

- [health-check.py](#health-checkpy)
- [fallback-trigger.py](#fallback-triggerpy)

---

## health-check.py

Proactive health monitoring for model providers.

### Usage

```bash
python3 scripts/health-check.py <command> [options]
```

### Commands

#### `start`

Start the health check monitor in the background.

```bash
python3 scripts/health-check.py start
```

**Options:**
- `--interval SECONDS` - Check interval in seconds (default: 300)
- `--log-file PATH` - Log file path (default: ~/.nanobot/logs/model-health.log)

**Example:**
```bash
python3 scripts/health-check.py start --interval 600
```

#### `stop`

Stop the health check monitor.

```bash
python3 scripts/health-check.py stop
```

#### `status`

Show the current status of the health check monitor.

```bash
python3 scripts/health-check.py status
```

**Output:**
```
Health Check Status:
- Running: Yes
- PID: 12345
- Started: 2025-02-12 18:00:00
- Last Check: 2025-02-12 18:05:00
- Current Model: minimax/m2.5
- Fallback Models: 3 available
```

#### `check`

Run a single health check immediately.

```bash
python3 scripts/health-check.py check
```

**Options:**
- `--verbose` - Show detailed output

**Output:**
```
Health Check Results:
- Model: minimax/m2.5
- Status: HEALTHY
- Response Time: 1.2s
- Error Rate: 0%
- Timeout Rate: 0%
- Last Check: 2025-02-12 18:05:00
```

#### `test`

Run a test health check against a mock model call.

```bash
python3 scripts/health-check.py test
```

#### `diagnose`

Run diagnostics on the health check system.

```bash
python3 scripts/health-check.py diagnose
```

**Output:**
```
Diagnostics:
- Python: 3.12.0
- Config: Found
- Current Model: minimax/m2.5
- Fallback Models: 3
- Log File: ~/.nanobot/logs/model-health.log
- PID File: /tmp/health-check.pid
- Status: Running
```

### Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `CHECK_INTERVAL` | 300 | Time between health checks (seconds) |
| `MAX_RESPONSE_TIME` | 30 | Maximum acceptable response time (seconds) |
| `MAX_TIMEOUT_RATE` | 0.2 | Maximum timeout rate (0.0-1.0) |
| `MAX_ERROR_RATE` | 0.1 | Maximum error rate (0.0-1.0) |
| `SAMPLE_SIZE` | 10 | Number of requests to sample for rate calculation |

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Config not found |
| 3 | No fallback models configured |
| 4 | Health check already running |
| 5 | Health check not running |

---

## fallback-trigger.py

Reactive fallback triggering and management.

### Usage

```bash
python3 scripts/fallback-trigger.py <command> [options]
```

### Commands

#### `trigger`

Trigger a fallback to the next model.

```bash
python3 scripts/fallback-trigger.py trigger
```

**Options:**
- `--reason TEXT` - Reason for fallback (default: "Manual trigger")
- `--force` - Force fallback even if current model is healthy

**Example:**
```bash
python3 scripts/fallback-trigger.py trigger --reason "High error rate" --force
```

#### `status`

Show current fallback status.

```bash
python3 scripts/fallback-trigger.py status
```

**Output:**
```
Fallback Status:
- Current Model: minimax/m2.5
- Fallback Chain:
  1. openrouter/anthropic/claude-3.5-sonnet
  2. openrouter/glm-4.7
  3. openrouter/google/gemini-2.0-flash-exp:free
- Last Fallback: Never
- Total Fallbacks: 0
```

#### `list`

List all configured fallback models.

```bash
python3 scripts/fallback-trigger.py list
```

**Output:**
```
Fallback Models:
1. openrouter/anthropic/claude-3.5-sonnet
2. openrouter/glm-4.7
3. openrouter/google/gemini-2.0-flash-exp:free
```

#### `test`

Test fallback without actually switching.

```bash
python3 scripts/fallback-trigger.py test
```

**Output:**
```
Fallback Test:
- Current Model: minimax/m2.5
- Next Model: openrouter/anthropic/claude-3.5-sonnet
- Would Switch: Yes
- Config Valid: Yes
```

#### `reset`

Reset the fallback counter. Pass `--clear-logs` to clear the fallback logs as well.

```bash
python3 scripts/fallback-trigger.py reset
```

**Options:**
- `--clear-logs` - Also clear fallback logs

#### `history`

Show fallback history.

```bash
python3 scripts/fallback-trigger.py history
```

**Options:**
- `--limit N` - Show last N entries (default: 10)

**Output:**
```
Fallback History:
2025-02-12 18:00:00 - minimax/m2.5 -> openrouter/anthropic/claude-3.5-sonnet (High error rate)
2025-02-12 17:30:00 - openrouter/glm-4.7 -> minimax/m2.5 (Manual test)
```

### Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `CONFIG_PATH` | ~/.nanobot/config.json | Path to nanobot config |
| `LOG_PATH` | ~/.nanobot/logs/model-fallback.log | Path to fallback log |
| `HISTORY_PATH` | ~/.nanobot/logs/fallback-history.json | Path to fallback history |
| `TRIGGER_FILE` | /tmp/nanobot-restart | Path to restart trigger file |

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Config not found |
| 3 | No fallback models available |
| 4 | No more fallback models in chain |
| 5 | Config invalid |

---

## Examples

### Start health check with custom interval

```bash
python3 scripts/health-check.py start --interval 600
```

### Trigger fallback with reason

```bash
python3 scripts/fallback-trigger.py trigger --reason "API timeout exceeded"
```

### Check health with verbose output

```bash
python3 scripts/health-check.py check --verbose
```

### View fallback history

```bash
python3 scripts/fallback-trigger.py history --limit 20
```

### Run diagnostics

```bash
python3 scripts/health-check.py diagnose
python3 scripts/fallback-trigger.py status
```
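Because `fallback-trigger.py` reports its outcome through the exit codes in its table, automation can branch on them. The following is a minimal sketch, assuming the script lives at `scripts/fallback-trigger.py` as in the usage examples; `FALLBACK_TRIGGER_EXIT_CODES`, `describe_exit_code` and `trigger_fallback` are hypothetical helper names, not part of the skill.

```python
import os
import subprocess
import sys

# Exit codes as documented in the fallback-trigger.py "Exit Codes" table.
FALLBACK_TRIGGER_EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Config not found",
    3: "No fallback models available",
    4: "No more fallback models in chain",
    5: "Config invalid",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of a fallback-trigger.py exit code."""
    return FALLBACK_TRIGGER_EXIT_CODES.get(code, f"Unknown exit code {code}")


def trigger_fallback(reason: str, force: bool = False,
                     script: str = "scripts/fallback-trigger.py") -> int:
    """Run `fallback-trigger.py trigger` and return its exit code (hypothetical helper).

    The path is checked first because the Python interpreter itself exits
    with status 2 when it cannot open a script, which would be
    indistinguishable from the documented "Config not found" code.
    """
    if not os.path.isfile(script):
        raise FileNotFoundError(script)
    cmd = [sys.executable, script, "trigger", "--reason", reason]
    if force:
        cmd.append("--force")
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout, end="")
    return result.returncode


if __name__ == "__main__":
    try:
        code = trigger_fallback("API timeout exceeded")
    except FileNotFoundError as missing:
        print(f"fallback-trigger.py not found at {missing}")
    else:
        print(f"exit {code}: {describe_exit_code(code)}")
        if code == 4:
            print("End of fallback chain reached; manual intervention needed.")
```

Treating code 4 specially is a design choice: it is the one outcome where retrying the trigger cannot help, since the chain is exhausted.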

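The `history` output uses a `timestamp - old -> new (reason)` line layout. If those entries are needed programmatically, a regex parser is one option. This sketch assumes the exact layout shown in the example output; `fallback-history.json` may be a better source, but its schema is not documented here. `HistoryEntry`, `LINE_RE` and `parse_history` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Matches lines such as:
# 2025-02-12 18:00:00 - minimax/m2.5 -> openrouter/anthropic/claude-3.5-sonnet (High error rate)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<old>\S+) -> (?P<new>\S+) \((?P<reason>.*)\)$"
)


@dataclass
class HistoryEntry:
    timestamp: datetime
    old_model: str
    new_model: str
    reason: str


def parse_history(text: str) -> List[HistoryEntry]:
    """Parse `fallback-trigger.py history` output, skipping the header and unrecognized lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. the "Fallback History:" header
        entries.append(HistoryEntry(
            timestamp=datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
            old_model=match["old"],
            new_model=match["new"],
            reason=match["reason"],
        ))
    return entries


sample = """Fallback History:
2025-02-12 18:00:00 - minimax/m2.5 -> openrouter/anthropic/claude-3.5-sonnet (High error rate)
2025-02-12 17:30:00 - openrouter/glm-4.7 -> minimax/m2.5 (Manual test)"""

for entry in parse_history(sample):
    print(entry.timestamp, entry.old_model, "->", entry.new_model, f"[{entry.reason}]")
```

`\S+` accepts model IDs containing `/` and `:` (such as `openrouter/google/gemini-2.0-flash-exp:free`) but assumes model IDs never contain spaces.
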
ARCHITECTURE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# System Architecture

Design and architecture of the model-fallback-skill system.

## Table of Contents

- [Overview](#overview)
- [System Components](#system-components)
- [Data Flow](#data-flow)
- [Failure Detection](#failure-detection)
- [Fallback Decision Tree](#fallback-decision-tree)
- [File Structure](#file-structure)

---

## Overview

The model-fallback-skill provides two complementary mechanisms for keeping nanobot available:

1. **Proactive Health Monitoring** - Continuously monitors model health and switches before failures occur
2. **Reactive Fallback** - Responds to actual failures and switches to backup models

Together, the two mechanisms minimize downtime. Each model switch still restarts the nanobot gateway through the wrapper, so operation is not strictly zero-downtime.

---

## System Components

```
┬─ nanobot-wrapper.sh
│  └─ nanobot gateway
│
├─ health-check.py (daemon)
│  ├─ Monitors model health every 5 minutes (default)
│  ├─ Tests response time, error rate, timeout rate
│  └─ Triggers fallback if thresholds are exceeded
│
├─ fallback-trigger.py (on-demand)
│  ├─ Manually trigger fallback
│  ├─ Check fallback status
│  └─ View fallback history
│
├─ config.json
│  ├─ Current model configuration
│  ├─ Fallback chain
│  └─ Provider API keys
│
└─ logs/
   ├─ model-health.log
   ├─ model-fallback.log
   └─ fallback-history.json
```

### Component Descriptions

#### nanobot-wrapper.sh
- Wraps nanobot gateway in a restart loop
- Monitors for restart trigger file
- Automatically restarts nanobot when triggered

#### health-check.py
- Runs as a background daemon
- Periodically tests model health
- Maintains health metrics
- Triggers fallback when thresholds are exceeded

#### fallback-trigger.py
- Manages fallback operations
- Updates configuration with new model
- Creates restart trigger
- Logs fallback events

---

## Data Flow

### Health Check Flow

```
┬─ Start (every 5 minutes)
│
├─ Read config.json
│  ├─ Get current model
│  └─ Get fallback chain
│
├─ Test Model Health
│  ├─ Send test request
│  ├─ Measure response time
│  ├─ Check for errors
│  └─ Check for timeouts
│
├─ Calculate Metrics
│  ├─ Average response time
│  ├─ Error rate (last N requests)
│  ├─ Timeout rate (last N requests)
│  └─ Overall health score
│
├─ Evaluate Thresholds
│  ├─ Response time < MAX_RESPONSE_TIME?
│  ├─ Error rate < MAX_ERROR_RATE?
│  ├─ Timeout rate < MAX_TIMEOUT_RATE?
│  └─ Overall score ≥ 0.7?
│
├─ Decision
│  ├─ HEALTHY → Log and continue
│  └─ UNHEALTHY → Trigger fallback
│     ├─ Call fallback-trigger.py
│     ├─ Update config
│     ├─ Create restart trigger
│     └─ Log event
│
└─ Wait for next interval
```

### Fallback Trigger Flow

```
┬─ Trigger Request
│
├─ Validate Configuration
│  ├─ Config file exists?
│  ├─ Current model configured?
│  ├─ Fallback models available?
│  └─ Not at end of chain?
│
├─ Select Next Model
│  ├─ Get current model from config
│  ├─ Find current in fallback chain
│  └─ Select next in chain
│
├─ Update Configuration
│  ├─ Read config.json
│  ├─ Update default model
│  ├─ Write config.json
│  └─ Verify changes
│
├─ Trigger Restart
│  ├─ Create /tmp/nanobot-restart
│  ├─ Signal nanobot process
│  └─ Wrapper detects and restarts
│
├─ Log Event
│  ├─ Log to model-fallback.log
│  ├─ Record in fallback-history.json
│  └─ Include reason and timestamp
│
└─ Return Success
```

---

## Failure Detection

### Health Metrics

| Metric | Description | Threshold | Weight |
|--------|-------------|-----------|--------|
| **Response Time** | Time to receive response | 30 seconds | 40% |
| **Error Rate** | Percentage of failed requests | 10% | 35% |
| **Timeout Rate** | Percentage of timed-out requests | 20% | 25% |

### Health Score Calculation

```
Health Score = (Response Score × 0.4) +
               (Error Score × 0.35) +
               (Timeout Score × 0.25)

Where:
- Response Score = 1 - (Response Time / Max Response Time)
- Error Score = 1 - (Error Rate / Max Error Rate)
- Timeout Score = 1 - (Timeout Rate / Max Timeout Rate)

Health Score < 0.7 = UNHEALTHY
```

### Sample Size

- The last 10 requests (`SAMPLE_SIZE`) are sampled for rate calculations
- The rolling window updates with each check
- Rates therefore reflect recent performance rather than the model's full history

---

## Fallback Decision Tree

```
┬─ Fallback Triggered
│
├─ Check Fallback Chain
│  ├─ Any models in chain?
│  └─ Yes → Continue
│     No → Log error, exit
│
├─ Find Current Model Position
│  ├─ Current model in chain?
│  └─ Yes → Get next model
│     No → Use first model in chain
│
├─ Select Next Model
│  ├─ Next model available?
│  └─ Yes → Switch to next
│     No → Log "End of chain", exit
│
├─ Validate Next Model
│  ├─ API key configured?
│  ├─ Model valid?
│  └─ Yes → Proceed
│     No → Try next in chain
│
├─ Update Config
│  ├─ Set new default model
│  └─ Write to config.json
│
├─ Trigger Restart
│  ├─ Create restart trigger file
│  ├─ Signal nanobot
│  └─ Wrapper handles restart
│
├─ Log Event
│  ├─ Old model → New model
│  ├─ Timestamp
│  ├─ Reason
│  └─ Health metrics
│
└─ Return Success
```

---

## File Structure

```
model-fallback-skill/
├─ SKILL.md                  # Nanobot skill documentation
├─ README.md                 # Main documentation
├─ CONTRIBUTING.md           # Contribution guidelines
├─ CHANGELOG.md              # Version history
├─ TROUBLESHOOTING.md        # Troubleshooting guide
├─ API.md                    # API documentation
├─ ARCHITECTURE.md           # This file
├─ LICENSE                   # MIT License
├─ install.sh                # Installation script
├─ .git/                     # Git repository
├─ scripts/
│  ├─ health-check.py        # Health monitoring daemon
│  └─ fallback-trigger.py    # Fallback management
└─ examples/
   ├─ README.md              # Examples documentation
   ├─ config-minimax-openrouter.json
   ├─ config-openrouter-only.json
   └─ config-production.json
```

### External Files

```
~/.nanobot/
├─ config.json               # nanobot configuration
├─ nanobot-wrapper.sh        # nanobot wrapper
└─ logs/
   ├─ model-health.log       # Health check logs
   ├─ model-fallback.log     # Fallback event logs
   └─ fallback-history.json  # Fallback history

/tmp/
└─ nanobot-restart           # Restart trigger file
```
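The health score formula, the per-metric thresholds and the rolling sample window from Failure Detection can be combined into one small tracker. This is a sketch under stated assumptions, not the code in `health-check.py`: `Sample`, `HealthTracker` and `component_score` are hypothetical names; response time is averaged over the window (as in the Health Check Flow); each component score is clamped to [0, 1], which the formula does not specify; and an empty window is treated as healthy.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple

# Defaults taken from the configuration and metrics tables.
MAX_RESPONSE_TIME = 30.0  # seconds
MAX_ERROR_RATE = 0.1
MAX_TIMEOUT_RATE = 0.2
SAMPLE_SIZE = 10
MIN_HEALTH_SCORE = 0.7  # below this the model is UNHEALTHY


@dataclass
class Sample:
    """One probe result (hypothetical structure)."""
    response_time: float  # seconds
    error: bool = False
    timeout: bool = False


def component_score(value: float, limit: float) -> float:
    """1 - value / limit, clamped to [0, 1] (the clamping is an assumption)."""
    return max(0.0, min(1.0, 1.0 - value / limit))


class HealthTracker:
    """Keeps the last SAMPLE_SIZE probes and scores them."""

    def __init__(self, size: int = SAMPLE_SIZE) -> None:
        # A bounded deque drops the oldest sample automatically: a rolling window.
        self.samples: Deque[Sample] = deque(maxlen=size)

    def record(self, sample: Sample) -> None:
        self.samples.append(sample)

    def metrics(self) -> Tuple[float, float, float]:
        """Average response time, error rate and timeout rate over the window."""
        n = len(self.samples)
        avg_time = sum(s.response_time for s in self.samples) / n
        error_rate = sum(s.error for s in self.samples) / n
        timeout_rate = sum(s.timeout for s in self.samples) / n
        return avg_time, error_rate, timeout_rate

    def score(self) -> Optional[float]:
        """Weighted health score, or None before any sample is recorded."""
        if not self.samples:
            return None
        avg_time, error_rate, timeout_rate = self.metrics()
        return (component_score(avg_time, MAX_RESPONSE_TIME) * 0.4
                + component_score(error_rate, MAX_ERROR_RATE) * 0.35
                + component_score(timeout_rate, MAX_TIMEOUT_RATE) * 0.25)

    def is_healthy(self) -> bool:
        """Apply the per-metric thresholds and the overall score cutoff."""
        if not self.samples:
            return True  # nothing measured yet (assumption: treat as healthy)
        avg_time, error_rate, timeout_rate = self.metrics()
        return (avg_time < MAX_RESPONSE_TIME
                and error_rate < MAX_ERROR_RATE
                and timeout_rate < MAX_TIMEOUT_RATE
                and self.score() >= MIN_HEALTH_SCORE)


tracker = HealthTracker()
for _ in range(9):
    tracker.record(Sample(response_time=1.2))
tracker.record(Sample(response_time=30.0, timeout=True))
print(f"score={tracker.score():.3f} healthy={tracker.is_healthy()}")
```

With these weights, an error rate exactly at `MAX_ERROR_RATE` zeroes the error component, so the score can reach at most 0.4 + 0.25 = 0.65, below the 0.7 cutoff. A timeout rate at its limit still leaves up to 0.75, which is why the separate per-metric checks in the Health Check Flow matter.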

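The model-selection branches of the Fallback Decision Tree (next model after the current one, first model when the current one is not in the chain, skipping models that fail validation, stopping at the end of the chain) fit in one pure function. This is a sketch: the `config.json` layout is not documented here, so the function takes the chain and a validation callback instead of reading the file, and `select_next_model` and `has_api_key` are hypothetical names.

```python
from typing import Callable, List, Optional


def select_next_model(current: str, chain: List[str],
                      is_valid: Callable[[str], bool]) -> Optional[str]:
    """Return the next usable model in the fallback chain, or None at the end of the chain."""
    if not chain:
        return None  # "No models in chain -> log error, exit"
    if current in chain:
        candidates = chain[chain.index(current) + 1:]  # models after the current one
    else:
        candidates = chain  # current model not in chain: start from the first entry
    for model in candidates:
        if is_valid(model):  # e.g. API key configured, model valid
            return model
        # Validation failed: "Try next in chain"
    return None  # "End of chain"


chain = [
    "openrouter/anthropic/claude-3.5-sonnet",
    "openrouter/glm-4.7",
    "openrouter/google/gemini-2.0-flash-exp:free",
]
configured_providers = {"openrouter"}  # hypothetical: providers with an API key


def has_api_key(model: str) -> bool:
    """Hypothetical check: the model's provider prefix must have an API key configured."""
    return model.split("/", 1)[0] in configured_providers


print(select_next_model("minimax/m2.5", chain, has_api_key))
```

Starting from the primary model `minimax/m2.5`, which is not in the chain, this returns the first chain entry, matching the "Next Model" shown by `fallback-trigger.py test`. The function never wraps around, so an exhausted chain surfaces as `None` rather than cycling back to a failed model.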
CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Planned
- [ ] Web dashboard for health monitoring
- [ ] Slack/Discord notifications for fallback events
- [ ] Performance metrics dashboard
- [ ] Automatic rollback on recovery
- [ ] Support for custom health check endpoints

## [1.0.0] - 2025-02-12

### Added
- Initial release of model-fallback-skill
- Automatic model switching on failures
- Health monitoring with configurable thresholds
- Multi-model fallback chain support
- Comprehensive logging system
- One-command installation script
- Example configurations for common setups
- Full documentation suite

### Features
- Proactive health monitoring (5-minute intervals)
- Reactive fallback on failures
- Support for Minimax, OpenRouter, and other providers
- Configurable response time, error rate, and timeout rate thresholds
- Status checking and reporting
- Manual fallback triggering for testing

### Documentation
- README with quick start guide
- API documentation for all scripts
- Architecture documentation
- Troubleshooting guide
- Contributing guidelines
- Example configurations

---

[1.0.0]: https://github.com/capt-marbles/model-fallback-skill/releases/tag/v1.0.0

CONTRIBUTING.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [Development Workflow](#development-workflow)
- [Coding Standards](#coding-standards)
- [Testing](#testing)
- [Submitting Changes](#submitting-changes)
- [Reporting Issues](#reporting-issues)
- [License](#license)

## Code of Conduct

Be respectful, constructive, and inclusive. We're all here to improve the project.

## Getting Started

1. Fork the repository
2. Clone your fork:
   ```bash
   git clone https://github.com/yourusername/model-fallback-skill.git
   cd model-fallback-skill
   ```
3. Create a feature branch:
   ```bash
   git checkout -b feature/your-feature-name
   ```

## Development Workflow

1. Make your changes
2. Run the checks described under [Testing](#testing)
3. Commit with clear messages
4. Push to your fork
5. Open a pull request

## Coding Standards

### Python

- Follow the PEP 8 style guide
- Use type hints where appropriate
- Include docstrings for functions and classes
- Keep functions focused and small

### Shell Scripts

- Use POSIX-compliant shell syntax
- Add error handling
- Include comments for complex logic

### Documentation

- Use clear, concise language
- Update relevant docs with changes
- Include examples where helpful

## Testing

Test your changes:
```bash
# Test health check
python3 scripts/health-check.py test

# Test fallback trigger
python3 scripts/fallback-trigger.py test

# Verify installation
./install.sh --check
```

## Submitting Changes

1. Update documentation if needed
2. Add tests for new features
3. Update CHANGELOG.md
4. Create a pull request with:
   - Clear title
   - Description of changes
   - Reference to related issues

## Reporting Issues

When reporting issues, include:
- Python version
- nanobot version
- Steps to reproduce
- Expected vs actual behavior
- Relevant logs

## License

By contributing, you agree that your contributions will be licensed under the MIT License.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 capt-marbles

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
