fix: add retry with exponential backoff for bot launch 409 Conflict #1241
Open
Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #1240
When the Telegram Bot API returns a 409 Conflict error during bot startup (e.g., due to restart overlap, stale connections, or network issues), the bot now retries with exponential backoff instead of immediately exiting.

- Extract launch retry logic into telegram-bot-launcher.lib.mjs
- Retry schedule: 1s, 2s, 4s, 8s, 16s, 32s, 64s, ... up to 10 minutes max
- Non-retryable errors (401 Unauthorized) still cause immediate exit
- AbortSignal support for clean cancellation during shutdown
- 10% jitter on retry delays to prevent thundering herd

Fixes #1240

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
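As a rough illustration of the retry schedule described in this commit, the delay calculation might look like the sketch below. The constant names, the 0-based `attempt` parameter, the injectable `random` argument, and the choice to add jitter on top of the capped delay are assumptions made for illustration; the actual code in telegram-bot-launcher.lib.mjs may differ.

```javascript
// Hypothetical sketch of an exponential backoff schedule with jitter.
// Names and defaults are assumptions, not the PR's actual implementation.
const BASE_DELAY_MS = 1_000; // first retry after about 1s
const MAX_DELAY_MS = 10 * 60 * 1_000; // cap the base delay at 10 minutes
const JITTER_RATIO = 0.1; // up to 10% extra delay

function calculateRetryDelay(attempt, random = Math.random) {
  // attempt is 0-based: 0 -> 1s, 1 -> 2s, 2 -> 4s, 3 -> 8s, ...
  const exponential = BASE_DELAY_MS * 2 ** attempt;
  // Clamp the exponential term so the wait never grows unbounded.
  const capped = Math.min(exponential, MAX_DELAY_MS);
  // Add between 0% and 10% on top so that several instances restarting
  // together do not all retry at the same instant.
  const jitter = capped * JITTER_RATIO * random();
  return Math.round(capped + jitter);
}
```

In this sketch the jitter is added after the cap, so the longest wait can exceed 10 minutes by up to 10%. Clamping after adding jitter would keep the cap strict, at the cost of losing the spread once the cap is reached.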
41 tests covering:

- isRetryableError: 401 (non-retryable) vs 409/429/5xx/network (retryable)
- calculateRetryDelay: exponential backoff schedule, jitter, cap at max
- formatDelay: human-readable delay formatting
- launchBotWithRetry: success, retry on 409, abort via signal, onRetry callback

Refs #1240

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
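The retryable versus non-retryable split tested here could be expressed along the lines of the sketch below. It assumes the error carries the Bot API status in `error.response.error_code`, and it treats any error without a numeric status as a network-level failure. Both are assumptions for illustration. How other 4xx codes should be handled is not stated in this PR, so the sketch simply leaves them non-retryable.

```javascript
// Hypothetical sketch of the error classification; the error shape and the
// handling of unlisted status codes are assumptions.
function isRetryableError(error) {
  const status = error?.response?.error_code;
  if (typeof status !== 'number') {
    // No Bot API status at all: assume a network-level failure
    // (connection reset, DNS error, timeout) and allow a retry.
    return true;
  }
  if (status === 401) {
    // Unauthorized: the token is wrong, so retrying cannot help.
    return false;
  }
  // 409 Conflict, 429 Too Many Requests, and server errors may clear up.
  return status === 409 || status === 429 || status >= 500;
}
```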
Comprehensive analysis including:

- Timeline reconstruction from production logs
- Root cause analysis (6 identified causes)
- Telegraf source code analysis (polling.ts error classification)
- Community research across 8+ bot libraries
- 5 proposed solutions with code examples
- References to relevant Telegraf GitHub issues

Refs #1240

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
409: Conflict: terminated by other getUpdates request; make sure that only one bot instance is running

These are available as globals in Node.js 15+ (project requires 18+). Needed for the launch retry abort signal.

Refs #1240

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit a591902.
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

The working session has ended. Feel free to review the solution draft and add any feedback.
Summary

- Retry `bot.launch()` with exponential backoff when it fails with a 409 Conflict error (e.g., due to restart overlap, stale TCP connections, or network issues)
- Extract the launch retry logic into `telegram-bot-launcher.lib.mjs` with pure, testable functions

Root Cause

The Telegram Bot API allows only one active `getUpdates` connection per bot token. When a second request arrives (e.g., during process restart overlap, Docker container restart, or network reconnection), the API returns a 409 Conflict error. Telegraf treats this as fatal and throws immediately. The bot's error handler then called `process.exit(1)` with no retry logic, leaving the bot unavailable until it was manually restarted.

Even with a single bot instance, this can happen due to:

- `restart: unless-stopped` creating container overlap
- `bot.stop()` call

Full analysis: `docs/case-studies/issue-1240/README.md`

Changes

- `src/telegram-bot-launcher.lib.mjs`
- `src/telegram-bot.mjs`: replace `deleteWebhook().then(bot.launch()).catch(exit)` with `launchBotWithRetry()`, add `launchAbortController` for clean shutdown
- `tests/test-telegram-bot-launcher.mjs`
- `package.json`
- `docs/case-studies/issue-1240/README.md`
- `docs/case-studies/issue-1240/telegraf-issues-research.md`
- `docs/case-studies/issue-1240/community-research.md`
- `docs/case-studies/issue-1240/error-log.txt`
- `.changeset/fix-bot-409-retry.md`

How the retry works

Each attempt:

- `deleteWebhook({ drop_pending_updates: true })` to clear any stale webhook
- `bot.launch()` with configured options

The retry loop is interruptible via `AbortSignal`: SIGINT/SIGTERM during a retry wait cleanly stops the process.

Test plan

- Run the launcher tests (`node tests/test-telegram-bot-launcher.mjs`)

Fixes #1240
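
For reviewers, the flow described under "How the retry works" might be sketched roughly as follows. This is not the code from `src/telegram-bot-launcher.lib.mjs`. The option names (`signal`, `launchOptions`, `delayFor`, `isRetryable`), the `sleep` helper, and the call through `bot.telegram.deleteWebhook` are assumptions chosen to keep the example self-contained.

```javascript
// Hypothetical sketch of an abortable launch retry loop.
// All option names and defaults below are assumptions for illustration.

// Wait for `ms` milliseconds, rejecting early if `signal` is aborted.
function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error('aborted'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // do not keep the process alive after abort
      reject(new Error('aborted'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

async function launchBotWithRetry(bot, options = {}) {
  const {
    signal,
    launchOptions = {},
    // Simplified schedule without jitter: 1s, 2s, 4s, ... capped at 10 min.
    delayFor = (attempt) => Math.min(1_000 * 2 ** attempt, 600_000),
    // Simplified classification: only 401 Unauthorized is fatal here.
    isRetryable = (error) => error?.response?.error_code !== 401,
    onRetry,
  } = options;

  for (let attempt = 0; ; attempt++) {
    if (signal?.aborted) throw new Error('aborted');
    try {
      // Step 1: clear any stale webhook before polling.
      await bot.telegram.deleteWebhook({ drop_pending_updates: true });
      // Step 2: start the bot with the configured options.
      await bot.launch(launchOptions);
      return; // no error: stop retrying
    } catch (error) {
      if (!isRetryable(error)) throw error;
      const delay = delayFor(attempt);
      onRetry?.({ attempt, delay, error });
      // Interruptible wait: an abort during the delay rejects immediately.
      await sleep(delay, signal);
    }
  }
}

// Possible wiring for shutdown (launchAbortController is named in this PR):
// const launchAbortController = new AbortController();
// process.once('SIGINT', () => launchAbortController.abort());
// process.once('SIGTERM', () => launchAbortController.abort());
// await launchBotWithRetry(bot, { signal: launchAbortController.signal });
```

Passing the signal into the wait, rather than only checking it between attempts, is what lets a shutdown during a multi-minute backoff take effect immediately.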
🤖 Generated with Claude Code