Longer retry delay by pmilford · Pull Request #60 · HKUDS/AI-Researcher

pmilford · 2025-08-06T12:10:18Z

AI oops

…Added, with default to linux/amd64

The application would previously crash with a 'port is already allocated' error if the port specified in the .env file was in use. This change introduces more intelligent port handling in the `DockerEnv.init_container` method:

1. **For existing containers:** The script now inspects the container to find the port it was originally created with and reuses that port, preserving the container reuse functionality.
2. **For new containers:** If the default port is taken, the script now automatically searches for the next available port, preventing the application from crashing.

This makes the application more resilient to common port conflicts in a local development environment.
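The "search for the next available port" step could look roughly like the sketch below. It is not taken from the PR: the helper names `is_port_free` and `find_available_port`, the loopback host, and the 100-port scan limit are all assumptions made for illustration.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_available_port(start: int, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Scan upward from `start` and return the first port that binds.

    Hypothetical helper: the scan limit and host are illustrative defaults.
    """
    stop = min(start + max_tries, 65536)  # never probe past the valid port range
    for port in range(start, stop):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"no free port found in range {start}-{stop - 1}")
```

A check like this is only a snapshot: another process can claim the port between the probe and the moment Docker binds it.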

Fix(docker): Make container port handling robust

This commit resolves a critical bug where the application would either crash due to port conflicts or fail silently when trying to restart an existing container. The `init_container` method in `docker_env.py` has been rewritten with the following robust logic:

1. **For Existing Containers:**
   - The container is now inspected to find its pre-assigned host port.
   - Before starting, the script checks if this port is actually available.
   - If the port is busy, the script now raises a clear, actionable error, instructing you to free up the specific port, rather than failing silently.
   - The `docker start` command now includes error checking.
2. **For New Containers:**
   - If the default port is in use, the script automatically finds the next available port.
   - The `docker run` command now includes error checking to ensure container creation is successful.

This change makes the application significantly more resilient and provides clearer feedback to you, improving the overall development experience.
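One way to add error checking to commands such as `docker start` is to run them through a small wrapper that turns a non-zero exit status into a readable exception. The wrapper below is a sketch under assumptions: `run_checked` is a hypothetical name, and the PR may structure its checks differently.

```python
import subprocess
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> str:
    """Run `cmd` and return its stdout; raise RuntimeError with stderr on failure.

    Hypothetical helper, not the PR's code.
    """
    try:
        result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as exc:
        # check=True raises on any non-zero exit code instead of failing silently.
        raise RuntimeError(
            f"command {' '.join(cmd)!r} failed with exit code {exc.returncode}: "
            f"{(exc.stderr or '').strip()}"
        ) from exc
    return result.stdout
```

A call such as `run_checked(["docker", "start", container_name])`, where `container_name` is whatever name the caller tracks, would then surface Docker's own error text instead of continuing with a container that never started.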

Fix(docker): Implement robust port and container lifecycle handling

This commit resolves all identified bugs related to Docker container creation and reuse. The application was previously prone to crashing or failing silently due to port conflicts and mishandled edge cases like zombie containers. The `init_container` method in `docker_env.py` has been completely overhauled to provide fully robust lifecycle management:

1. **Zombie Container Detection:** The script now detects containers that were created but never successfully started (i.e., have no port mapping). It automatically removes these zombie containers and proceeds to create a fresh one.
2. **Valid Container Reuse:** For existing, valid containers, the script inspects them to find their assigned port. It then checks if that port is available on the host.
   - If the port is free, the container is started.
   - If the port is busy, the script now raises a clear, actionable error message.
3. **Error Handling:** All calls to `docker` commands via `subprocess` now have proper error checking (`check=True` or `try/except`) to prevent silent failures and provide clear stack traces.
4. **New Container Creation:** The logic to find a new available port when the default is busy is preserved for creating new containers.

This final version ensures the application starts reliably, handles all container states gracefully, and provides clear user feedback, dramatically improving the development experience.
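The zombie check can be sketched as a pure function over the JSON that `docker inspect` prints. Which field the PR actually reads is not stated; this sketch assumes `HostConfig.PortBindings` and a container port of `8000/tcp`, and `host_port_from_inspect` is a hypothetical name.

```python
import json
from typing import Optional


def host_port_from_inspect(inspect_json: str, container_port: str = "8000/tcp") -> Optional[int]:
    """Return the configured host port, or None when no mapping exists.

    Assumes the `docker inspect` layout
    [{"HostConfig": {"PortBindings": {"8000/tcp": [{"HostPort": "12345"}]}}}].
    """
    data = json.loads(inspect_json)
    if not data:
        return None
    host_config = data[0].get("HostConfig") or {}
    bindings = (host_config.get("PortBindings") or {}).get(container_port) or []
    for binding in bindings:
        host_port = binding.get("HostPort")
        if host_port:
            return int(host_port)
    return None  # no mapping: treat the container as a zombie candidate
```

A caller could treat a `None` result as the signal to remove the container and create a fresh one, and any integer result as the port to check before restarting.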

Fix(docker): Implement final robust container lifecycle logic

This commit resolves all identified bugs related to Docker container creation and reuse, including race conditions and zombie container states. The `init_container` method in `docker_env.py` has been completely overhauled to provide fully robust lifecycle management:

1. **Zombie Container Detection:** The script now detects containers that were created but never successfully started (i.e., have no port mapping). It automatically removes these zombie containers and proceeds to create a fresh one.
2. **Valid Container Reuse:** For existing, valid containers, the script inspects them to find their pre-assigned host port. If the port is busy, the script now raises a clear, actionable error.
3. **Race-Condition-Free Port Allocation:** For new containers, the script now delegates port assignment to Docker by using `-p 8000`. It then inspects the container to discover the randomly assigned host port. This eliminates the race condition that caused previous failures.
4. **Error Handling & State Management:** All calls to `docker` are now properly error-checked. The `self.communication_port` variable is reliably updated in all scenarios to ensure the rest of the application can connect to the container.

This final version ensures the application starts reliably, handles all container states gracefully, and provides clear user feedback.
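When a container is started with `-p 8000`, Docker picks the host port itself, and `docker port <container> 8000` reports it as lines such as `0.0.0.0:49153`. The parser below is a sketch of the discovery step only; `parse_docker_port_output` is a hypothetical name, and the PR may read the port from `docker inspect` instead.

```python
def parse_docker_port_output(output: str) -> int:
    """Extract the host port from `docker port` output.

    Accepts lines like '0.0.0.0:49153', '[::]:49153', or
    '8000/tcp -> 0.0.0.0:49153' and returns the first port found.
    """
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop an optional '8000/tcp -> ' prefix, then take what follows the last colon.
        address = line.split("->")[-1].strip()
        _, _, port = address.rpartition(":")
        if port.isdigit():
            return int(port)
    raise ValueError(f"no host port found in {output!r}")
```

Because Docker both chooses and binds the port, there is no window between checking a port and claiming it, which is the race this commit describes.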

Fix(docker): Final robust container lifecycle and port allocation

…the : with _ for docker names etc.
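As a sketch of the name substitution, the helper below replaces `:` and any other character Docker rejects in container names with `_`. The commit only mentions replacing `:`; the broader character filter, the leading-character fix, and the name `docker_safe_name` are assumptions.

```python
import re


def docker_safe_name(model_name: str) -> str:
    """Map a model name such as 'xxx:free' to a valid Docker container name.

    Hypothetical helper: Docker names may contain only [a-zA-Z0-9_.-]
    and must start with an alphanumeric character.
    """
    safe = re.sub(r"[^a-zA-Z0-9_.-]", "_", model_name)
    if not safe or not safe[0].isalnum():
        safe = "m" + safe  # illustrative prefix to satisfy the first-character rule
    return safe
```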

This change adds a 600-second timeout to the `litellm.acompletion` calls in `research_agent/inno/core.py`. This prevents premature timeouts when using slow models, such as Qwen models, which can have high latency.
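A minimal sketch of such a call is shown below, assuming the timeout is passed through the `timeout` keyword argument of `litellm.acompletion`. The commit text does not show the exact call site, and `LLM_TIMEOUT_SECONDS`, `build_completion_kwargs`, and `complete` are hypothetical names.

```python
from typing import Any, Dict, List

LLM_TIMEOUT_SECONDS = 600  # value taken from the commit message


def build_completion_kwargs(model: str, messages: List[Dict[str, Any]], **extra: Any) -> Dict[str, Any]:
    """Assemble keyword arguments for the completion call with a default timeout."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "timeout": LLM_TIMEOUT_SECONDS,
    }
    kwargs.update(extra)  # callers may still override the timeout explicitly
    return kwargs


async def complete(model: str, messages: List[Dict[str, Any]], **extra: Any) -> Any:
    """Call litellm with the default timeout applied (hypothetical wrapper)."""
    import litellm  # imported lazily so the helper above has no third-party dependency

    return await litellm.acompletion(**build_completion_kwargs(model, messages, **extra))
```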

fix: Add timeout to litellm calls

This change introduces a custom wait strategy for the retry mechanism. When a rate limit error is encountered, the retry delay will be longer to avoid overwhelming the server. For other errors, a shorter delay is used.
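The idea can be sketched with the standard library alone: choose the delay from the type of the exception that caused the retry. The PR builds its strategy on `tenacity`; the version below does not, and the `RateLimitError` class, the 60-second and 5-second delays, and the function names are assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the provider's rate-limit exception."""


def pick_delay(exc: BaseException, rate_limit_delay: float = 60.0, default_delay: float = 5.0) -> float:
    """Return a long delay for rate-limit errors and a short one otherwise."""
    return rate_limit_delay if isinstance(exc, RateLimitError) else default_delay


def retry_call(func: Callable[[], T], attempts: int = 3,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func` up to `attempts` times, sleeping between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(pick_delay(exc))
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the retry loop easy to exercise in tests without real waiting.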

This change corrects the import statement for `wait_base` from the `tenacity` library. `wait_base` is not in the top-level `tenacity` package, but in the `tenacity.wait` submodule.

pmilford and others added 19 commits August 3, 2025 16:01

PLATFORM env variable appears to have been missing from constant.py. …

da1978a

…Added, with default to linux/amd64

Make default no proxy for web accesses

784267c

Correct the import for terminal_tools

4c225cc

correct planning_tools import

59d11db

Changed logic on wait_for_container_ready, it was always failing.

2eeb821

Merge pull request #1 from pmilford/fix/docker-port-allocation

774bcc5

Fix(docker): Make container port handling robust

Merge pull request #2 from pmilford/fix/docker-port-allocation

1c61584

Fix(docker): Implement robust port and container lifecycle handling

Merge pull request #3 from pmilford/fix/docker-port-allocation

3f63e8f

Fix(docker): Implement final robust container lifecycle logic

Merge pull request #4 from pmilford/fix/docker-port-allocation

fb7597a

Fix(docker): Final robust container lifecycle and port allocation

Fixes to code to permit : in model names, such as xxx:free, replaces …

59f1716

…the : with _ for docker names etc.

Merge branch 'main' of https://github.com/pmilford/AI-Researcher

546d140

fix: Add timeout to litellm calls

37c38c6

Merge pull request #5 from pmilford/fix-timeout-issue

ea7d4a6

fix: Add timeout to litellm calls

I used a longer retry delay for rate limit errors.

b8b5c65

Fix ImportError for wait_base

0ebab11

pmilford wants to merge 19 commits into HKUDS:main from pmilford:longer-retry-delay
