
Longer retry delay #60

Open
pmilford wants to merge 19 commits into HKUDS:main from pmilford:longer-retry-delay
pmilford commented Aug 6, 2025

AI oops

pmilford and others added 19 commits August 3, 2025 16:01
The application would previously crash with a 'port is already allocated'
error if the port specified in the .env file was in use.

This change introduces more intelligent port handling in the
`DockerEnv.init_container` method:

1.  **For existing containers:** The script now inspects the container to
    find the port it was originally created with and reuses that port,
    preserving the container reuse functionality.

2.  **For new containers:** If the default port is taken, the script
    now automatically searches for the next available port, preventing
    the application from crashing.
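
The "search for the next available port" step could be sketched roughly as below. This is an illustration only, not the code from `DockerEnv.init_container`; the helper names `is_port_free` and `find_available_port` and the `max_tries` limit are assumptions made for the example.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
        return True


def find_available_port(start: int, max_tries: int = 100,
                        host: str = "127.0.0.1") -> int:
    """Return the first free port in [start, start + max_tries)."""
    # Never probe past the highest valid TCP port.
    for port in range(start, min(start + max_tries, 65536)):
        if is_port_free(port, host):
            return port
    raise RuntimeError(
        f"No free port found in range {start}-{start + max_tries - 1}"
    )
```

Note that a check like this is inherently racy: another process can take the port between the probe and the later `docker run`.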

This makes the application more resilient to common port conflicts in a
local development environment.
Fix(docker): Make container port handling robust
This commit resolves a critical bug where the application would either
crash due to port conflicts or fail silently when trying to restart
an existing container.

The `init_container` method in `docker_env.py` has been rewritten with
the following robust logic:

1.  **For Existing Containers:**
    - The container is now inspected to find its pre-assigned host port.
    - Before starting, the script checks if this port is actually
      available.
    - If the port is busy, the script now raises a clear, actionable
      error, instructing you to free up the specific port, rather
      than failing silently.
    - The `docker start` command now includes error checking.

2.  **For New Containers:**
    - If the default port is in use, the script automatically finds the
      next available port.
    - The `docker run` command now includes error checking to ensure
      container creation is successful.
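
A minimal sketch of the existing-container path, assuming the host port is read from the `HostConfig.PortBindings` section of `docker inspect <name>` output and that the container listens on `8000/tcp`. The function names and the error wording are hypothetical, not taken from `docker_env.py`.

```python
import json
import socket
from typing import Optional


def host_port_from_inspect(inspect_output: str,
                           container_port: str = "8000/tcp") -> Optional[int]:
    """Extract the configured host port from `docker inspect` JSON, if any."""
    data = json.loads(inspect_output)
    if not data:
        return None
    bindings = (data[0].get("HostConfig") or {}).get("PortBindings") or {}
    for entry in bindings.get(container_port) or []:
        host_port = entry.get("HostPort")
        if host_port:  # empty string means Docker picks the port at start
            return int(host_port)
    return None


def ensure_port_free(port: int, host: str = "127.0.0.1") -> None:
    """Raise an actionable error if the container's port is already taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            raise RuntimeError(
                f"Port {port} is required by the existing container but is "
                f"in use. Free port {port} and try again."
            ) from exc
```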

This change makes the application significantly more resilient and
provides clearer feedback to you, improving the overall development
experience.
Fix(docker): Implement robust port and container lifecycle handling
This commit resolves all identified bugs related to Docker container
creation and reuse. The application was previously prone to crashing
or failing silently due to port conflicts and mishandled edge cases
like zombie containers.

The `init_container` method in `docker_env.py` has been completely
overhauled to provide robust lifecycle management:

1.  **Zombie Container Detection:** The script now detects containers that
    were created but never successfully started (i.e., have no port
    mapping). It automatically removes these zombie containers and
    proceeds to create a fresh one.

2.  **Valid Container Reuse:** For existing, valid containers, the script
    inspects them to find their assigned port. It then checks if that
    port is available on the host.
    - If the port is free, the container is started.
    - If the port is busy, the script now raises a clear, actionable
      error message.

3.  **Error Handling:** All calls to `docker` commands via `subprocess`
    now have proper error checking (`check=True` or `try/except`) to
    prevent silent failures and provide clear stack traces.

4.  **New Container Creation:** The logic to find a new available port
    when the default is busy is preserved for creating new containers.
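
The error-checking pattern in item 3 might look something like the sketch below. `run_checked` is a hypothetical helper name, and the way it converts failures into `RuntimeError` is one possible choice, not necessarily what the commit does.

```python
import subprocess
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> str:
    """Run a command, returning stdout or raising with the captured stderr."""
    try:
        result = subprocess.run(
            list(cmd), check=True, capture_output=True, text=True
        )
    except FileNotFoundError as exc:
        # The executable (for example `docker`) is not on PATH.
        raise RuntimeError(f"Command not found: {cmd[0]}") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"Command {' '.join(cmd)} failed with exit code "
            f"{exc.returncode}: {(exc.stderr or '').strip()}"
        ) from exc
    return result.stdout.strip()
```

With a helper like this, removing a zombie container and starting a valid one would be calls such as `run_checked(["docker", "rm", "-f", name])` and `run_checked(["docker", "start", name])`, where `name` is the container name.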

This final version ensures the application starts reliably, handles
all container states gracefully, and provides clear user feedback,
dramatically improving the development experience.
Fix(docker): Implement final robust container lifecycle logic
This commit resolves all identified bugs related to Docker container
creation and reuse, including race conditions and zombie container
states.

The `init_container` method in `docker_env.py` has been completely
overhauled to provide robust lifecycle management:

1.  **Zombie Container Detection:** The script now detects containers that
    were created but never successfully started (i.e., have no port
    mapping). It automatically removes these zombie containers and
    proceeds to create a fresh one.

2.  **Valid Container Reuse:** For existing, valid containers, the script
    inspects them to find their pre-assigned host port. If the port is
    busy, the script now raises a clear, actionable error.

3.  **Race-Condition-Free Port Allocation:** For new containers, the script
    now delegates port assignment to Docker by using `-p 8000`. It then
    inspects the container to discover the randomly assigned host port.
    This eliminates the race condition that caused previous failures.

4.  **Error Handling & State Management:** All calls to `docker` are now
    properly error-checked. The `self.communication_port` variable is
    reliably updated in all scenarios to ensure the rest of the
    application can connect to the container.
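
As an illustration of item 3, the host port that Docker assigns after `docker run -p 8000 ...` can be discovered with `docker port <name> 8000` and parsed as sketched here. The parser name is hypothetical, and the sample output formats (`0.0.0.0:32768`, `[::]:32768`, `:::49153`) are assumptions based on commonly seen Docker output rather than anything stated in the commit.

```python
def parse_docker_port_output(output: str) -> int:
    """Return the host port from `docker port <name> <port>` output.

    Each line is expected to end in ':<host_port>'; the first line that
    matches wins (IPv4 and IPv6 lines normally carry the same port).
    """
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        _, sep, port = line.rpartition(":")
        if sep and port.isdigit():
            return int(port)
    raise ValueError(f"No host port found in docker output: {output!r}")
```

The parsed value is what would then be stored in `self.communication_port`.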

This final version ensures the application starts reliably, handles
all container states gracefully, and provides clear user feedback.
Fix(docker): Final robust container lifecycle and port allocation
This change adds a 600-second timeout to the `litellm.acompletion` calls in `research_agent/inno/core.py`. This prevents premature timeouts when using slow models, such as Qwen models, which can have high latency.
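
A rough sketch of passing the timeout through, assuming `litellm.acompletion` accepts a `timeout` keyword argument in seconds. The constant and helper names are hypothetical and do not come from `research_agent/inno/core.py`.

```python
# Hypothetical constant name; the 600-second value is from the commit message.
LLM_TIMEOUT_SECONDS = 600


def build_completion_kwargs(model: str, messages: list, **extra) -> dict:
    """Assemble keyword arguments for a completion call with a long timeout."""
    kwargs = {
        "model": model,
        "messages": messages,
        "timeout": LLM_TIMEOUT_SECONDS,
    }
    kwargs.update(extra)  # callers may still override the timeout explicitly
    return kwargs


async def complete(model: str, messages: list, **extra):
    """Call litellm.acompletion with the long timeout applied."""
    import litellm  # imported lazily so this sketch loads without litellm

    return await litellm.acompletion(
        **build_completion_kwargs(model, messages, **extra)
    )
```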
fix: Add timeout to litellm calls
This change introduces a custom wait strategy for the retry mechanism.
When a rate limit error is encountered, the retry delay will be longer
to avoid overwhelming the server. For other errors, a shorter delay
is used.
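
One way such a wait strategy could be written is sketched below. It is a plain callable taking a tenacity-style `retry_state`; in the PR the strategy subclasses `wait_base` from `tenacity.wait`. The `RateLimitError` class is a local stand-in for whatever exception the LLM client raises, and the 60 s and 5 s delays are placeholder values, not the ones used in the commit.

```python
class RateLimitError(Exception):
    """Stand-in for the client's rate limit exception (assumption)."""


class RateLimitAwareWait:
    """Pick a longer retry delay after rate limit errors than other errors."""

    def __init__(self, rate_limit_delay: float = 60.0,
                 default_delay: float = 5.0):
        self.rate_limit_delay = rate_limit_delay
        self.default_delay = default_delay

    def __call__(self, retry_state) -> float:
        # retry_state.outcome is a future holding the last attempt's result.
        outcome = getattr(retry_state, "outcome", None)
        exc = outcome.exception() if outcome is not None else None
        if isinstance(exc, RateLimitError):
            return self.rate_limit_delay
        return self.default_delay
```

An instance would be passed as the `wait` argument of the retry decorator, for example `wait=RateLimitAwareWait()`.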
This change corrects the import statement for `wait_base` from the
`tenacity` library. `wait_base` is not in the top-level `tenacity`
package, but in the `tenacity.wait` submodule.