Skip to content

Minor bug fixes and changes to enable code to run#58

Open
pmilford wants to merge 25 commits intoHKUDS:mainfrom
pmilford:main
Open

Minor bug fixes and changes to enable code to run#58
pmilford wants to merge 25 commits intoHKUDS:mainfrom
pmilford:main

Conversation

@pmilford
Copy link
Copy Markdown

@pmilford pmilford commented Aug 4, 2025

Fixes to import path errors in several files.
Changes default to no internet proxy.
Change logic on wait for docker to start, it was always failing.
PLATFORM env variable was missing from constants.py

@pmilford
Copy link
Copy Markdown
Author

pmilford commented Aug 4, 2025

I needed these changes to make some progress, still not fully running, but getting closer!

google-labs-jules bot and others added 20 commits August 4, 2025 11:41
The application would previously crash with a 'port is already allocated'
error if the port specified in the .env file was in use.

This change introduces more intelligent port handling in the
`DockerEnv.init_container` method:

1.  **For existing containers:** The script now inspects the container to
    find the port it was originally created with and reuses that port,
    preserving the container reuse functionality.

2.  **For new containers:** If the default port is taken, the script
    now automatically searches for the next available port, preventing
    the application from crashing.

This makes the application more resilient to common port conflicts in a
local development environment.
Fix(docker): Make container port handling robust
This commit resolves a critical bug where the application would either
crash due to port conflicts or fail silently when trying to restart
an existing container.

The `init_container` method in `docker_env.py` has been rewritten with
the following robust logic:

1.  **For Existing Containers:**
    - The container is now inspected to find its pre-assigned host port.
    - Before starting, the script checks if this port is actually
      available.
    - If the port is busy, the script now raises a clear, actionable
      error, instructing you to free up the specific port, rather
      than failing silently.
    - The `docker start` command now includes error checking.

2.  **For New Containers:**
    - If the default port is in use, the script automatically finds the
      next available port.
    - The `docker run` command now includes error checking to ensure
      container creation is successful.

This change makes the application significantly more resilient and
provides clearer feedback to you, improving the overall development
experience.
Fix(docker): Implement robust port and container lifecycle handling
This commit resolves all identified bugs related to Docker container
creation and reuse. The application was previously prone to crashing
or failing silently due to port conflicts and mishandled edge cases
like zombie containers.

The `init_container` method in `docker_env.py` has been completely
overhauled to provide a fully robust lifecycle management:

1.  **Zombie Container Detection:** The script now detects containers that
    were created but never successfully started (i.e., have no port
    mapping). It automatically removes these zombie containers and
    proceeds to create a fresh one.

2.  **Valid Container Reuse:** For existing, valid containers, the script
    inspects them to find their assigned port. It then checks if that
    port is available on the host.
    - If the port is free, the container is started.
    - If the port is busy, the script now raises a clear, actionable
      error message.

3.  **Error Handling:** All calls to `docker` commands via `subprocess`
    now have proper error checking (`check=True` or `try/except`) to
    prevent silent failures and provide clear stack traces.

4.  **New Container Creation:** The logic to find a new available port
    when the default is busy is preserved for creating new containers.

This final version ensures the application starts reliably, handles
all container states gracefully, and provides clear user feedback,
dramatically improving the development experience.
Fix(docker): Implement final robust container lifecycle logic
This commit resolves all identified bugs related to Docker container
creation and reuse, including race conditions and zombie container
states.

The `init_container` method in `docker_env.py` has been completely
overhauled to provide a fully robust lifecycle management:

1.  **Zombie Container Detection:** The script now detects containers that
    were created but never successfully started (i.e., have no port
    mapping). It automatically removes these zombie containers and
    proceeds to create a fresh one.

2.  **Valid Container Reuse:** For existing, valid containers, the script
    inspects them to find their pre-assigned host port. If the port is
    busy, the script now raises a clear, actionable error.

3.  **Race-Condition-Free Port Allocation:** For new containers, the script
    now delegates port assignment to Docker by using `-p 8000`. It then
    inspects the container to discover the randomly assigned host port.
    This eliminates the race condition that caused previous failures.

4.  **Error Handling & State Management:** All calls to `docker` are now
    properly error-checked. The `self.communication_port` variable is
    reliably updated in all scenarios to ensure the rest of the
    application can connect to the container.

This final version ensures the application starts reliably, handles
all container states gracefully, and provides clear user feedback.
Fix(docker): Final robust container lifecycle and port allocation
This change adds a 600-second timeout to the `litellm.acompletion` calls in `research_agent/inno/core.py`. This prevents premature timeouts when using slow models, such as Qwen models, which can have high latency.
fix: Add timeout to litellm calls
This change introduces a custom wait strategy for the retry mechanism.
When a rate limit error is encountered, the retry delay will be longer
to avoid overwhelming the server. For other errors, a shorter delay
is used.
I used a longer retry delay for rate limit errors.
This change corrects the import statement for `wait_base` from the
`tenacity` library. `wait_base` is not in the top-level `tenacity`
package, but in the `tenacity.wait` submodule.
The `extract_json_from_output` function in `run_infer_plan.py` and `run_infer_idea.py` can fail with a `json.JSONDecodeError` if the input string is not valid JSON.

This change adds logging to record the malformed JSON string when a `JSONDecodeError` occurs. This will help to debug issues with malformed JSON responses from the LLM.
Add logging for JSON parsing errors in `extract_json_from_output`.
This change addresses a JSON parsing error that occurred during the paper survey process. The error was caused by me calling an incorrect tool, which resulted in an invalid JSON output.

The following changes were made:

- Updated my instructions in `survey_agent.py` to be more explicit about the correct tool to use.
- Improved the `extract_json_from_output` function in `run_infer_plan.py` to be more robust by adding support for JSON in markdown code blocks.
Fix JSON parsing error and improve agent prompts
@zhutoutoutousan
Copy link
Copy Markdown

MARCO!!!

@markus-flicke
Copy link
Copy Markdown

Thank you very much for your efforts. Running the first (prompt, reference) example from their Gradio UI, I get the following: Error occurred while running Researcher: Failed to create container. Docker error: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Is this what you meant with "still not fully running"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants