Minor bug fixes and changes to enable code to run by pmilford · Pull Request #58 · HKUDS/AI-Researcher

pmilford · 2025-08-04T10:19:21Z

Fixes import path errors in several files.
Changes the default to no internet proxy.
Changes the logic that waits for Docker to start; it was always failing.
The PLATFORM env variable was missing from constants.py.

…Added, with default to linux/amd64
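
A minimal sketch of reading a `PLATFORM` setting with a `linux/amd64` fallback. The helper name `get_platform` and the `DEFAULT_PLATFORM` constant are hypothetical; the actual code in the repository's constants file may be structured differently.

```python
import os

# Default taken from the commit message; the constant name is an assumption.
DEFAULT_PLATFORM = "linux/amd64"


def get_platform(env=None):
    """Return the Docker platform string, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("PLATFORM", DEFAULT_PLATFORM)
```

Passing a mapping explicitly, as in `get_platform({"PLATFORM": "linux/arm64"})`, makes the fallback easy to check without touching the real environment.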

pmilford · 2025-08-04T10:20:27Z

I needed these changes to make some progress. It is still not fully running, but getting closer!

The application would previously crash with a 'port is already allocated' error if the port specified in the .env file was in use. This change introduces more intelligent port handling in the `DockerEnv.init_container` method:

1. **For existing containers:** The script now inspects the container to find the port it was originally created with and reuses that port, preserving the container reuse functionality.
2. **For new containers:** If the default port is taken, the script now automatically searches for the next available port, preventing the application from crashing.

This makes the application more resilient to common port conflicts in a local development environment.

Fix(docker): Make container port handling robust
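
The "search for the next available port" step in this commit could look roughly like the sketch below. The function names, the bind-based probe, and the `max_tries` limit are assumptions for illustration, not the code in `docker_env.py`.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(start, max_tries=100, host="0.0.0.0"):
    """Return the first free port at or above `start`, scanning upward."""
    # Cap the scan at 65535, the highest valid TCP port.
    for port in range(start, min(start + max_tries, 65536)):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"No free port found in {max_tries} tries from {start}")
```

A probe like this only reports that the port was free at the moment of the check; another process can still claim it before the container is started.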

This commit resolves a critical bug where the application would either crash due to port conflicts or fail silently when trying to restart an existing container. The `init_container` method in `docker_env.py` has been rewritten with the following robust logic:

1. **For Existing Containers:**
   - The container is now inspected to find its pre-assigned host port.
   - Before starting, the script checks if this port is actually available.
   - If the port is busy, the script now raises a clear, actionable error, instructing you to free up the specific port, rather than failing silently.
   - The `docker start` command now includes error checking.
2. **For New Containers:**
   - If the default port is in use, the script automatically finds the next available port.
   - The `docker run` command now includes error checking to ensure container creation is successful.

This change makes the application significantly more resilient and provides clearer feedback to you, improving the overall development experience.

Fix(docker): Implement robust port and container lifecycle handling
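
As an illustration of the "actionable error" and the error-checked `docker start` described in this commit, here is a small sketch. `ensure_port_free` and `start_container` are hypothetical names, and the error wording is invented for the example.

```python
import socket
import subprocess


def ensure_port_free(port, host="0.0.0.0"):
    """Raise a descriptive error if (host, port) cannot be bound."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            raise RuntimeError(
                f"Port {port} is already in use. Stop the process using it "
                f"and try again."
            ) from exc


def start_container(name, port):
    """Start an existing container after confirming its host port is free."""
    ensure_port_free(port)
    # check=True turns a non-zero exit status into CalledProcessError,
    # so a failed `docker start` can no longer pass silently.
    subprocess.run(
        ["docker", "start", name],
        check=True,
        capture_output=True,
        text=True,
    )
```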

This commit resolves all identified bugs related to Docker container creation and reuse. The application was previously prone to crashing or failing silently due to port conflicts and mishandled edge cases like zombie containers. The `init_container` method in `docker_env.py` has been completely overhauled to provide fully robust lifecycle management:

1. **Zombie Container Detection:** The script now detects containers that were created but never successfully started (i.e., have no port mapping). It automatically removes these zombie containers and proceeds to create a fresh one.
2. **Valid Container Reuse:** For existing, valid containers, the script inspects them to find their assigned port. It then checks if that port is available on the host.
   - If the port is free, the container is started.
   - If the port is busy, the script now raises a clear, actionable error message.
3. **Error Handling:** All calls to `docker` commands via `subprocess` now have proper error checking (`check=True` or `try/except`) to prevent silent failures and provide clear stack traces.
4. **New Container Creation:** The logic to find a new available port when the default is busy is preserved for creating new containers.

This final version ensures the application starts reliably, handles all container states gracefully, and provides clear user feedback, dramatically improving the development experience.

Fix(docker): Implement final robust container lifecycle logic
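
One way to detect a container with no port mapping is to parse the JSON printed by `docker inspect <name>`. The sketch below assumes the check reads `HostConfig.PortBindings` and that the container port is `8000/tcp`; which field the PR actually inspects is not stated, so treat both as assumptions. The function names are hypothetical.

```python
import json


def host_port_from_inspect(inspect_output, container_port="8000/tcp"):
    """Return the host port bound to `container_port`, or None if unmapped.

    `inspect_output` is the JSON text printed by `docker inspect <name>`,
    which is a list holding one object per inspected container.
    """
    data = json.loads(inspect_output)
    if not data:
        return None
    host_config = data[0].get("HostConfig") or {}
    bindings = host_config.get("PortBindings") or {}
    for entry in bindings.get(container_port) or []:
        host_port = entry.get("HostPort")
        if host_port:
            return int(host_port)
    return None


def is_zombie(inspect_output):
    """Treat a container without any host port binding as a zombie."""
    return host_port_from_inspect(inspect_output) is None
```

Note that a container published with a Docker-assigned host port records an empty `HostPort` in this field, so this particular check only suits containers created with an explicit host port.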

This commit resolves all identified bugs related to Docker container creation and reuse, including race conditions and zombie container states. The `init_container` method in `docker_env.py` has been completely overhauled to provide fully robust lifecycle management:

1. **Zombie Container Detection:** The script now detects containers that were created but never successfully started (i.e., have no port mapping). It automatically removes these zombie containers and proceeds to create a fresh one.
2. **Valid Container Reuse:** For existing, valid containers, the script inspects them to find their pre-assigned host port. If the port is busy, the script now raises a clear, actionable error.
3. **Race-Condition-Free Port Allocation:** For new containers, the script now delegates port assignment to Docker by using `-p 8000`. It then inspects the container to discover the randomly assigned host port. This eliminates the race condition that caused previous failures.
4. **Error Handling & State Management:** All calls to `docker` are now properly error-checked. The `self.communication_port` variable is reliably updated in all scenarios to ensure the rest of the application can connect to the container.

This final version ensures the application starts reliably, handles all container states gracefully, and provides clear user feedback.

Fix(docker): Final robust container lifecycle and port allocation
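
Letting Docker pick the host port and then reading it back might be sketched as follows. This version uses `docker port` to discover the assignment, which is an assumption; the PR only says the container is inspected. The function names and the `image`/`name` parameters are hypothetical.

```python
import subprocess


def parse_docker_port_output(output):
    """Extract the host port from `docker port <container> <port>` output."""
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines look like "0.0.0.0:49153" or "[::]:49153";
        # the host port is whatever follows the last colon.
        _, _, port_text = line.rpartition(":")
        if port_text.isdigit():
            return int(port_text)
    raise ValueError(f"No host port found in output: {output!r}")


def run_with_docker_assigned_port(image, name, container_port=8000):
    """Create a container with `-p <container_port>` and return its host port."""
    # Publishing only the container port asks Docker to choose a free host port.
    subprocess.run(
        ["docker", "run", "-d", "--name", name, "-p", str(container_port), image],
        check=True,
        capture_output=True,
        text=True,
    )
    result = subprocess.run(
        ["docker", "port", name, str(container_port)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_docker_port_output(result.stdout)
```

Because Docker reserves the port as part of creating the container, there is no gap between checking a port and using it.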

…the : with _ for docker names etc.
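
A minimal sketch of the substitution mentioned here, replacing `:` with `_` so a model name can be reused in Docker names. The helper name `docker_safe_name` is hypothetical, and the real change may touch other characters or call sites.

```python
def docker_safe_name(model_name):
    """Replace ':' with '_' so the model name is usable in Docker names."""
    return model_name.replace(":", "_")
```

For example, `docker_safe_name("xxx:free")` gives `"xxx_free"`.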

This change adds a 600-second timeout to the `litellm.acompletion` calls in `research_agent/inno/core.py`. This prevents premature timeouts when using slow models, such as Qwen models, which can have high latency.

fix: Add timeout to litellm calls
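
The idea of applying a default 600-second timeout to every completion call can be sketched with a thin wrapper. The wrapper name and the pattern of passing the completion function in as an argument are assumptions made so the example runs without `litellm` installed; in the PR the timeout is described as being added to the `litellm.acompletion` calls directly.

```python
import asyncio

# Timeout in seconds, taken from the commit message.
LLM_TIMEOUT_SECONDS = 600


async def acompletion_with_timeout(acompletion, **kwargs):
    """Call an async completion function, defaulting `timeout` to 600 s.

    `acompletion` is expected to be a coroutine function that accepts a
    `timeout` keyword argument, such as `litellm.acompletion`.
    """
    kwargs.setdefault("timeout", LLM_TIMEOUT_SECONDS)
    return await acompletion(**kwargs)
```

A caller that needs a different limit can still pass `timeout=` explicitly, since `setdefault` leaves an existing value untouched.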

This change introduces a custom wait strategy for the retry mechanism. When a rate limit error is encountered, the retry delay will be longer to avoid overwhelming the server. For other errors, a shorter delay is used.

I used a longer retry delay for rate limit errors.
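
The delay selection can be illustrated in plain Python, without the retry library used in the PR. `RateLimitError` here is a hypothetical stand-in for the provider's exception, and both delay values are assumed since the commit does not state them.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate limit exception."""


RATE_LIMIT_DELAY = 60.0  # assumed value, in seconds
DEFAULT_DELAY = 5.0      # assumed value, in seconds


def retry_delay(exc):
    """Pick a longer wait for rate limit errors, a shorter one otherwise."""
    return RATE_LIMIT_DELAY if isinstance(exc, RateLimitError) else DEFAULT_DELAY


def call_with_retry(func, max_attempts=3, sleep=time.sleep):
    """Call `func`, sleeping between failed attempts; re-raise the last error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts:
                raise
            sleep(retry_delay(exc))
```

Injecting `sleep` keeps the loop testable: a test can pass a function that records the requested delays instead of actually waiting.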

This change corrects the import statement for `wait_base` from the `tenacity` library. `wait_base` is not in the top-level `tenacity` package, but in the `tenacity.wait` submodule.

Fix ImportError for wait_base

The `extract_json_from_output` function in `run_infer_plan.py` and `run_infer_idea.py` can fail with a `json.JSONDecodeError` if the input string is not valid JSON. This change adds logging to record the malformed JSON string when a `JSONDecodeError` occurs. This will help to debug issues with malformed JSON responses from the LLM.

Add logging for JSON parsing errors in `extract_json_from_output`.

This change addresses a JSON parsing error that occurred during the paper survey process. The error was caused by me calling an incorrect tool, which resulted in an invalid JSON output. The following changes were made:

- Updated my instructions in `survey_agent.py` to be more explicit about the correct tool to use.
- Improved the `extract_json_from_output` function in `run_infer_plan.py` to be more robust by adding support for JSON in markdown code blocks.

Fix JSON parsing error and improve agent prompts
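
A rough sketch of JSON extraction that accepts a markdown code block and logs the offending text on failure. It is not the repository's `extract_json_from_output`; the function name, the regular expression, and the log message are assumptions.

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

# Build the fence marker programmatically to avoid a literal fence in this example.
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def extract_json(text):
    """Parse JSON from raw text or from the first fenced markdown block."""
    match = _FENCED_BLOCK.search(text)
    candidate = (match.group(1) if match else text).strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Record the malformed payload so bad model output can be debugged.
        logger.error("Malformed JSON in model output: %r", candidate)
        raise
```

Re-raising after logging keeps the caller's existing error handling intact while leaving a record of what the model actually returned.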

zhutoutoutousan · 2025-08-17T19:58:31Z

MARCO!!!

markus-flicke · 2025-10-22T10:14:39Z

Thank you very much for your efforts. Running the first (prompt, reference) example from their Gradio UI, I get the following: Error occurred while running Researcher: Failed to create container. Docker error: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Is this what you meant by "still not fully running"?

pmilford added 5 commits August 3, 2025 16:01

PLATFORM env variable appears to have been missing from constant.py. …

da1978a

…Added, with default to linux/amd64

Make default no proxy for web accesses

784267c

Correct the import for terminal_tools

4c225cc

correct planning_tools import

59d11db

Changed logic on wait_for_container_ready, it was always failing.

2eeb821

google-labs-jules bot and others added 20 commits August 4, 2025 11:41

Merge pull request #1 from pmilford/fix/docker-port-allocation

774bcc5

Fix(docker): Make container port handling robust

Merge pull request #2 from pmilford/fix/docker-port-allocation

1c61584

Fix(docker): Implement robust port and container lifecycle handling

Merge pull request #3 from pmilford/fix/docker-port-allocation

3f63e8f

Fix(docker): Implement final robust container lifecycle logic

Merge pull request #4 from pmilford/fix/docker-port-allocation

fb7597a

Fix(docker): Final robust container lifecycle and port allocation

Fixes to code to permit : in model names, such as xxx:free, replaces …

59f1716

…the : with _ for docker names etc.

Merge branch 'main' of https://github.com/pmilford/AI-Researcher

546d140

fix: Add timeout to litellm calls

37c38c6

Merge pull request #5 from pmilford/fix-timeout-issue

ea7d4a6

fix: Add timeout to litellm calls

I used a longer retry delay for rate limit errors.

b8b5c65

Merge pull request #6 from pmilford/longer-retry-delay

b459fa5

I used a longer retry delay for rate limit errors.

Fix ImportError for wait_base

0ebab11

Merge pull request #7 from pmilford/longer-retry-delay

fe20f63

Fix ImportError for wait_base

Merge pull request #8 from pmilford/feature/add-json-error-logging

945178e

Add logging for JSON parsing errors in `extract_json_from_output`.

Merge pull request #9 from pmilford/fix-json-parsing-error

146bce9

Fix JSON parsing error and improve agent prompts

pmilford wants to merge 25 commits into HKUDS:main from pmilford:main

3 participants
