Merged
232 changes: 232 additions & 0 deletions docs/case-studies/issue-173/CASE-STUDY.md
@@ -0,0 +1,232 @@
# Case Study: Issue #173 - `--model kilo/glm-5-free` Hangs Forever

## Summary

When using `--model kilo/glm-5-free`, the agent hangs indefinitely during provider package installation. The process gets stuck at the `bun add @openrouter/ai-sdk-provider@latest` command.

## Timeline of Events

### Sequence of Events (from verbose logs)

1. **T+0ms**: Agent started with `--model kilo/glm-5-free --verbose` (`13:43:01.845Z`)
2. **T+33ms**: Model parsed: `providerID: "kilo"`, `modelID: "glm-5-free"`
3. **T+113ms**: Provider state initialization started
4. **T+137ms**: Provider SDK requested for `kilo` provider
5. **T+138ms**: Package installation initiated: `@openrouter/ai-sdk-provider@latest`
6. **T+139ms**: `bun add` command spawned
7. **After T+139ms**: Process hangs with no completion and no error; the log shows no further output until the user interrupts with Ctrl+C about 7 seconds later (`13:43:09.013Z`)
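
The `T+` offsets in this timeline can be derived from the `timestamp` field of each log entry. A minimal sketch, assuming ISO 8601 timestamps as they appear in the issue log; `offsetsMs` is a hypothetical helper and not part of the agent:

```typescript
// Hypothetical helper: converts ISO 8601 log timestamps into
// millisecond offsets relative to the first entry.
function offsetsMs(timestamps: string[]): number[] {
  if (timestamps.length === 0) return [];
  const start = Date.parse(timestamps[0]);
  return timestamps.map((t) => Date.parse(t) - start);
}

// Timestamps copied from the issue log: the "Agent started" entry and
// the first entry logged after the user pressed Ctrl+C.
const offsets = offsetsMs([
  '2026-02-14T13:43:01.845Z',
  '2026-02-14T13:43:09.013Z',
]);
console.log(offsets.map((ms) => `T+${ms}ms`).join(', '));
```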

### The Hanging Command

```json
{
  "type": "log",
  "level": "info",
  "timestamp": "2026-02-14T13:43:01.984Z",
  "service": "bun",
  "cmd": [
    "/home/hive/.bun/bin/bun",
    "add",
    "--force",
    "--exact",
    "--cwd",
    "/home/hive/.cache/link-assistant-agent",
    "@openrouter/ai-sdk-provider@latest"
  ],
  "cwd": "/home/hive/.cache/link-assistant-agent",
  "message": "running"
}
```

## Root Cause Analysis

### Primary Issue: Missing Timeout in Bun.spawn

The `BunProc.run()` function in `js/src/bun/index.ts` uses `Bun.spawn()` without a `timeout` option:

```typescript
const result = Bun.spawn([which(), ...cmd], {
  ...options,
  stdout: 'pipe',
  stderr: 'pipe',
  env: {
    ...process.env,
    ...options?.env,
    BUN_BE_BUN: '1',
  },
});
```

Without a timeout, if `bun add` encounters any of the known hanging issues, the process waits indefinitely.
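
The effect can be illustrated without Bun at all. The sketch below is not the agent's code: `withTimeout` is a hypothetical helper that races a promise against a timer, and `neverExits` stands in for a subprocess exit promise that never settles.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared in either case.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Stand-in for a subprocess whose exit promise never settles.
const neverExits = new Promise<number>(() => {});

async function demo(): Promise<string> {
  try {
    await withTimeout(neverExits, 50);
    return 'exited';
  } catch (e) {
    return (e as Error).message;
  }
}
```

Racing only unblocks the caller. The subprocess itself still has to be terminated, which is why a spawn-level timeout with a kill signal is the more complete fix.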

### Known Bun Package Manager Hang Issues

Several issues reported against Bun describe conditions under which `bun add` or `bun install` can hang:

1. **HTTP 304 Response Handling** ([Issue #5831](https://github.com/oven-sh/bun/issues/5831))
   - Improper handling of HTTP 304 (Not Modified) responses
   - IPv6 configuration issues causing connection hangs
   - Fixes merged in PR #6192 and PR #15511

2. **Failed Dependency Fetch** ([Issue #26341](https://github.com/oven-sh/bun/issues/26341))
   - When tarball download fails (e.g., 401 Unauthorized), `bun install` hangs
   - Missing error callback in isolated install mode
   - Fix merged in PR #26342

3. **Large Package Count** ([Issue #23607](https://github.com/oven-sh/bun/issues/23607))
   - Security scanner causes hang with 790+ packages
   - Hang occurs in scanner loading mechanism

4. **Containerized Linux Environments** ([Issue #25624](https://github.com/oven-sh/bun/issues/25624))
   - `bun install` hangs at "Resolving dependencies"
   - Issues with Bun's in-memory resolution algorithm

### Contributing Factors

1. **Network Conditions**: The user's environment may have intermittent network issues
2. **IPv6 Configuration**: IPv6 issues can cause Bun to hang on DNS resolution
3. **Cache State**: Corrupted or partial cache can trigger hangs
4. **Missing Timeout**: The `BunProc.run()` function has no timeout mechanism

## Proposed Solutions

### Solution 1: Add Timeout to BunProc.run (Recommended)

Add a timeout option to the `Bun.spawn()` call in `BunProc.run()`:

```typescript
export async function run(
  cmd: string[],
  options?: Bun.SpawnOptions.OptionsObject<any, any, any> & { timeout?: number }
) {
  const timeout = options?.timeout ?? 120000; // 2 minutes default

  log.info(() => ({
    message: 'running',
    cmd: [which(), ...cmd],
    timeout,
    ...options,
  }));

  const result = Bun.spawn([which(), ...cmd], {
    ...options,
    stdout: 'pipe',
    stderr: 'pipe',
    timeout, // Add timeout support
    killSignal: 'SIGTERM', // Graceful termination
    env: {
      ...process.env,
      ...options?.env,
      BUN_BE_BUN: '1',
    },
  });
  // ...
}
```
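
Callers also need a way to tell a timeout apart from an ordinary failed command. One option is a dedicated error type. The class below is an assumed sketch: the field names and the `isTimeoutError` guard are hypothetical and not taken from the agent's source.

```typescript
// Assumed sketch of a timeout-specific error; field names are hypothetical.
class TimeoutError extends Error {
  readonly cmd: string[];
  readonly timeoutMs: number;

  constructor(cmd: string[], timeoutMs: number) {
    super(`Command timed out after ${timeoutMs}ms: ${cmd.join(' ')}`);
    this.name = 'TimeoutError';
    this.cmd = cmd;
    this.timeoutMs = timeoutMs;
  }
}

// Type guard so calling code can react to timeouts only.
// Checking `name` keeps the guard working across module boundaries.
function isTimeoutError(e: unknown): e is TimeoutError {
  return e instanceof Error && e.name === 'TimeoutError';
}

const err = new TimeoutError(['add', '@openrouter/ai-sdk-provider@latest'], 120000);
console.log(isTimeoutError(err), err.message);
```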

### Solution 2: Pre-bundle the @openrouter/ai-sdk-provider Package

Instead of dynamically installing the package at runtime, declare it as a dependency in `package.json`:

```json
{
  "dependencies": {
    "@openrouter/ai-sdk-provider": "^2.2.3"
  }
}
```

This is how the KiloCode and Kilo repositories handle the provider package.
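
With the package declared as a dependency, runtime installation becomes a fallback rather than the default path. A minimal sketch of that ordering, assuming hypothetical `load` and `install` callbacks (in the agent these would presumably wrap a dynamic `import()` and the `bun add` call, respectively):

```typescript
// Hypothetical helper: try the bundled module first and install only
// if loading fails, then retry the load once.
async function loadOrInstall<T>(
  load: () => Promise<T>,
  install: () => Promise<void>
): Promise<T> {
  try {
    return await load();
  } catch {
    await install();
    return await load();
  }
}

// Stubbed usage: the first load succeeds, so install is never called.
let installCalls = 0;
const bundled = loadOrInstall(
  async () => ({ name: '@openrouter/ai-sdk-provider' }),
  async () => {
    installCalls += 1;
  }
);
```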

### Solution 3: Use AbortSignal for More Control

```typescript
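const controller = new AbortController();
// Abort the subprocess if it has not exited within 2 minutes
const timeoutId = setTimeout(() => controller.abort(), 120000);

try {
  const result = Bun.spawn([which(), ...cmd], {
    signal: controller.signal,
    // ...
  });

  const code = await result.exited;
} finally {
  // Always clear the timer, even if spawning or waiting throws
  clearTimeout(timeoutId);
}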
```
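
Where the runtime provides `AbortSignal.timeout()`, the controller and the timer can be collapsed into a single signal. The sketch below only demonstrates the signal itself. Whether it fits `BunProc.run` depends on whether callers also need to cancel manually, and `waitForAbort` is a hypothetical helper:

```typescript
// Hypothetical helper: resolves once the given signal has aborted.
function waitForAbort(signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) return resolve();
    signal.addEventListener('abort', () => resolve(), { once: true });
  });
}

// The signal aborts on its own after 50 ms; no manual clearTimeout is needed.
const signal = AbortSignal.timeout(50);
const aborted = waitForAbort(signal).then(() => signal.aborted);
```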

### Solution 4: Add Retry with Exponential Backoff

If the package installation fails, retry with exponential backoff:

```typescript
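const MAX_RETRIES = 3;
const BASE_DELAY = 1000; // milliseconds

// Promise-based sleep used between attempts
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
  try {
    await BunProc.run(args, { cwd, timeout: 60000 });
    break; // Success
  } catch (e) {
    if (attempt === MAX_RETRIES) throw e;
    // Wait 1s, then 2s, before the next attempt
    await delay(BASE_DELAY * 2 ** (attempt - 1));
  }
}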
```
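
The same loop can be factored into a reusable helper so that only selected failures, such as timeouts, are retried. This is a sketch under assumptions: `retryWithBackoff`, `sleep`, and the option names are hypothetical, and the usage stubs out the install step.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  // Decides whether a given failure is worth another attempt.
  shouldRetry?: (error: unknown) => boolean;
}

async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const { maxRetries, baseDelayMs, shouldRetry = () => true } = opts;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // Give up on the last attempt or on errors the caller does not want retried.
      if (attempt >= maxRetries || !shouldRetry(e)) throw e;
      // Delays grow as base, 2 * base, 4 * base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Stubbed usage: fails twice, then succeeds on the third attempt.
let attempts = 0;
const outcome = retryWithBackoff(
  async () => {
    attempts += 1;
    if (attempts < 3) throw new Error('transient');
    return 'installed';
  },
  { maxRetries: 3, baseDelayMs: 10 }
);
```

Passing a `shouldRetry` predicate keeps permanent failures (for example, a package that does not exist) from being retried pointlessly.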

## Recommended Fix

Implement **Solution 1** with a reasonable timeout (60-120 seconds) for package installation. This prevents indefinite hangs while still allowing enough time for legitimate package installations.

Additionally, consider implementing **Solution 2** for commonly-used provider packages to avoid runtime installation altogether.

## References

### Related Issues

- [Bun Issue #5831: bun install hangs sporadically](https://github.com/oven-sh/bun/issues/5831)
- [Bun Issue #26341: Bun install hangs when failing to fetch](https://github.com/oven-sh/bun/issues/26341)
- [Bun Issue #23607: bun install hangs with security scanner](https://github.com/oven-sh/bun/issues/23607)
- [Bun Issue #25624: bun install hangs in containerized Linux](https://github.com/oven-sh/bun/issues/25624)

### Bun Documentation

- [Bun Spawn Documentation](https://bun.sh/docs/runtime/child-process)
  - Timeout option: `timeout: number` (milliseconds)
  - Kill signal: `killSignal: "SIGTERM" | "SIGKILL" | ...`

### KiloCode/Kilo Reference Implementation

The Kilo provider implementation uses:
- Pre-installed `@openrouter/ai-sdk-provider` package
- API endpoint: `https://api.kilo.ai/api/openrouter/`
- Custom headers: `X-KILOCODE-EDITORNAME`, `User-Agent`

## Workarounds

### For Users

1. **Pre-install the package manually**:

   ```bash
   bun add @openrouter/ai-sdk-provider
   ```

2. **Clear Bun cache**:

   ```bash
   bun pm cache rm
   ```

3. **Disable IPv6** (if applicable):

   ```bash
   # Linux
   sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
   ```

4. **Use a different model** while the issue is being fixed:

   ```bash
   echo "hi" | agent --model opencode/grok-code --verbose
   ```

## Files Affected

- `js/src/bun/index.ts` - Main fix location (add timeout)
- `js/src/provider/provider.ts` - Provider SDK loading
1 change: 1 addition & 0 deletions docs/case-studies/issue-173/issue-data.json
@@ -0,0 +1 @@
{"author":{"id":"MDQ6VXNlcjE0MzE5MDQ=","is_bot":false,"login":"konard","name":"Konstantin Diachenko"},"body":"```\nhive@vmi2955137:~$ echo \"hi\" | agent --model kilo/glm-5-free --verbose\n{ \n \"type\": \"status\",\n \"mode\": \"stdin-stream\",\n \"message\": \"Agent CLI in continuous listening mode. Accepts JSON and plain text input.\",\n \"hint\": \"Press CTRL+C to exit. Use --help for options.\",\n \"acceptedFormats\": [\n \"JSON object with \\\"message\\\" field\",\n \"Plain text\"\n ],\n \"options\": {\n \"interactive\": true,\n \"autoMergeQueuedMessages\": true,\n \"alwaysAcceptStdin\": true,\n \"compactJson\": false\n }\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.845Z\",\n \"service\": \"default\",\n \"version\": \"0.13.0\",\n \"command\": \"/home/hive/.bun/bin/bun /home/hive/.bun/install/global/node_modules/@link-assistant/agent/src/index.js --model kilo/glm-5-free --verbose\",\n \"workingDirectory\": \"/home/hive\",\n \"scriptPath\": \"/home/hive/.bun/install/global/node_modules/@link-assistant/agent/src/index.js\",\n \"message\": \"Agent started (continuous mode)\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.846Z\",\n \"service\": \"default\",\n \"directory\": \"/home/hive\",\n \"message\": \"creating instance\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.846Z\",\n \"service\": \"project\",\n \"directory\": \"/home/hive\",\n \"message\": \"fromDirectory\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.878Z\",\n \"service\": \"default\",\n \"rawModel\": \"kilo/glm-5-free\",\n \"providerID\": \"kilo\",\n \"modelID\": \"glm-5-free\",\n \"message\": \"using explicit provider/model\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.901Z\",\n \"service\": \"server\",\n \"method\": \"POST\",\n \"path\": \"/session\",\n \"message\": 
\"request\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.901Z\",\n \"service\": \"server\",\n \"status\": \"started\",\n \"method\": \"POST\",\n \"path\": \"/session\",\n \"message\": \"request\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.905Z\",\n \"service\": \"session\",\n \"id\": \"ses_3a39c05eeffeC3971iD1mpvpy1\",\n \"version\": \"agent-cli-1.0.0\",\n \"projectID\": \"global\",\n \"directory\": \"/home/hive\",\n \"title\": \"New session - 2026-02-14T13:43:01.905Z\",\n \"time\": {\n \"created\": 1771076581905,\n \"updated\": 1771076581905\n },\n \"message\": \"created\"\n}\n\n{ \n \"type\": \"session.created\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.905Z\",\n \"service\": \"bus\",\n \"message\": \"publishing\"\n}\n\n{ \n \"type\": \"session.updated\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.905Z\",\n \"service\": \"bus\",\n \"message\": \"publishing\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.906Z\",\n \"service\": \"server\",\n \"status\": \"completed\",\n \"duration\": 5,\n \"method\": \"POST\",\n \"path\": \"/session\",\n \"message\": \"request\"\n}\n{ \n \"type\": \"*\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.909Z\",\n \"service\": \"bus\",\n \"message\": \"subscribing\"\n}\n\n{ \n \"type\": \"input\",\n \"timestamp\": \"2026-02-14T13:43:01.913Z\",\n \"raw\": \"hi\",\n \"parsed\": {\n \"message\": \"hi\"\n },\n \"format\": \"text\"\n}\n{ \n \"type\": \"*\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.914Z\",\n \"service\": \"bus\",\n \"message\": \"subscribing\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.915Z\",\n \"service\": \"server\",\n \"method\": \"POST\",\n \"path\": \"/session/ses_3a39c05eeffeC3971iD1mpvpy1/message\",\n \"message\": \"request\"\n}\n\n{ \n \"type\": \"log\",\n 
\"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.915Z\",\n \"service\": \"server\",\n \"status\": \"started\",\n \"method\": \"POST\",\n \"path\": \"/session/ses_3a39c05eeffeC3971iD1mpvpy1/message\",\n \"message\": \"request\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.921Z\",\n \"service\": \"server\",\n \"status\": \"completed\",\n \"duration\": 6,\n \"method\": \"POST\",\n \"path\": \"/session/ses_3a39c05eeffeC3971iD1mpvpy1/message\",\n \"message\": \"request\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.930Z\",\n \"service\": \"config\",\n \"path\": \"/home/hive/.config/link-assistant-agent/config.json\",\n \"message\": \"loading\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.930Z\",\n \"service\": \"config\",\n \"path\": \"/home/hive/.config/link-assistant-agent/opencode.json\",\n \"message\": \"loading\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.931Z\",\n \"service\": \"config\",\n \"path\": \"/home/hive/.config/link-assistant-agent/opencode.jsonc\",\n \"message\": \"loading\"\n}\n\n{ \n \"type\": \"message.updated\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.946Z\",\n \"service\": \"bus\",\n \"message\": \"publishing\"\n}\n\n{ \n \"type\": \"message.part.updated\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.949Z\",\n \"service\": \"bus\",\n \"message\": \"publishing\"\n}\n\n{ \n \"type\": \"session.updated\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.950Z\",\n \"service\": \"bus\",\n \"message\": \"publishing\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.953Z\",\n \"service\": \"session.prompt\",\n \"step\": 0,\n \"sessionID\": \"ses_3a39c05eeffeC3971iD1mpvpy1\",\n \"message\": \"loop\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": 
\"2026-02-14T13:43:01.956Z\",\n \"service\": \"session.prompt\",\n \"hint\": \"Enable with --generate-title flag or AGENT_GENERATE_TITLE=true\",\n \"message\": \"title generation disabled\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.958Z\",\n \"service\": \"provider\",\n \"status\": \"started\",\n \"message\": \"state\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.958Z\",\n \"service\": \"models.dev\",\n \"file\": {},\n \"message\": \"refreshing\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.970Z\",\n \"service\": \"provider\",\n \"message\": \"init\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.981Z\",\n \"service\": \"claude-oauth\",\n \"subscriptionType\": \"max\",\n \"scopes\": [\n \"user:inference\",\n \"user:mcp_servers\",\n \"user:profile\",\n \"user:sessions:claude_code\"\n ],\n \"message\": \"loaded oauth credentials\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.981Z\",\n \"service\": \"provider\",\n \"source\": \"credentials file (max)\",\n \"message\": \"using claude oauth credentials\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.981Z\",\n \"service\": \"provider\",\n \"providerID\": \"opencode\",\n \"message\": \"found\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.981Z\",\n \"service\": \"provider\",\n \"providerID\": \"kilo\",\n \"message\": \"found\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.982Z\",\n \"service\": \"provider\",\n \"providerID\": \"claude-oauth\",\n \"message\": \"found\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.982Z\",\n \"service\": \"provider\",\n \"status\": \"completed\",\n \"duration\": 24,\n \"message\": 
\"state\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.982Z\",\n \"service\": \"provider\",\n \"providerID\": \"kilo\",\n \"modelID\": \"glm-5-free\",\n \"message\": \"getModel\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.982Z\",\n \"service\": \"provider\",\n \"status\": \"started\",\n \"providerID\": \"kilo\",\n \"message\": \"getSDK\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.983Z\",\n \"service\": \"provider\",\n \"providerID\": \"kilo\",\n \"pkg\": \"@openrouter/ai-sdk-provider\",\n \"version\": \"latest\",\n \"message\": \"installing provider package\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.984Z\",\n \"service\": \"bun\",\n \"pkg\": \"@openrouter/ai-sdk-provider\",\n \"version\": \"latest\",\n \"message\": \"installing package using Bun's default registry resolution\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.984Z\",\n \"service\": \"bun\",\n \"cmd\": [\n \"/home/hive/.bun/bin/bun\",\n \"add\",\n \"--force\",\n \"--exact\",\n \"--cwd\",\n \"/home/hive/.cache/link-assistant-agent\",\n \"@openrouter/ai-sdk-provider@latest\"\n ],\n \"cwd\": \"/home/hive/.cache/link-assistant-agent\",\n \"message\": \"running\"\n}\n\n^C{\n \"type\": \"status\",\n \"message\": \"Received SIGINT. 
Shutting down...\"\n}\n{ \n \"type\": \"*\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:09.013Z\",\n \"service\": \"bus\",\n \"message\": \"unsubscribing\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:09.014Z\",\n \"service\": \"default\",\n \"directory\": \"/home/hive\",\n \"message\": \"disposing instance\"\n}\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:09.014Z\",\n \"service\": \"state\",\n \"key\": \"/home/hive\",\n \"message\": \"waiting for state disposal to complete\"\n}\n\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:09.014Z\",\n \"service\": \"state\",\n \"key\": \"/home/hive\",\n \"message\": \"state disposal completed\"\n}\n\nhive@vmi2955137:~$ \n```\n\nIt stuck forever on:\n```\n{ \n \"type\": \"log\",\n \"level\": \"info\",\n \"timestamp\": \"2026-02-14T13:43:01.984Z\",\n \"service\": \"bun\",\n \"cmd\": [\n \"/home/hive/.bun/bin/bun\",\n \"add\",\n \"--force\",\n \"--exact\",\n \"--cwd\",\n \"/home/hive/.cache/link-assistant-agent\",\n \"@openrouter/ai-sdk-provider@latest\"\n ],\n \"cwd\": \"/home/hive/.cache/link-assistant-agent\",\n \"message\": \"running\"\n}\n```\n\nSo I had to CTRL+C it.\n\nDouble check how it is done correctly in https://github.com/Kilo-Org/kilocode or https://github.com/Kilo-Org/kilo\n\nPlease download all logs and data related about the issue to this repository, make sure we compile that data to `./docs/case-studies/issue-{id}` folder, and use it to do deep case study analysis (also make sure to search online for additional facts and data), in which we will reconstruct timeline/sequence of events, find root causes of the problem, and propose possible solutions (including known existing components/libraries, that solve similar problem or can help in solutions).\n\nIf issue related to any other repository/project, where we can report issues on GitHub, please do so. 
Each issue must contain reproducible examples, workarounds and suggestions for fix the issue in code.","comments":[],"createdAt":"2026-02-14T13:45:50Z","labels":[{"id":"LA_kwDOQYTy3M8AAAACQHoi-w","name":"bug","description":"Something isn't working","color":"d73a4a"}],"number":173,"state":"OPEN","title":"`--model kilo/glm-5-free` is still not working","updatedAt":"2026-02-14T13:47:01Z"}
17 changes: 17 additions & 0 deletions js/.changeset/fix-bun-timeout-hang.md
@@ -0,0 +1,17 @@
---
'@link-assistant/agent': patch
---

Fix indefinite hang when using Kilo provider by adding timeout to BunProc.run (#173)

- Add DEFAULT_TIMEOUT_MS (2 minutes) for subprocess commands
- Add INSTALL_TIMEOUT_MS (60 seconds) for package installation
- Create TimeoutError for better error handling and retry logic
- Add retry logic for timeout errors (up to 3 attempts)
- Add helpful error messages for timeout and recovery scenarios

This prevents indefinite hangs caused by known Bun package manager issues:

- HTTP 304 response handling (oven-sh/bun#5831)
- Failed dependency fetch (oven-sh/bun#26341)
- IPv6 configuration issues