fix: Resolve channel default_agent by name at message-time instead of caching UUID at startup #542

Open
psumo wants to merge 1 commit into RightNow-AI:main from psumo:fix/dynamic-agent-resolution

psumo commented Mar 12, 2026

Problem

When a channel's default_agent is resolved at startup, the UUID gets cached in the AgentRouter. Kill + respawn assigns a new UUID, but the router still holds the old one. Every message after that fails with Agent not found: <old-uuid> until you restart the whole process.
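To make the failure mode concrete, here is a minimal sketch of why a cached ID goes stale. Registry, spawn, kill and send_message below are hypothetical stand-ins (not the openfang API), and plain u64 IDs replace UUIDs so the example has no external crates. The only property that matters is the one described above: every spawn allocates a fresh ID, even for a name that existed before.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the kernel's agent registry (u64 IDs instead of UUIDs).
struct Registry {
    next_id: u64,
    agents: HashMap<u64, String>, // live agents: id -> name
}

impl Registry {
    fn new() -> Self {
        Registry { next_id: 1, agents: HashMap::new() }
    }

    /// Spawning always allocates a new ID, even when the name was used before.
    fn spawn(&mut self, name: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.agents.insert(id, name.to_string());
        id
    }

    /// Killing removes the ID; nothing updates callers that cached it.
    fn kill(&mut self, id: u64) {
        self.agents.remove(&id);
    }

    /// Sending to an ID that is no longer registered fails, like the bridge does today.
    fn send_message(&self, id: u64, _msg: &str) -> Result<(), String> {
        if self.agents.contains_key(&id) {
            Ok(())
        } else {
            Err(format!("Agent not found: {id}"))
        }
    }
}
```

Caching the result of spawn("foo") at startup, then doing kill + spawn("foo"), leaves the cache holding an ID for which send_message returns "Agent not found", while the new "foo" is reachable only through its new ID.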

I kept hitting this while iterating on agent manifests:

  1. Edit manifest (tweak system prompt, change model, etc.)
  2. agent kill foo + agent spawn foo
  3. Agent comes back with a new UUID
  4. Telegram messages all break — router is still pointing at the dead UUID
  5. Have to restart the entire container to fix it

Fix

Lazy re-resolution fallback in the bridge dispatch path. When send_message() gets an "Agent not found" error:

  1. Look up the channel's configured default agent name (not UUID)
  2. Call find_agent_by_name() to get the current UUID
  3. Update the router cache
  4. Retry the message

The cached UUID is still checked first so there's no overhead in the normal case. Re-resolution only kicks in when the send actually fails.
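The four steps above can be sketched as a single dispatch function. This is an illustration under stated assumptions, not the code in bridge.rs: Kernel and Router are hypothetical stand-ins, u64 IDs replace UUIDs, and only the name find_agent_by_name() is taken from the PR. The cached ID is tried first; only an "Agent not found" error triggers a name lookup, a cache update and exactly one retry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the kernel; u64 IDs replace UUIDs.
struct Kernel {
    agents: HashMap<u64, String>, // live agents: id -> name
}

impl Kernel {
    fn send_message(&self, id: u64, msg: &str) -> Result<String, String> {
        match self.agents.get(&id) {
            Some(name) => Ok(format!("{name} got: {msg}")),
            None => Err(format!("Agent not found: {id}")),
        }
    }

    /// Plays the role of find_agent_by_name(): the current ID for a name, if any.
    fn find_agent_by_name(&self, name: &str) -> Option<u64> {
        self.agents.iter().find(|(_, n)| n.as_str() == name).map(|(id, _)| *id)
    }
}

/// Hypothetical router cache: channel -> (configured agent name, cached ID).
struct Router {
    defaults: HashMap<String, (String, u64)>,
}

/// Send via the cached ID; on "Agent not found", re-resolve by name,
/// update the cache and retry once.
fn dispatch(kernel: &Kernel, router: &mut Router, channel: &str, msg: &str) -> Result<String, String> {
    let (name, cached_id) = router
        .defaults
        .get(channel)
        .cloned()
        .ok_or_else(|| format!("no default agent for channel {channel}"))?;

    // Fast path: no lookup at all when the cached ID still works.
    let err = match kernel.send_message(cached_id, msg) {
        Ok(reply) => return Ok(reply),
        Err(e) => e,
    };
    // Any other error (model API failure etc.) is returned untouched.
    if !err.contains("Agent not found") {
        return Err(err);
    }
    match kernel.find_agent_by_name(&name) {
        Some(new_id) if new_id != cached_id => {
            router.defaults.insert(channel.to_string(), (name, new_id));
            kernel.send_message(new_id, msg) // single retry
        }
        // Agent killed but not respawned: surface the original error.
        _ => Err(err),
    }
}
```

Matching on the error text mirrors the PR's description of checking for "Agent not found"; a typed error variant would be a sturdier signal if the kernel exposes one.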

Changes

crates/openfang-channels/src/router.rs (+24 lines)

  • channel_default_names: DashMap<String, String> field — stores the configured agent name next to the cached UUID
  • set_channel_default_with_name() — stores both name and ID
  • channel_default_name() — getter for the stored name
  • update_channel_default() — swap out just the cached UUID after re-resolution
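A rough sketch of the router-side bookkeeping, assuming the method names listed above but guessing their signatures. The PR uses DashMap; RwLock<HashMap> is swapped in here so the example builds with std only, and methods take &self to mirror DashMap's shared mutation. The channel_defaults field, the channel_default() getter and the AgentId alias are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type AgentId = u64; // hypothetical stand-in for the real UUID-based ID type

/// Sketch only: RwLock<HashMap> replaces the PR's DashMap.
#[derive(Default)]
struct AgentRouter {
    channel_defaults: RwLock<HashMap<String, AgentId>>, // cached IDs (hypothetical name)
    channel_default_names: RwLock<HashMap<String, String>>, // configured agent names
}

impl AgentRouter {
    /// Existing behavior: cache only the ID.
    fn set_channel_default(&self, channel: &str, id: AgentId) {
        self.channel_defaults.write().unwrap().insert(channel.to_string(), id);
    }

    /// Cache the ID and remember the configured name for later re-resolution.
    fn set_channel_default_with_name(&self, channel: &str, id: AgentId, name: &str) {
        self.set_channel_default(channel, id);
        self.channel_default_names
            .write()
            .unwrap()
            .insert(channel.to_string(), name.to_string());
    }

    fn channel_default(&self, channel: &str) -> Option<AgentId> {
        self.channel_defaults.read().unwrap().get(channel).copied()
    }

    fn channel_default_name(&self, channel: &str) -> Option<String> {
        self.channel_default_names.read().unwrap().get(channel).cloned()
    }

    /// Swap only the cached ID; the stored name is left as configured.
    fn update_channel_default(&self, channel: &str, id: AgentId) {
        self.channel_defaults.write().unwrap().insert(channel.to_string(), id);
    }
}
```

Keeping names in a separate map means channels registered through the old set_channel_default() simply have no name, so re-resolution is skipped for them rather than guessed.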

crates/openfang-channels/src/bridge.rs (+117 lines)

  • try_reresolution() — checks for "Agent not found", looks up name from router, calls find_agent_by_name(), updates cache, returns new ID
  • dispatch_message() error path — on "Agent not found", tries re-resolution + one retry
  • dispatch_with_blocks() — same logic for multimodal messages
  • blocks → blocks.clone() so blocks can be reused on retry
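The blocks.clone() change follows from Rust ownership: if the send call takes the blocks by value, the first attempt moves them and nothing is left for the retry. A minimal sketch, where ContentBlock, send_with_blocks and the reresolve closure are hypothetical stand-ins and live_id simulates which ID the kernel currently knows:

```rust
/// Hypothetical multimodal payload; the real block type is not shown in the PR.
#[derive(Clone, Debug, PartialEq)]
enum ContentBlock {
    Text(String),
    Image(Vec<u8>),
}

/// Stand-in for a send that takes ownership of the blocks.
fn send_with_blocks(agent_id: u64, live_id: u64, blocks: Vec<ContentBlock>) -> Result<usize, String> {
    if agent_id == live_id {
        Ok(blocks.len())
    } else {
        Err(format!("Agent not found: {agent_id}"))
    }
}

/// The first attempt gets a clone, so the original Vec is still owned here
/// and can be moved into the single retry.
fn dispatch_with_blocks(
    cached_id: u64,
    live_id: u64,
    blocks: Vec<ContentBlock>,
    reresolve: impl Fn() -> Option<u64>,
) -> Result<usize, String> {
    match send_with_blocks(cached_id, live_id, blocks.clone()) {
        Err(e) if e.contains("Agent not found") => match reresolve() {
            Some(new_id) => send_with_blocks(new_id, live_id, blocks),
            None => Err(e),
        },
        other => other,
    }
}
```

The clone costs one copy of the payload on every multimodal send, including the common case where no retry happens; that matches the trade-off the PR describes.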

crates/openfang-api/src/channel_bridge.rs (1 line)

  • set_channel_default() → set_channel_default_with_name() at startup so the name is preserved

Backwards compat

  • set_channel_default() is untouched — only the startup path uses the new _with_name variant
  • No new trait methods — uses existing find_agent_by_name()
  • No config or API changes
  • New fields on AgentRouter are private
  • Existing tests pass as-is

Testing

  1. Configure a channel with default_agent = "foo"
  2. Send a message — works fine
  3. agent kill foo → agent spawn foo
  4. Send another message
  5. Before: Agent error: Agent not found: ...
  6. After: message goes through, logs show re-resolution happened
  7. Subsequent messages hit the updated cache with no re-resolution

Edge cases:

  • Agent killed but not respawned → falls through to the original error
  • Non-"Agent not found" errors (model API failures etc) → no re-resolution triggered
  • Multiple channels with different defaults → each re-resolves independently
  • Multimodal messages → retry works with cloned blocks

… caching UUID at startup

When a channel's default_agent is resolved at startup, the UUID gets
cached in the AgentRouter. Kill + respawn assigns a new UUID but the
router keeps the old one, so all messages fail with "Agent not found"
until you restart the process.

Added a lazy re-resolution fallback in the bridge dispatch path — on
"Agent not found", look up the configured agent name, call
find_agent_by_name() for the current UUID, update the cache, retry.
No overhead on the normal path.