fix: prevent duplicate listener clash on port 12001 and schema validation#790

Open
abdhilabs wants to merge 1 commit into katanemo:main from abdhilabs:fix/duplicate-listener-clash

Conversation

@abdhilabs

Summary

This PR fixes two critical bugs found during deployment of Plano v0.3.0:

1. Duplicate Listener Bug (port 12001 conflict)

Problem: The internal Python config generator creates a default egress_traffic_llm listener on port 12001 in the Envoy configuration template. When users define their own listener on port 12001 in their config.yaml, this causes Envoy to crash due to duplicate listeners on the same port.

Solution: Modified config/envoy.template.yaml to add a conditional check that skips rendering the default egress_traffic_llm listener if the user has already defined a listener on port 12001.
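The guard described above can be sketched roughly as follows. This is an illustrative Jinja2 fragment, not the actual contents of `config/envoy.template.yaml`; the variable names (`arch_config.listeners`) and listener structure are assumptions for the example.

```yaml
{# Hypothetical sketch of the guard: skip the default listener if any
   user-defined listener already claims port 12001. Variable names are
   illustrative and may differ from the real template. #}
{% set user_ports = arch_config.listeners | map(attribute='port') | list %}
{% if 12001 not in user_ports %}
- name: egress_traffic_llm
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 12001
{% endif %}
```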

2. Config Validation Mismatch

Problem: The JSON schema in plano_config_schema.yaml had a restrictive enum constraint for the provider_interface field that only included: arch, claude, deepseek, groq, mistral, openai, gemini. This caused validation failures for users using other supported providers like azure_openai, ollama, qwen, amazon_bedrock, anthropic, together_ai, xai, moonshotai, and zhipu.

Solution: Removed the restrictive enum and replaced it with an open string type with a description explaining how to use it for both supported and custom providers.
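The behavioral difference can be illustrated with a minimal sketch. This is not Plano's validation code; it simply contrasts the old enum constraint with the new open-string check, using the provider names listed above.

```python
# Pre-fix: the schema's enum only accepted these values for provider_interface.
OLD_ENUM = {"arch", "claude", "deepseek", "groq", "mistral", "openai", "gemini"}

def validate_old(provider_interface: str) -> bool:
    """Old schema behavior: reject anything outside the enum."""
    return provider_interface in OLD_ENUM

def validate_new(provider_interface: str) -> bool:
    """New schema behavior: any non-empty string is accepted."""
    return isinstance(provider_interface, str) and len(provider_interface) > 0

assert not validate_old("azure_openai")    # previously failed validation
assert validate_new("azure_openai")        # now accepted
assert validate_new("my_custom_provider")  # custom providers also pass
```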

Changes Made

  1. config/envoy.template.yaml: Added Jinja2 conditional to check if port 12001 is already in use by user-defined listeners
  2. config/plano_config_schema.yaml: Updated provider_interface field in both model_providers and llm_providers to use open string type instead of restrictive enum

Testing

These fixes were validated in our custom deployment environment before submitting this PR.

fix: prevent duplicate listener clash on port 12001 and schema validation

- envoy.template.yaml: Add conditional check to skip rendering default
  egress_traffic_llm listener (port 12001) if user already defined a
  listener on the same port. This prevents Envoy from crashing due to
  duplicate listeners.

- plano_config_schema.yaml: Remove restrictive enum for provider_interface
  field in model_providers and llm_providers. The enum was too restrictive
  and didn't include all supported providers (azure_openai, ollama, qwen,
  amazon_bedrock, anthropic, together_ai, xai, moonshotai, zhipu). Now uses
  open string type with description for flexibility with custom providers.
@salmanap
Contributor

This is a welcome change. Reviewing shortly.

```yaml
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

{# Skip default egress_traffic_llm listener if user already defined a listener on port 12001 #}
```
Contributor

Technically, port 12001 is an internal port where we plug in all the WASM filters needed to process, mutate, and handle LLM traffic. The user's port defaults to 12000, and the config allows you to update that. Because 12001 is a reserved port, there is no easy fix other than exporting this port to the user via the config, or picking a port that is highly unlikely to conflict with a local dev environment.
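One generic way to surface such a clash early (a sketch, not Plano's actual startup code) is to probe the reserved port before rendering the config, so the generator can fail with a clear message instead of letting Envoy crash on a duplicate listener:

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

def ensure_port_free(port: int) -> None:
    """Fail fast with a clear message if the reserved port is taken."""
    if not port_is_free(port):
        raise SystemExit(
            f"port {port} is reserved for internal LLM traffic "
            "and is already in use"
        )

# Usage (hypothetical): call ensure_port_free(12001) before config generation.
```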

```yaml
- mistral
- openai
- gemini
description: "The provider interface/name. For supported providers (openai, anthropic, gemini, etc.), specify as part of model name (e.g., 'openai/gpt-4'). For custom providers, provide the interface name here."
```
Contributor

I can appreciate this change, but for all first-class providers we support, we write out their upstream endpoints based on this enum. So if a developer configures a first-class provider that we don't support today, we would break because we wouldn't know which endpoint/cluster to send traffic to.

We also support developers bringing custom providers, and that shouldn't conflict with this list. Although I think you are correct in stating that the provider_interface field is probably misleading: it's only really used for custom model providers and not for first-class providers at all.

I think there is a fix for this, but it's not as simple as stripping out the enum. I'll have to think about that a bit more.
