[FEATURE]: Dynamic Weight Calculation for HTTPRoute BackendRefs Based on ReadyReplicas #60

@LikiosSedo

🚀 Feature Description and Motivation

I'm proposing dynamic weight calculation for HTTPRoute backendRefs based on Application ReadyReplicas because the current static weight configuration causes unfair traffic distribution in multi-tenant environments.

Current Problem

In the current Arks operator implementation (internal/controller/arksendpoint_controller.go), HTTPRoute backend weights are statically set to DefaultWeight (typically 1) for all applications, regardless of their replica counts. When multiple applications with different replica counts share the same HTTPRoute endpoint, this causes resource allocation inefficiency.

Concrete Example:

Consider three applications sharing the same endpoint:

  • Application A: 10 replicas, weight=1 → receives 33.3% traffic (should be 62.5%)
  • Application B: 5 replicas, weight=1 → receives 33.3% traffic (should be 31.25%)
  • Application C: 1 replica, weight=1 → receives 33.3% traffic (should be 6.25%)

Result: Application C (1 replica) is overloaded with roughly 5x its expected traffic, while Application A (10 replicas) is underutilized, leaving nearly half of its capacity idle.
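
The arithmetic above can be checked with a small standalone sketch (plain Go, no operator code assumed): traffic splits proportionally to backend weight, so setting weight equal to replica count yields the 62.5% / 31.25% / 6.25% split.

```go
package main

import "fmt"

// trafficShares returns each backend's fraction of traffic for a set of
// Gateway API backend weights (traffic is split proportionally to weight).
func trafficShares(weights []int32) []float64 {
	var total int32
	for _, w := range weights {
		total += w
	}
	shares := make([]float64, len(weights))
	for i, w := range weights {
		shares[i] = float64(w) / float64(total)
	}
	return shares
}

func main() {
	// Static weights: three apps, weight=1 each -> equal thirds.
	fmt.Println(trafficShares([]int32{1, 1, 1}))
	// Dynamic weights = ready replicas (10, 5, 1) -> 0.625, 0.3125, 0.0625.
	fmt.Println(trafficShares([]int32{10, 5, 1}))
}
```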

Root Cause

The root cause is a mismatch between Gateway-layer traffic distribution (based on static weights) and Application capacity (based on dynamic replica counts).

Proposed Feature

Implement dynamic weight calculation: weight = Application.Status.ReadyReplicas

This would help solve the multi-tenant load balancing problem and benefit Arks users by:

  1. Fair Traffic Distribution: Traffic distributed proportionally to each application's capacity
  2. Automatic Scaling Support: Weight automatically adjusts during HPA scaling operations
  3. Progressive Deployment: Allow partial Ready state (e.g., 7/10 Ready) instead of requiring 100% Ready before adding to HTTPRoute
  4. Better Resource Utilization: High-capacity applications receive proportionally more traffic, preventing overload of small applications

Use Case

In my enterprise multi-tenant deployment, I often need to share a single LLM model endpoint across multiple tenants with different QPS requirements and replica counts.

Specific Scenario:

We have a production Kubernetes cluster running Arks with multiple tenants:

  • Tenant A (high-traffic customer): Deploys ArksApplication with 10 replicas for high QPS
  • Tenant B (medium-traffic customer): Deploys ArksApplication with 5 replicas
  • Tenant C (low-traffic customer): Deploys ArksApplication with 1 replica

All three tenants share the same ArksEndpoint for cost efficiency and resource sharing.

Current Problem:

With static weights (all set to 1), the Gateway distributes traffic equally (33.3% each), causing:

  • Tenant C's single replica is overwhelmed with excessive traffic → high latency, potential service degradation
  • Tenant A's 10 replicas are underutilized → wasted infrastructure costs
  • Poor multi-tenant experience and unfair resource allocation

During Scaling Operations:

When Tenant A scales from 3 to 10 replicas (HPA-triggered), the current implementation requires all replicas Ready before adding to HTTPRoute. This causes traffic interruption during normal scaling operations.

Expected Outcome with Dynamic Weight:

This feature would allow me to:

  1. Achieve fair traffic distribution: Each tenant receives traffic proportional to their capacity (Tenant A: 62.5%, Tenant B: 31.25%, Tenant C: 6.25%)
  2. Support progressive scaling: During scaling, Tenant A receives traffic as soon as pods become Ready (e.g., 4/10 Ready → weight=4), preventing traffic loss
  3. Eliminate manual weight configuration: Weights automatically adjust as replicas scale up/down
  4. Improve resource efficiency: Better utilization across all tenants, reducing infrastructure waste

This aligns with Arks' positioning as an enterprise multi-tenant LLM inference platform and addresses a core requirement for fair resource sharing in shared infrastructure environments.

Proposed Solution

One possible approach could be to modify the ArksEndpoint controller to calculate HTTPRoute backend weights dynamically based on Application ReadyReplicas. This would integrate with Arks's existing reconciliation logic by updating the weight calculation during HTTPRoute backendRef creation.

Implementation Approach

Modified Component: internal/controller/arksendpoint_controller.go (ArksEndpoint reconciliation logic)

Change 1: ArksApplication (Monolithic Architecture)

Current Implementation (Lines 292-322):

```go
// Require all replicas to be Ready
if app.Spec.Replicas != int(app.Status.ReadyReplicas) {
    klog.V(4).InfoS("application status not ready", "service", svcName)
    continue // Skip unless fully Ready
}
backendRef := gatewayv1.HTTPBackendRef{
    BackendRef: gatewayv1.BackendRef{
        Weight: &ep.Spec.DefaultWeight, // Static weight
    },
}
```

Proposed Implementation:

```go
// Calculate dynamic weight based on ReadyReplicas
weight := int32(app.Status.ReadyReplicas)
if weight == 0 {
    klog.V(4).InfoS("application has no ready replicas, skipping",
        "service", svcName,
        "replicas", app.Spec.Replicas,
        "readyReplicas", app.Status.ReadyReplicas)
    continue // Skip only when zero replicas are Ready
}
backendRef := gatewayv1.HTTPBackendRef{
    BackendRef: gatewayv1.BackendRef{
        Weight: &weight, // Dynamic weight
    },
}
klog.InfoS("application added to http route with dynamic weight",
    "service", svcName,
    "weight", weight,
    "readyReplicas", app.Status.ReadyReplicas,
    "replicas", app.Spec.Replicas)
```
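
The gating rule in the proposed change can be factored into a small pure function, which makes the zero-replica skip trivial to unit-test. This is a sketch under assumed names (`dynamicWeight` is hypothetical, not a function in the current controller):

```go
package main

import "fmt"

// dynamicWeight sketches the proposed rule: the HTTPRoute backend weight
// equals ReadyReplicas, and the backend is excluded only when zero replicas
// are Ready. Returns the weight and whether to include the backend.
func dynamicWeight(readyReplicas int32) (weight int32, include bool) {
	if readyReplicas == 0 {
		return 0, false // no capacity: keep the backend out of the route
	}
	return readyReplicas, true
}

func main() {
	// Progressive scaling: 0/10 -> skipped, 4/10 -> weight 4, 10/10 -> weight 10.
	for _, ready := range []int32{0, 4, 10} {
		w, ok := dynamicWeight(ready)
		fmt.Printf("readyReplicas=%d -> weight=%d include=%v\n", ready, w, ok)
	}
}
```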

Change 2: ArksDisaggregatedApplication (Prefill/Decode Separation)

Current Implementation (Lines 326-365):

```go
// Require Prefill and Decode to be fully Ready
if app.Status.Router.ReadyReplicas > 0 &&
    app.Status.Prefill.ReadyReplicas == app.Status.Prefill.Replicas &&
    app.Status.Decode.ReadyReplicas == app.Status.Decode.Replicas {
    // OK
} else {
    continue
}
backendRef := gatewayv1.HTTPBackendRef{
    BackendRef: gatewayv1.BackendRef{
        Weight: &ep.Spec.DefaultWeight, // Static weight
    },
}
```

Proposed Implementation:

```go
// Use Router.ReadyReplicas as the weight
weight := int32(app.Status.Router.ReadyReplicas)
if weight == 0 {
    klog.InfoS("disaggregated application has no ready router pods, skipping",
        "service", svcName,
        "routerReplicas", app.Status.Router.Replicas,
        "routerReadyReplicas", app.Status.Router.ReadyReplicas)
    continue
}
// Verify Prefill and Decode each have at least one ready replica
if app.Status.Prefill.ReadyReplicas == 0 || app.Status.Decode.ReadyReplicas == 0 {
    klog.InfoS("disaggregated application prefill/decode not ready, skipping",
        "service", svcName,
        "prefillReadyReplicas", app.Status.Prefill.ReadyReplicas,
        "decodeReadyReplicas", app.Status.Decode.ReadyReplicas)
    continue
}
backendRef := gatewayv1.HTTPBackendRef{
    BackendRef: gatewayv1.BackendRef{
        Weight: &weight, // Dynamic weight = Router.ReadyReplicas
    },
}
```

Rationale: In disaggregated architecture, Router pods serve as the entry point and concurrency bottleneck. The weight should reflect Router capacity for Gateway-layer load balancing.
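
The disaggregated gate can likewise be sketched as a pure function (hypothetical name `disaggregatedWeight`; not from the Arks codebase): weight tracks the router's ready replicas, but the backend is only included when prefill and decode each have at least one ready replica.

```go
package main

import "fmt"

// disaggregatedWeight sketches the proposed gate for prefill/decode apps.
// Returns the weight (router ready replicas) and whether to include the
// backend in the HTTPRoute.
func disaggregatedWeight(routerReady, prefillReady, decodeReady int32) (int32, bool) {
	if routerReady == 0 || prefillReady == 0 || decodeReady == 0 {
		return 0, false // any empty tier means the app cannot serve traffic
	}
	return routerReady, true
}

func main() {
	fmt.Println(disaggregatedWeight(3, 2, 2)) // included, weight tracks router
	fmt.Println(disaggregatedWeight(3, 0, 2)) // prefill not ready: skipped
}
```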

Integration with Existing Components

This approach integrates with Arks's existing cloud-native architecture by:

  1. ArksEndpoint CRD: No schema changes required; weights are derived from the existing ArksApplication Status.ReadyReplicas field
  2. Gateway API: Uses the standard HTTPRoute backendRef weight field, fully compatible with the Kubernetes Gateway API spec
  3. Reconciliation Loop: Weights update automatically during normal ArksEndpoint reconciliation cycles
  4. Envoy Gateway: No changes needed; Envoy picks up updated HTTPRoute weights on its next configuration push
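
For illustration, the HTTPRoute the controller would produce carries ordinary weighted backendRefs (resource and service names here are invented for the example):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: shared-llm-endpoint      # illustrative name
spec:
  rules:
    - backendRefs:
        - name: tenant-a-svc     # 10 ready replicas
          port: 8080
          weight: 10
        - name: tenant-b-svc     # 5 ready replicas
          port: 8080
          weight: 5
        - name: tenant-c-svc     # 1 ready replica
          port: 8080
          weight: 1
```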

Compatibility

Backward Compatible:

  • No API/CRD breaking changes
  • No migration required
  • Existing deployments automatically benefit after operator upgrade
  • Safe rollback using kubectl rollout undo

Validation

This approach has been validated in a production-like multi-tenant environment:

  • Test 1: Multi-tenant load balancing with 3 applications (1, 10, 5 replicas) - PASS
  • Test 2: Progressive deployment during scaling - PASS (weight tracks ReadyReplicas in real-time)
  • Test 3: Edge cases (zero replicas, recovery) - PASS
  • Test 4: Performance and stability - PASS (0 restarts, 3s reconcile time)
