Add Support for terminationGracePeriodSeconds and lifecycle Hooks in LangGraph Helm Chart #459

@tjsudarsan

Description

The current LangGraph Helm chart (langchain/langgraph-cloud) does not expose the following Kubernetes pod and container spec fields through Helm values:

  1. terminationGracePeriodSeconds - Pod-level field that sets how long Kubernetes waits after asking a pod to stop (SIGTERM) before killing it forcefully (SIGKILL)
  2. lifecycle.preStop - Container-level hook that runs a command before the container receives the termination signal

These fields are essential for production deployments where long-running graph operations need time to complete gracefully during pod termination.

Use Case

In our production environment, we have LangGraph operations that can run for up to 15 minutes. Without proper termination grace period configuration:

  • Long-running graphs are forcefully terminated after the default 30 seconds
  • Users experience failures when pods are evicted or during rolling updates
  • No proper connection draining occurs, leading to "connection refused" errors

Current Workaround

We have to patch the deployments manually after every Helm deployment:

# API Server
kubectl patch deployment langgraph-api-server -p '{
  "spec": {
    "template": {
      "spec": {
        "terminationGracePeriodSeconds": 900,
        "containers": [{
          "name": "api-server",
          "lifecycle": {
            "preStop": {
              "exec": {
                "command": ["/bin/sh", "-c", "sleep 60"]
              }
            }
          }
        }]
      }
    }
  }
}'

# Worker
kubectl patch deployment langgraph-queue -p '{
  "spec": {
    "template": {
      "spec": {
        "terminationGracePeriodSeconds": 900,
        "containers": [{
          "name": "queue",
          "lifecycle": {
            "preStop": {
              "exec": {
                "command": ["/bin/sh", "-c", "sleep 60"]
              }
            }
          }
        }]
      }
    }
  }
}'
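
The two commands above differ only in the deployment and container names, so the workaround can be folded into one loop. This is a minimal sketch, assuming the same names and values as above. The build_patch helper and the APPLY switch are hypothetical names introduced for illustration and are not part of the chart or of kubectl:

```shell
#!/bin/sh
# Build the patch JSON for one container.
# $1 = container name, $2 = grace period (s), $3 = preStop sleep (s)
build_patch() {
  printf '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":%s,"containers":[{"name":"%s","lifecycle":{"preStop":{"exec":{"command":["/bin/sh","-c","sleep %s"]}}}}]}}}}' "$2" "$1" "$3"
}

# Deployment:container pairs taken from the commands above.
# Set APPLY=1 to patch the cluster; otherwise the patches are only printed.
for pair in langgraph-api-server:api-server langgraph-queue:queue; do
  deployment=${pair%%:*}
  container=${pair##*:}
  patch=$(build_patch "$container" 900 60)
  if [ "${APPLY:-0}" = "1" ]; then
    kubectl patch deployment "$deployment" -p "$patch"
  else
    echo "$deployment: $patch"
  fi
done
```

Running it without APPLY=1 prints both patches for review. It still has to be re-run after every Helm deployment, which is the problem this issue asks to remove.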

Proposed Solution

Add support for these fields in the Helm values.yaml:

apiServer:
  deployment:
    # Add support for pod-level termination grace period
    terminationGracePeriodSeconds: 900

    # Add support for container lifecycle hooks
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 60"]

    # Existing fields...
    resources:
      requests:
        cpu: 2000m
        memory: 4Gi

queue:
  deployment:
    # Same support for worker pods
    terminationGracePeriodSeconds: 900
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 60"]
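
When choosing these numbers, note that the Kubernetes pod lifecycle documentation describes the grace period countdown as starting before the preStop hook runs, so the hook's sleep is spent out of the same budget. A rough sketch of the arithmetic for the values above (the variable names are illustrative only, and the 900-second run length is the 15-minute worst case from the use case):

```shell
#!/bin/sh
# Values taken from the proposed values.yaml above.
grace_period=900   # terminationGracePeriodSeconds
prestop_sleep=60   # duration of the preStop "sleep 60"
longest_run=900    # 15-minute graph run from the use case, in seconds

# Time left between SIGTERM and SIGKILL once the preStop hook has finished.
remaining=$((grace_period - prestop_sleep))
echo "remaining=${remaining}s"

# Report a shortfall if the longest run would not fit in the remaining time.
if [ "$remaining" -lt "$longest_run" ]; then
  echo "shortfall=$((longest_run - remaining))s"
fi
```

With these values the container has 840 seconds after the hook completes, so covering a full 15-minute run would take a grace period of roughly 960 seconds.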

Template Changes Required

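In the deployment templates (api-server-deployment.yaml and similar), shown here for the API server; the queue template would reference .Values.queue.deployment instead: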

spec:
  template:
    spec:
      {{- if .Values.apiServer.deployment.terminationGracePeriodSeconds }}
      terminationGracePeriodSeconds: {{ .Values.apiServer.deployment.terminationGracePeriodSeconds }}
      {{- end }}
      containers:
        - name: {{ .Values.apiServer.name }}
          {{- if .Values.apiServer.deployment.lifecycle }}
          lifecycle:
            {{- toYaml .Values.apiServer.deployment.lifecycle | nindent 12 }}
          {{- end }}
          # ... rest of container spec

Additional Context

This is particularly important for:

  • Kubernetes clusters with node autoscaling (Karpenter, Cluster Autoscaler)
  • Production environments with strict SLAs
  • Long-running AI/ML workloads typical in LangGraph applications
