Feature: End Time Exporter #84

Open

mwestphall wants to merge 9 commits into rptaylor:master from mwestphall:feature/end-time-exporter

Conversation

@mwestphall
Collaborator

This pull request is intended to increase the compatibility of Kuantifier with JupyterHub. Kuantifier's current configuration faces two issues with JupyterHub workloads:

  • JupyterHub launches raw pods rather than Jobs, so the post-termination behavior of pods isn't guaranteed (they might disappear without ever entering the Completed state).
  • Pods launched by JupyterHub have repeated names, which is not compatible with Kuantifier's name-level pod reporting.

To address this:

  • Update Kuantifier to put the Pod UID in the LocalJobId APEL field.
  • Update the Kuantifier Helm chart to deploy a custom Prometheus exporter that tracks the "last seen alive" time for pods in a given namespace, so that pods missed by the standard kube-state-metrics kube_pod_completion_time metric can still be accounted for.
  • Update the Prometheus queries used by the chart to derive pod end time from kube_pod_completion_time where available, falling back to the new custom metric (kuantifier_pod_endtime) where it is not.

Sam Albin, who wrote the custom end-time exporter, provides the following explanation of the new metrics:

  • kuantifier_pod_last_seen: Timestamp when the pod was last observed.

  • kuantifier_pod_endtime: End time recorded when a pod disappears or stops reporting.

  • kuantifier_pod_cpu_requests: CPU requests pulled directly from the pod spec.
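
For illustration only, below is a minimal sketch of how an exporter could publish these three metrics. It assumes a Python implementation using the kubernetes and prometheus_client libraries; the metric and label names follow the description above, but the port, poll interval, and namespace are placeholders, it only handles the "pod disappeared" case, and the actual exporter in this PR may be structured quite differently.

import time

from kubernetes import client, config
from prometheus_client import Gauge, start_http_server

LABELS = ["pod", "uid", "namespace"]
LAST_SEEN = Gauge("kuantifier_pod_last_seen", "Timestamp when the pod was last observed", LABELS)
END_TIME = Gauge("kuantifier_pod_endtime", "End time recorded when a pod disappears or stops reporting", LABELS)
CPU_REQUESTS = Gauge("kuantifier_pod_cpu_requests", "CPU requests taken from the pod spec", LABELS)


def cpu_to_cores(quantity):
    # Convert a Kubernetes CPU quantity such as "500m" or "2" into cores.
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def scrape(api, namespace, previously_seen):
    # Record the "last seen alive" time and CPU requests for every pod currently visible.
    now = time.time()
    alive = {}
    for pod in api.list_namespaced_pod(namespace).items:
        key = (pod.metadata.name, pod.metadata.uid)
        LAST_SEEN.labels(pod=key[0], uid=key[1], namespace=namespace).set(now)
        cpu = sum(
            cpu_to_cores(c.resources.requests["cpu"])
            for c in pod.spec.containers
            if c.resources and c.resources.requests and "cpu" in c.resources.requests
        )
        CPU_REQUESTS.labels(pod=key[0], uid=key[1], namespace=namespace).set(cpu)
        alive[key] = now
    # Any pod seen on a previous pass but gone now gets an end time equal to its
    # last-seen timestamp, so it can still be accounted for.
    for key, last_seen in previously_seen.items():
        if key not in alive:
            END_TIME.labels(pod=key[0], uid=key[1], namespace=namespace).set(last_seen)
    return alive


if __name__ == "__main__":
    config.load_incluster_config()      # assumes the exporter runs inside the cluster
    start_http_server(8000)             # metrics port is an assumption
    seen = {}
    while True:
        seen = scrape(client.CoreV1Api(), "example-namespace", seen)  # namespace is a placeholder
        time.sleep(30)                  # poll interval is an assumption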

Query changes to support the new metrics:

Prefer kube_pod_completion_time, falling back to kuantifier_pod_endtime when it is missing (the "or on(pod, uid, namespace)" clause keeps every kube-state-metrics sample and fills in the custom metric only for pods that have none):

(max_over_time(kube_pod_completion_time{namespace="$NS"}[$RANGE])
 or on(pod, uid, namespace)
 max_over_time(kuantifier_pod_endtime{namespace="$NS"}[$RANGE]))

Prefer kube_pod_container_resource_requests, falling back to kuantifier_pod_cpu_requests when it is missing:

(max_over_time(kube_pod_container_resource_requests{resource="cpu", node!="" ,namespace="$NS"}[$RANGE])
 or on(pod, uid, namespace)
 max_over_time(kuantifier_pod_cpu_requests{namespace="$NS"}[$RANGE]))

The same fallbacks combined into a single query that computes how long each pod ran (end time minus start time), multiplied by its CPU request, using the custom metrics wherever the kube-state-metrics series are missing:

(max by (pod, uid)(
  (max_over_time(kube_pod_completion_time{namespace="$NS"}[$RANGE])
   or on(pod, uid, namespace)
   max_over_time(kuantifier_pod_endtime{namespace="$NS"}[$RANGE]))
) -
max by (pod, uid)(
  max_over_time(kube_pod_start_time{namespace="$NS"}[$RANGE])
))
* on(pod,uid) group_left()
max by (pod,uid)(
  (max_over_time(kube_pod_container_resource_requests{resource="cpu",node!="",namespace="$NS"}[$RANGE])
   or on(pod,uid,namespace)
   max_over_time(kuantifier_pod_cpu_requests{namespace="$NS"}[$RANGE]))
)
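
As a usage illustration, queries like these can be evaluated through Prometheus's standard HTTP API (/api/v1/query). The sketch below is an assumption-laden Python example: the Prometheus URL, namespace, and range are placeholders rather than values taken from the chart, and the chart's processor may submit its queries differently.

import requests

PROMETHEUS_URL = "http://prometheus:9090"   # placeholder address

# Build the end-time query with the fallback, substituting the $NS and $RANGE
# placeholders used in the query templates above.
query = (
    '(max_over_time(kube_pod_completion_time{{namespace="{ns}"}}[{rng}])'
    " or on(pod, uid, namespace)"
    ' max_over_time(kuantifier_pod_endtime{{namespace="{ns}"}}[{rng}]))'
).format(ns="example-namespace", rng="1d")

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
resp.raise_for_status()

# Each result is an instant-vector sample: a label set plus [timestamp, value],
# where the value is the pod's end time as a Unix timestamp.
for result in resp.json()["data"]["result"]:
    labels = result["metric"]
    _, end_time = result["value"]
    print(labels.get("pod"), labels.get("uid"), end_time)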

@mwestphall requested a review from rptaylor on January 26, 2026 at 23:03

config:
  # Namespace in the cluster where workload pods run
  NAMESPACE: "example-namespace"
Owner


This is functionally the same as .Values.processor.config.NAMESPACE
In practice they would have to have the same value for the chart to work correctly.
Do you think it would make sense to consolidate them?

Collaborator Author


Yes, I think this makes sense. I'll update the chart to pull NAMESPACE from a single location.

exporter.yaml Outdated
@@ -0,0 +1,262 @@
---
Owner


Is this a stray file, or meant to go in templates/?
The templates/exporter.yaml file doesn't have a configmap ...

Collaborator Author


Yep, this should be removed. It's the raw k8s config for the exporter that Sam developed before I converted it to a helm chart.

@rptaylor
Owner

Related to #82, #83
