Informer cache#1034

Open
szautkin wants to merge 1 commit into opencadc:main from szautkin:skaha-informer-cache

Conversation


@szautkin szautkin commented Mar 5, 2026

Add Kubernetes SharedInformerFactory cache for Jobs, Pods, and Nodes

Problem

Skaha has no informer cache. Every incoming HTTP request makes direct K8s API calls — listNamespacedJob, listNamespacedPod, listNode — each returning the full set of objects serialized as JSON over HTTP.

With 5000+ headless jobs and the Science Portal polling every few seconds per user, each poll triggers at minimum 3 API calls that each return thousands of objects. This creates significant load on the K8s API server and adds unnecessary latency to every session listing, stats query, and session creation (which checks existing session counts).

Affected code paths:

  • GET /v1/session — calls SessionDAO.getUserSessions() → listNamespacedJob
  • GET /v1/session?view=stats — calls SessionDAO.getAllocatedPodResources() → listNamespacedPod and NodeDAO.getCapacity() → listNode
  • GET /v1/session?view=interactive — calls SessionDAO.getUserSessions() → listNamespacedJob
  • GET /v1/session/{id} — calls SessionDAO.getUserSessions() → listNamespacedJob
  • POST /v1/session — calls SessionDAO.getUserSessions() → listNamespacedJob (session limit check)

Solution

Introduce a K8SInformerCache singleton that uses the kubernetes-client-java SharedInformerFactory to maintain in-memory mirrors of Jobs, Pods, and Nodes via persistent watch streams.

  • On startup: One initial LIST + a persistent WATCH connection per resource type (3 total)
  • On each request: All read operations are served from the in-memory cache with zero network calls. Filtering by user, session ID, type, and pod status happens in-memory.
  • Resync: 30-second periodic resync to ensure consistency
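
A minimal sketch of the informer wiring described above, assuming the kubernetes-client-java `SharedInformerFactory` API (the exact parameter list of the generated `listNamespacedJobCall` builder varies by client version; `apiClient` and `namespace` are assumed to exist):

```java
// Sketch only: register a shared index informer for V1Job with a 30 s resync.
// Pod and Node informers follow the same shape (Node is cluster-scoped).
SharedInformerFactory factory = new SharedInformerFactory(apiClient);
BatchV1Api batchApi = new BatchV1Api(apiClient);

SharedIndexInformer<V1Job> jobInformer = factory.sharedIndexInformerFor(
        params -> batchApi.listNamespacedJobCall(
                namespace, null, null, null, null, null, null,
                params.resourceVersion, null, params.timeoutSeconds,
                params.watch, null),
        V1Job.class, V1JobList.class,
        30_000L); // periodic resync interval in milliseconds

factory.startAllRegisteredInformers();
// Subsequent reads come from jobInformer.getIndexer().list(), with no network call.
```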

Write operations (delete/create jobs) remain as direct API calls. PodResourceUsage (kubectl top / metrics API) is unchanged. All DAO methods fall back to direct API calls if the cache is not running.
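
The read-with-fallback behavior can be sketched independently of the Kubernetes client. The class and field names below are hypothetical, not the actual DAO code; the point is only the dispatch: serve from the cache when the informer is running, otherwise fall back to a direct API call.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the fallback pattern: reads go to the in-memory
// informer store when the cache is running, and to the K8s API otherwise.
class CachedReader {
    private final Supplier<Boolean> cacheRunning;     // e.g. K8SInformerCache state
    private final Supplier<List<String>> cacheRead;   // in-memory informer store
    private final Supplier<List<String>> directRead;  // direct K8s LIST call

    CachedReader(Supplier<Boolean> cacheRunning,
                 Supplier<List<String>> cacheRead,
                 Supplier<List<String>> directRead) {
        this.cacheRunning = cacheRunning;
        this.cacheRead = cacheRead;
        this.directRead = directRead;
    }

    List<String> list() {
        return cacheRunning.get() ? cacheRead.get() : directRead.get();
    }
}
```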

Changes

| File | Change |
| --- | --- |
| `K8SInformerCache.java` | New singleton managing `SharedInformerFactory` with informers for `V1Job` (namespaced), `V1Pod` (namespaced), and `V1Node` (cluster-scoped) |
| `InitializationAction.java` | Starts the informer cache after K8s API client initialization in `doInit()` |
| `SessionDAO.java` | `getUserSessions()`, `getJob()`, and `getAllocatedPodResources()` read from the cache with in-memory filtering; fall back to direct API calls |
| `NodeDAO.java` | `getCapacities()` reads from the cache with in-memory filtering for unschedulable nodes and label selectors; adds a `matchesLabelSelector()` utility |
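
As an illustration of the `matchesLabelSelector()` utility, here is a hedged sketch assuming it handles equality-based selectors of the form `key=value,key2=value2` (the actual implementation in `NodeDAO.java` may support more selector syntax):

```java
import java.util.Map;

// Hypothetical sketch of an equality-based label-selector matcher, in the
// spirit of the matchesLabelSelector() utility added to NodeDAO.
final class LabelSelectors {
    static boolean matchesLabelSelector(Map<String, String> labels, String selector) {
        if (selector == null || selector.isBlank()) {
            return true; // empty selector matches everything
        }
        for (String clause : selector.split(",")) {
            String[] kv = clause.trim().split("=", 2);
            // reject malformed clauses and mismatched values
            if (kv.length != 2 || !kv[1].equals(labels.get(kv[0]))) {
                return false;
            }
        }
        return true;
    }
}
```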

Measurement

To verify the reduction in API server load after deployment:

kubectl get --raw /metrics | grep apiserver_request_total | grep LIST | grep -E 'jobs|pods|nodes'

LIST request counts for jobs/pods/nodes should drop to near-zero (only the initial list + periodic resync).
