## Problem
The current `GitHubSource` in `internal/source/github.go` is stateless — every poll cycle does a full re-fetch from the GitHub API:

- List all matching issues → 1–10 API calls (paginated, up to `maxPages=10`)
- For each matched issue, fetch comments → N API calls
- Total: ~10 + N calls per cycle

With `pollInterval: 1m`, this works fine when labels do server-side filtering (N is small). But if we move to comment-based or other client-side filtering (see related issue), N becomes ALL open issues, potentially hitting 60K+ calls/hour on active repos — far exceeding GitHub's 5K/hr rate limit.
## Proposed Solution: GitHub Informer
Implement a cache-based GitHub watcher, analogous to the Kubernetes informer pattern:
| K8s Informer | GitHub Informer |
|---|---|
| Initial List of all resources | Initial List of all issues + comments |
| Watch stream for changes | Poll with `since` param / ETags for deltas |
| Local cache (store) serves reads | Local cache serves `Discover()` |
| Reflector keeps cache in sync | Syncer keeps cache in sync |
## Architecture

```
                   ┌─────────────────────────┐
                   │     GitHub Informer     │
                   │                         │
GitHub API ──────→ │  Syncer (ETag/since)    │
(delta updates)    │          │              │
                   │          ▼              │
                   │     Local Cache         │
                   │  (issues + comments     │
                   │  + labels + reactions)  │
                   └────────┬────────────────┘
                            │
                   ┌────────▼────────────────┐
                   │ GitHubSource.Discover() │
                   │                         │
                   │ Reads from cache, not   │
                   │ from API. Can filter by │
                   │ labels OR comments OR   │
                   │ anything — all free.    │
                   └─────────────────────────┘
```
## How It Works

Initial sync (once):

- List all open issues + their comments → expensive but one-time

Subsequent cycles:

- `GET /issues?since=<last_sync_time>` → only changed issues (usually 0–5)
- `GET /issues/{n}/comments?since=<last_sync_time>` → only new comments
- GitHub supports ETags (`If-None-Match` header) — a `304 Not Modified` response does not count against rate limits
- Cost drops from O(issues) to O(changes) per cycle
## Key Benefits
- Decouples data fetching from state evaluation. Once the cache is warm, filtering by labels, comments, reactions, or any future signal mechanism is free — just read from cache.
- Makes the label vs. comment debate a UX decision, not a technical constraint. The rate limit concern (biggest objection to comment-based filtering) is eliminated.
- Familiar pattern. The team already works with K8s informers. Same mental model.
- Backward compatible. `GitHubSource.Discover()` returns the same `[]WorkItem` — the interface doesn't change, only the implementation.
## Implementation Sketch
```go
// GitHubInformer maintains a local cache of GitHub issues and comments.
type GitHubInformer struct {
	owner   string
	repo    string
	token   string
	baseURL string
	client  *http.Client

	// Cache
	mu       sync.RWMutex
	issues   map[int]*CachedIssue // issue number → cached data
	lastSync time.Time            // for `since` parameter
	etag     string               // for conditional requests
}

type CachedIssue struct {
	Issue    githubIssue
	Comments []githubComment
	LastSeen time.Time
}

// Sync fetches changes since the last sync and updates the cache.
func (i *GitHubInformer) Sync(ctx context.Context) error {
	// Use ?since=<lastSync> to get only changed issues
	// Use If-None-Match for ETag-based conditional requests
	// Update cache incrementally
	// Evict closed issues
	return nil
}

// GitHubSource.Discover() reads from the informer cache instead of making API calls.
func (s *GitHubSource) Discover(ctx context.Context) ([]WorkItem, error) {
	if err := s.informer.Sync(ctx); err != nil {
		return nil, err
	}
	// Read from cache — filter by labels, comments, whatever is configured.
	// Zero additional API calls.
	return nil, nil // placeholder: assemble []WorkItem from the cache
}
```

## Considerations
- Memory usage: A repo with 10K open issues needs meaningful memory. Mitigate by only caching open issues and evicting closed ones on sync.
- Cold start: First sync is expensive. Could persist cache to ConfigMap/PV for fast restarts, or accept one slow cycle.
- Consistency: Cache is at most `pollInterval` stale — same as the current system (it polls too).
- Multiple spawners: If multiple TaskSpawners watch the same repo, consider a shared informer (like K8s `SharedInformerFactory`) to avoid duplicate API calls.
- Webhook upgrade path: The informer architecture naturally supports adding GitHub webhooks later for real-time cache invalidation, replacing polling entirely.
## Files That Would Change
- `internal/source/github.go`: split into informer + source; `Discover()` becomes a cache reader
- `internal/source/github_informer.go`: new file for the informer implementation
- `cmd/axon-spawner/main.go`: initialize the informer, pass it to the source
- `internal/source/github_test.go`: tests for delta sync, ETag handling, cache behavior
## Related
- Comment-based workflow control proposal: Moving beyond label-based workflow control #417 (depends on this for viable API rate limits)
- Current polling implementation: `internal/source/github.go`
- K8s informer pattern: `k8s.io/client-go/tools/cache`