
Add image-generation task exclusion to leaderboard views #52

Open
riceharvest wants to merge 3 commits into pinchbench:main from riceharvest:feat/ai-image-sorting-pr-prep

@riceharvest

Summary

Adds an excludeImageGen leaderboard mode that removes image-generation tasks from score calculations and related views.

This is implemented dynamically from submission data rather than hardcoding task counts, so it stays correct if tasks are added later or if more excluded tasks are introduced.

What changed

Leaderboard filtering

  • add excludeImageGen query-param support on the main leaderboard page
  • recalculate filtered leaderboard scores from actual submission task data
  • recompute derived metrics (including CPST) from the remaining tasks only
  • preserve entries if submission detail fetches fail instead of silently dropping models
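A minimal sketch of the score recalculation described in this list, assuming each submission exposes per-task `score` and `maxScore` fields and that the leaderboard score is the earned share of the maximum, expressed as a percentage. The `TaskResult` shape and the `recomputeScorePercent` name are hypothetical and not taken from the PR; only the task ID `task_13_image_gen` appears in this thread.

```typescript
// Hypothetical per-task record; the real submission shape may differ.
interface TaskResult {
  taskId: string;
  score: number;
  maxScore: number;
}

// Recompute a percentage score from whatever tasks remain after exclusion.
// Returns null when nothing remains, so callers can fall back to the
// original leaderboard entry instead of showing a misleading 0%.
function recomputeScorePercent(
  tasks: readonly TaskResult[],
  excludedTaskIds: ReadonlySet<string>,
): number | null {
  const remaining = tasks.filter((t) => !excludedTaskIds.has(t.taskId));
  const max = remaining.reduce((sum, t) => sum + t.maxScore, 0);
  if (max === 0) return null;
  const earned = remaining.reduce((sum, t) => sum + t.score, 0);
  return (earned / max) * 100;
}
```

Because the denominator is derived from the remaining tasks rather than a fixed count, the result stays correct when tasks are added or more exclusions are introduced.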

Shared task exclusion logic

  • add centralized exclusion helpers in lib/task-metadata.ts
  • make exclusion logic reusable across leaderboard, submission, and heatmap views
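One way the centralized helpers could look, as a sketch only. The names `EXCLUDED_IMAGE_GEN_TASK_IDS`, `isImageGenTask`, and `filterTasks` are assumptions for illustration; the actual exports of lib/task-metadata.ts are not shown in this thread.

```typescript
// Assumed single source of truth for excluded task IDs. Only
// task_13_image_gen is named in this PR discussion.
const EXCLUDED_IMAGE_GEN_TASK_IDS: ReadonlySet<string> = new Set([
  "task_13_image_gen",
]);

function isImageGenTask(taskId: string): boolean {
  return EXCLUDED_IMAGE_GEN_TASK_IDS.has(taskId);
}

// Generic over any record that carries a taskId, so the leaderboard,
// submission, and heatmap views can share the same filter.
function filterTasks<T extends { taskId: string }>(
  tasks: readonly T[],
  excludeImageGen: boolean,
): T[] {
  return excludeImageGen
    ? tasks.filter((t) => !isImageGenTask(t.taskId))
    : tasks.slice();
}
```

Keeping the ID set in one module means adding another excluded task is a one-line change rather than an edit in every view.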

UI updates

  • add “Exclude image generation tasks” toggle to leaderboard controls
  • pass filtered state through leaderboard view
  • make task heatmap honor the exclusion mode
  • make submission detail pages honor the exclusion mode

Fixes included

  • fix parser/markup issues in components/task-heatmap.tsx
  • fix runtime issues from missing task-exclusion imports
  • avoid hardcoded filtered task counts

Files changed

  • app/page.tsx
  • app/submission/[id]/page.tsx
  • components/leaderboard-header.tsx
  • components/leaderboard-view.tsx
  • components/task-breakdown.tsx
  • components/task-heatmap.tsx
  • lib/task-metadata.ts
  • lib/transforms.ts

Validation

Build

  • npm run build

Route smoke tests

  • /
  • /?excludeImageGen=true
  • /about
  • /runs
  • /claim
  • /claim/success
  • /claim/error
  • /user/test
  • /submission/5d73c775-fb81-4df1-ac2f-a08434541601
  • /submission/5d73c775-fb81-4df1-ac2f-a08434541601?excludeImageGen=true
  • invalid route returns 404 ✅

Browser checks

  • unfiltered home loads ✅
  • filtered home loads ✅
  • filtered submission page loads ✅
  • runs page loads ✅
  • no browser console/page errors observed ✅

@vercel

vercel bot commented Mar 22, 2026

@riceharvest is attempting to deploy a commit to the Kilo Code Team on Vercel.

A member of the Team first needs to authorize it.

const bestSubmissions = []
const batchSize = 10

for (let i = 0; i < transformedEntries.length; i += batchSize) {

WARNING: Performance — N individual API calls on every cold-cache page load

When excludeImageGen=true, this server component sequentially fetches submission details for every leaderboard entry (batched 10 at a time). If there are 50+ models on the leaderboard, that's 5+ serial rounds of API calls blocking the page response. Even with ISR (revalidate: 60), every 60 seconds a user will hit a cold cache and experience significant latency.

Consider:

  1. Adding a dedicated API endpoint (e.g. /leaderboard?excludeTasks=task_13_image_gen) that computes adjusted scores server-side, avoiding the N+1 fetch pattern entirely.
  2. If a new endpoint isn't feasible, moving this computation to the client side (similar to how TaskHeatmap already fetches submission data client-side) so the page renders immediately and scores update asynchronously.
  3. At minimum, using Promise.all for all entries at once instead of sequential batches — the batching here throttles throughput but doesn't reduce total call count.
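Option 3 could be sketched as follows. This is an illustration under assumptions, not the PR's code: `LeaderboardEntry`, `AdjustEntry`, and `adjustAllEntries` are hypothetical names, and the `adjust` callback stands in for whatever fetches a submission's detail and recomputes its score.

```typescript
// Hypothetical entry shape for illustration.
interface LeaderboardEntry {
  submissionId: string;
  scorePercent: number;
}

// Stand-in for "fetch submission detail and recompute the score".
type AdjustEntry = (entry: LeaderboardEntry) => Promise<LeaderboardEntry>;

// Start every adjustment at once instead of in serial batches of 10.
// A failed adjustment falls back to the original entry, so no model is
// silently dropped from the leaderboard.
async function adjustAllEntries(
  entries: readonly LeaderboardEntry[],
  adjust: AdjustEntry,
): Promise<LeaderboardEntry[]> {
  return Promise.all(
    entries.map(async (entry) => {
      try {
        return await adjust(entry);
      } catch {
        return entry;
      }
    }),
  );
}
```

This removes the serial rounds but still issues one request per entry, so it trades latency for a burst of concurrent load on the API; options 1 and 2 remain the more thorough fixes.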

@kilo-code-bot
Contributor

kilo-code-bot bot commented Mar 22, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 2 |
| SUGGESTION | 1 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| app/page.tsx | 60 | N+1 server-side API pattern: when excludeImageGen=true, the server component fetches every model's submission detail sequentially (batched 10 at a time), blocking page response on cold cache. Consider a server-side API endpoint or moving computation client-side. |
| components/task-heatmap.tsx | useMemo deps | Adding excludeImageGen to the allTasks useMemo dependency array is correct, but the original useEffect that fetches submissions (line 111, [entries]) will re-fire when the parent re-renders with new entries after the toggle; the filter itself is properly handled in the useMemo without refetching. Ensure the useEffect dep array does NOT include excludeImageGen to avoid unnecessary refetches (current code is correct on this). |

SUGGESTION

| File | Line | Issue |
| --- | --- | --- |
| app/submission/[id]/page.tsx | 261 | displayTasks is already pre-filtered, but excludeImageGen is also passed to TaskBreakdown, causing redundant double-filtering. Pass only one or the other. |

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

| File | Line | Issue |
| --- | --- | --- |
| app/submission/[id]/page.tsx | 39, 108 | The Back button links (href={officialOnly ? '/' : '/?official=false'}) do not preserve the excludeImageGen query parameter, so users lose their filter state when navigating back from a submission detail page. |
| app/page.tsx | 12-16 | generateMetadata does not include excludeImageGen in the OG image URL params, so the OG image won't reflect the filtered state. Low severity since OG images are typically for the default view. |
| lib/task-metadata.ts | 8-21 | TASK_FALLBACK does not include an entry for task_13_image_gen (or tasks 11-12). If the API doesn't provide frontmatter for these tasks, they fall back to the raw task ID as the display name. Pre-existing, not introduced by this PR. |

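For the Back-link observation, a small helper along these lines would carry both flags through. `buildBackHref` is a hypothetical name; the two query parameters and the existing `officialOnly` ternary are the only parts taken from the thread.

```typescript
// Build the Back link so that both the official filter and the
// image-generation exclusion survive the round trip to the leaderboard.
function buildBackHref(officialOnly: boolean, excludeImageGen: boolean): string {
  const params: string[] = [];
  if (!officialOnly) params.push("official=false");
  if (excludeImageGen) params.push("excludeImageGen=true");
  return params.length > 0 ? `/?${params.join("&")}` : "/";
}
```

With both flags at their defaults the helper returns `/`, which matches the current behavior of the existing ternary.
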
Files Reviewed (8 files)
  • app/page.tsx - 1 issue (N+1 API pattern)
  • app/submission/[id]/page.tsx - 1 issue (redundant filtering)
  • components/leaderboard-header.tsx - no issues
  • components/leaderboard-view.tsx - no issues
  • components/task-breakdown.tsx - no issues
  • components/task-heatmap.tsx - 1 issue (dependency management)
  • lib/task-metadata.ts - no issues
  • lib/transforms.ts - no issues


Reviewed by claude-opus-4.6 · 717,149 tokens

@riceharvest
Author

Addressed the redundant filtering note in app/submission/[id]/page.tsx by passing the already-filtered displayTasks directly to TaskBreakdown.

On the N+1 point in app/page.tsx: agreed this is not ideal architecturally. For this PR I kept the implementation server-side so the filtered view stays deterministic and consistent with the submission-detail pages, while also falling back to the original entry if a submission detail fetch fails. If we want to optimize further, the next step should be a dedicated backend/API path that returns pre-adjusted leaderboard data (or task-level aggregates) so we can avoid per-entry submission fetches during SSR.

@riceharvest
Author

Follow-up cleanup pushed in a8588c6:

  • TaskHeatmap no longer re-fetches all submission data when toggling image exclusion
  • submission fetches remain keyed only to the leaderboard entries
  • the exclusion toggle now filters already-loaded heatmap task data in memory

That should address the heatmap re-fetch performance note while keeping the existing behavior unchanged.
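The in-memory filtering described here might look roughly like the pure function below, which could serve as the body of the `allTasks` memo. It is a sketch under assumptions: `HeatmapTask`, `IMAGE_GEN_TASK_IDS`, and `deriveTaskColumns` are illustrative names, not the component's actual code.

```typescript
// Hypothetical shape of one already-fetched task cell.
interface HeatmapTask {
  taskId: string;
  score: number;
}

const IMAGE_GEN_TASK_IDS: ReadonlySet<string> = new Set(["task_13_image_gen"]);

// Derive the heatmap's task columns from data that is already loaded.
// Toggling excludeImageGen only changes this derivation; it never
// triggers a new fetch, because the fetch depends on the entries alone.
function deriveTaskColumns(
  submissions: readonly (readonly HeatmapTask[])[],
  excludeImageGen: boolean,
): string[] {
  const ids = new Set<string>();
  submissions.forEach((tasks) => {
    tasks.forEach((task) => {
      if (excludeImageGen && IMAGE_GEN_TASK_IDS.has(task.taskId)) return;
      ids.add(task.taskId);
    });
  });
  return Array.from(ids).sort();
}
```

Since the function has no side effects, recomputing it on every toggle is cheap compared with refetching submission details.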

@ScuttleBot

👋 Hi @riceharvest! I'm @olearycrew's OpenClaw bot, just dropping in with a quick update.

Thanks for addressing the review feedback on this PR! The changes look good.

One thing to flag: the PR now has merge conflicts — likely from some recent merges to main. Would you be able to rebase on main to resolve those when you get a chance?

Also, there's a Vercel deploy authorization pending, but that's on the maintainer side — nothing you need to do there.

Appreciate your contribution! 🦀

Member

@olearycrew olearycrew left a comment


Merge conflict resolution
