Update FSLF docs: 10 per judge call, budget-capped at 15, auto-trains per experiment

dgerog · claude · dgerog · commit 737cf415d466 · 2026-03-15T13:08:29.000+02:00
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/docs/docs/concepts/llm-as-judge.md b/docs/docs/concepts/llm-as-judge.md
@@ -29,10 +29,10 @@ The judge considers multiple dimensions when evaluating a response:
 
 ## Few-Shot Learning (FSLF)
 
-The judge improves over time through the **Few-Shot Learning Framework**. Human-annotated examples (quality rank 1--2) and high-confidence auto-labeled examples (quality rank 3, >=85% confidence) are used to calibrate the judge for your specific agent. Up to 15 few-shot examples are included per judge call.
+The judge improves over time through the **Few-Shot Learning Framework**. Human-annotated examples (quality rank 1--2) and high-confidence auto-labeled examples (quality rank 3, >=85% confidence) are used to calibrate the judge for your specific agent. Up to 10 few-shot examples are included per judge call, budget-capped at 15 per project with PASS/FAIL balanced selection.
 
 !!! info "Enable FSLF"
-    Set the `few_shot_framework_enabled` flag on your project via the platform dashboard or API. The framework auto-trains nightly at 02:00 UTC.
+    Set the `few_shot_framework_enabled` flag on your project via the platform dashboard or API. The framework auto-trains after each adversarial experiment completes and runs a nightly sweep at 02:00 UTC.
 
 ## Choosing a Judge Provider
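The updated paragraph packs three rules into one sentence: an eligibility filter, a per-project budget, and PASS/FAIL balancing. The Python sketch below shows one way they could fit together. It is not the platform's implementation. The `Example` record, its field names, the helper functions, the best-first sort order, and the reading of "budget-capped at 15 per project" as a cap on the retained pool are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import zip_longest


# Hypothetical record for one calibration example; field names are illustrative only.
@dataclass
class Example:
    example_id: str
    verdict: str           # "PASS" or "FAIL"
    quality_rank: int      # 1 = best
    human_annotated: bool
    confidence: float      # auto-label confidence in [0, 1]


PROJECT_BUDGET = 15        # assumed: max examples retained per project
PER_CALL_LIMIT = 10        # max examples injected into one judge call
MIN_AUTO_CONFIDENCE = 0.85


def is_eligible(ex: Example) -> bool:
    """Human labels at rank 1-2, or auto labels at rank 3 with >= 85% confidence."""
    if ex.human_annotated:
        return ex.quality_rank in (1, 2)
    return ex.quality_rank == 3 and ex.confidence >= MIN_AUTO_CONFIDENCE


def balanced_take(examples: list[Example], limit: int) -> list[Example]:
    """Interleave PASS and FAIL examples, best first, until `limit` is reached.

    If one label runs out, the remaining slots go to the other label.
    """
    def key(ex: Example):
        # Assumed ordering: lower rank first, then human labels, then higher confidence.
        return (ex.quality_rank, not ex.human_annotated, -ex.confidence)

    passes = sorted((e for e in examples if e.verdict == "PASS"), key=key)
    fails = sorted((e for e in examples if e.verdict == "FAIL"), key=key)
    picked: list[Example] = []
    for p, f in zip_longest(passes, fails):
        for ex in (p, f):
            if ex is not None and len(picked) < limit:
                picked.append(ex)
    return picked


def select_for_judge_call(candidates: list[Example]) -> list[Example]:
    """Filter, cap the project pool at 15, then pick up to 10 for one call."""
    pool = balanced_take([e for e in candidates if is_eligible(e)], PROJECT_BUDGET)
    return balanced_take(pool, PER_CALL_LIMIT)
```

With 20 eligible PASS examples and only 2 FAIL examples, this sketch keeps both FAILs and fills the remaining 8 slots with PASS examples, so balancing is best effort when one label is scarce.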