Commit d4d22ea

feat: enhance scientific writing tests with page budget management
- Added tests to validate the use of target_main_pages and minimum_main_pages in the page budget manager.
- Updated existing tests to reflect changes in the scientific writing policy.

feat: improve state graph runtime tests for usage tracking and budget management
- Introduced tests to accumulate successful node usage and persist it.
- Added tests for handling failed nodes and budget constraints during execution.
- Enhanced existing tests to ensure proper rollback behavior and budget guard functionality.

feat: extend terminal app plan execution tests for feedback handling
- Added tests to verify the correct handling of feedback and critique during plan execution.

feat: enhance write paper PDF build tests for page budget validation
- Updated tests to include validation for minimum and target main pages in compiled page validation.
- Added new tests to ensure proper handling of PDF compilation exceeding target page budgets.

feat: implement model pricing logic for LLM clients
- Introduced model pricing logic to compute usage costs based on token consumption.
- Added tests to validate the pricing propagation for OpenAI and Codex models.

feat: create event logging and replay functionality
- Implemented a persisted event stream for logging run events and replaying them.

test: add comprehensive tests for model pricing and usage computation
- Created tests to ensure correct resolution of model billing and computation of usage costs.
1 parent 6d2354b commit d4d22ea

44 files changed

Lines changed: 2951 additions & 192 deletions

Some content is hidden

README.md

Lines changed: 2 additions & 0 deletions
@@ -310,6 +310,8 @@ Briefs carry **core** sections (topic, objective metric) and **governance** sect
 
 </details>
 
+When `## Manuscript Format` is used, `main_body_pages` is treated as the nominal main-body page target, not a hard upper cap. In config, prefer `paper_profile.target_main_pages` and `paper_profile.minimum_main_pages`; the legacy `paper_profile.main_page_limit` field remains as a compatibility alias during migration.
+
 ---
 
 ## Governance Artifact Flow

docs/architecture.md

Lines changed: 6 additions & 0 deletions
@@ -70,6 +70,12 @@ Top-level progression to paper-writing behavior should preserve the distinction
 
 A paper-scale outcome requires evidence beyond successful orchestration, including baseline/comparator presence, real experiment execution, quantitative comparison, and claim-to-evidence linkage.
 
+Page-budget semantics should also remain explicit:
+
+- `paper_profile.target_main_pages` drives main-body writing budgets
+- `paper_profile.minimum_main_pages` gates the compiled-PDF floor check
+- legacy `paper_profile.main_page_limit` is only a compatibility alias during migration and should not be treated as a hard maximum
+
 ## 6) Research brief contract
 
 A governed run should begin from a research brief that defines the execution contract.
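
The three page-budget fields listed in this hunk can be read as a fallback chain. The following is a minimal TypeScript sketch of that chain, assuming a hypothetical `resolvePageBudget` helper and hypothetical `PageBudgetInput` / `PageBudget` types (none of these names exist in the repository). The fallback order and the default of 8 mirror the `normalizePaperProfileConfig` change in `src/config.ts` in this commit.

```typescript
// Hypothetical input shape; only the three field names come from the docs.
interface PageBudgetInput {
  target_main_pages?: number;
  minimum_main_pages?: number;
  main_page_limit?: number; // legacy compatibility alias
}

// Hypothetical resolved shape used only by this sketch.
interface PageBudget {
  targetMainPages: number;
  minimumMainPages: number;
}

// Round to a whole page count of at least 1; non-finite or non-numeric values are ignored.
function pageCount(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value)
    ? Math.max(1, Math.round(value))
    : undefined;
}

// Explicit fields win, the legacy alias is only a fallback, and 8 is the default target.
function resolvePageBudget(input: PageBudgetInput = {}): PageBudget {
  const legacy = pageCount(input.main_page_limit);
  const targetMainPages = pageCount(input.target_main_pages) ?? legacy ?? 8;
  const minimumMainPages = pageCount(input.minimum_main_pages) ?? legacy ?? targetMainPages;
  return { targetMainPages, minimumMainPages };
}
```

Under these assumptions, a config that sets only the legacy `main_page_limit` yields the same value for both the target and the floor, which is what keeps older configs working during migration.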

docs/paper-quality-bar.md

Lines changed: 7 additions & 0 deletions
@@ -54,6 +54,13 @@ After drafting, `write_paper` emits a post-draft critique that can:
 
 In that case the correct action is upstream repair/backtrack, not spending drafting or PDF-compilation effort on a manuscript that should still be blocked.
 
+### Page-budget semantics
+`write_paper` should treat page budgets as explicit targets/floors, not as an implicit upper cap:
+
+- `paper_profile.target_main_pages` is the nominal main-body target used for writing budgets
+- `paper_profile.minimum_main_pages` is the compiled-PDF floor checked after LaTeX build
+- legacy `paper_profile.main_page_limit` remains a compatibility alias and should not be interpreted as a maximum-page constraint
+
 ## 4) Venue-style targeting
 Users can select a target venue style via `paper_profile.target_venue_style` in config.
 Supported venues: `acl`, `aaai`, `icml`, `neurips`, `iclr`, `generic_nlp_conference`, `generic_ml_conference`, `generic_cs_paper`.

docs/research-brief-template.md

Lines changed: 15 additions & 1 deletion
@@ -47,6 +47,20 @@ Recommended:
 6. analyze results
 7. draft only after evidence is sufficient
 
+## Manuscript Format
+Optional manuscript-format targets for writing and validation.
+
+Recommended:
+- columns: 1 or 2
+- main_body_pages: nominal target page count for the main body
+- references_excluded_from_page_limit: true/false
+- appendices_excluded_from_page_limit: true/false
+
+Important:
+- `main_body_pages` is a page-budget target, not a hard upper cap.
+- AutoLabOS uses it to size writing budgets and, unless overridden, as the minimum compiled main-body page floor.
+- If the compiled PDF lands below that floor, the page-budget check warns or fails depending on validation mode.
+
 ## Research Question
 Write one clear research question that could be answered by a small real experiment.
 
@@ -194,4 +208,4 @@ Examples:
 - Is the dataset too small to support the claim?
 - Is the proposed comparison fair?
 - Are we relying too much on abstract-only papers?
-- Could a simpler baseline already dominate?
+- Could a simpler baseline already dominate?
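
The floor behaviour described under "Important" in the Manuscript Format section can be sketched as a small check. This is an illustrative sketch only: the `checkCompiledPageFloor` function, the `PageFloorCheck` type, and the `"lenient"` / `"strict"` mode names are assumptions, since the template only says that the check "warns or fails depending on validation mode". The `pass` / `warn` / `fail` status values match those read by `src/core/doctor.ts` in this commit.

```typescript
// Hypothetical mode names; the source does not name the validation modes.
type ValidationMode = "lenient" | "strict";
type CheckStatus = "pass" | "warn" | "fail";

interface PageFloorCheck {
  status: CheckStatus;
  message: string;
}

// Compare the compiled main-body page count against the floor.
// There is deliberately no upper-cap branch: exceeding the target still passes.
function checkCompiledPageFloor(
  compiledMainPages: number,
  minimumMainPages: number,
  mode: ValidationMode
): PageFloorCheck {
  if (compiledMainPages >= minimumMainPages) {
    return {
      status: "pass",
      message: `Compiled main body has ${compiledMainPages} pages (floor ${minimumMainPages}).`
    };
  }
  return {
    status: mode === "strict" ? "fail" : "warn",
    message: `Compiled main body has ${compiledMainPages} pages, below the ${minimumMainPages}-page floor.`
  };
}
```

For example, under these assumptions a 6-page build against an 8-page floor would warn in the lenient mode and fail in the strict mode, while a 10-page build would pass in both.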

src/config.ts

Lines changed: 18 additions & 5 deletions
@@ -268,6 +268,8 @@ function buildConfigFromWizardAnswers(answers: {
     venue_style: "acl_long",
     target_venue_style: "generic_cs_paper",
     column_count: 2,
+    target_main_pages: 8,
+    minimum_main_pages: 8,
     main_page_limit: 8,
     references_counted: false,
     appendix_allowed: true,
@@ -686,7 +688,8 @@ function normalizeLoadedConfig(config: AppConfig): AppConfig {
   config.workflow = {
     mode: "agent_approval",
     wizard_enabled: true,
-    approval_mode: normalizeWorkflowApprovalMode(config.workflow.approval_mode)
+    approval_mode: normalizeWorkflowApprovalMode(config.workflow.approval_mode),
+    budget_guard_usd: normalizeBudgetGuardUsd(config.workflow.budget_guard_usd)
   };
   config.experiments = {
     runner: "local_python",
@@ -819,12 +822,23 @@ function normalizeWorkflowApprovalMode(value: unknown): WorkflowApprovalMode {
   return value === "manual" ? "manual" : "minimal";
 }
 
+function normalizeBudgetGuardUsd(value: unknown): number | undefined {
+  return typeof value === "number" && Number.isFinite(value) && value > 0 ? value : undefined;
+}
+
+function normalizePaperProfilePageCount(value: unknown): number | undefined {
+  return typeof value === "number" && Number.isFinite(value) ? Math.max(1, Math.round(value)) : undefined;
+}
+
 function normalizePaperProfileConfig(value: AppConfig["paper_profile"] | undefined): AppConfig["paper_profile"] {
   const preferAppendixFor = Array.isArray(value?.prefer_appendix_for)
     ? value?.prefer_appendix_for
         .filter((item): item is string => typeof item === "string" && item.trim().length > 0)
         .map((item) => item.trim())
     : [];
+  const legacyMainPageLimit = normalizePaperProfilePageCount(value?.main_page_limit);
+  const targetMainPages = normalizePaperProfilePageCount(value?.target_main_pages) ?? legacyMainPageLimit ?? 8;
+  const minimumMainPages = normalizePaperProfilePageCount(value?.minimum_main_pages) ?? legacyMainPageLimit ?? targetMainPages;
   const estimatedWordsPerPage =
     typeof value?.estimated_words_per_page === "number" && Number.isFinite(value.estimated_words_per_page)
       ? Math.max(250, Math.round(value.estimated_words_per_page))
@@ -834,10 +848,9 @@ function normalizePaperProfileConfig(value: AppConfig["paper_profile"] | undefin
     venue_style: value?.venue_style?.trim() || "acl_long",
     target_venue_style: value?.target_venue_style?.trim() || undefined,
     column_count: value?.column_count === 1 ? 1 : 2,
-    main_page_limit:
-      typeof value?.main_page_limit === "number" && Number.isFinite(value.main_page_limit)
-        ? Math.max(1, Math.round(value.main_page_limit))
-        : 8,
+    target_main_pages: targetMainPages,
+    minimum_main_pages: minimumMainPages,
+    main_page_limit: minimumMainPages,
     references_counted: Boolean(value?.references_counted),
     appendix_allowed: value?.appendix_allowed !== false,
     appendix_format: value?.appendix_format === "single_column" ? "single_column" : "double_column",
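
The `normalizeBudgetGuardUsd` helper added in this file accepts only finite, positive numbers and otherwise disables the guard by returning `undefined`. The sketch below reproduces that helper from the diff and pairs it with a hypothetical `budgetGuardTripped` check. The check, including its use of `>=` rather than `>`, is an assumption for illustration; the actual runtime comparison is not shown in the visible part of this commit.

```typescript
// Reproduced from the src/config.ts hunk: only finite, positive numbers enable the guard.
function normalizeBudgetGuardUsd(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) && value > 0 ? value : undefined;
}

// Hypothetical helper (not in the repository): an undefined guard never trips,
// otherwise the guard trips once accumulated spend reaches the configured amount.
function budgetGuardTripped(spentUsd: number, guardUsd: number | undefined): boolean {
  return guardUsd !== undefined && spentUsd >= guardUsd;
}

// Example inputs as they might arrive from a loaded config file.
const guardFromNumber = normalizeBudgetGuardUsd(5);   // 5
const guardFromZero = normalizeBudgetGuardUsd(0);     // undefined: zero disables the guard
const guardFromString = normalizeBudgetGuardUsd("5"); // undefined: strings are not coerced
```

One consequence of this normalization is that a misconfigured value such as `"5"` or `-1` silently disables the guard rather than raising an error.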

src/core/agents/paperWriterSessionManager.ts

Lines changed: 2 additions & 1 deletion
@@ -814,7 +814,8 @@ function buildOutlinePrompt(bundle: PaperWritingBundle, paperProfile?: PaperProf
     ? [
         `Venue style: ${paperProfile.venue_style}`,
         `Column count: ${paperProfile.column_count ?? 2}`,
-        `Main page limit: ${paperProfile.main_page_limit}`,
+        `Target main pages: ${paperProfile.target_main_pages ?? paperProfile.main_page_limit ?? "unknown"}`,
+        `Minimum main pages: ${paperProfile.minimum_main_pages ?? paperProfile.main_page_limit ?? "unknown"}`,
         `References counted toward limit: ${paperProfile.references_counted}`,
         `Appendix allowed: ${paperProfile.appendix_allowed}`,
         `Appendix preferences: ${(paperProfile.prefer_appendix_for || []).join(", ") || "none"}`

src/core/analysis/llmPaperQualityEvaluator.ts

Lines changed: 4 additions & 3 deletions
@@ -14,7 +14,7 @@
  * about hard blocks, but the LLM cannot override them.
  */
 
-import type { LLMClient } from "../llm/client.js";
+import type { LLMClient, LLMCompletionUsage } from "../llm/client.js";
 import type { ReviewArtifactPresence } from "../reviewSystem.js";
 import type { AnalysisReport } from "../resultAnalysis.js";
 import type { MinimumGateResult } from "./paperMinimumGate.js";
@@ -194,7 +194,7 @@ export async function runLLMPaperQualityEvaluation(
   input: LLMEvaluatorInput,
   llm: LLMClient,
   opts?: { abortSignal?: AbortSignal; timeoutMs?: number }
-): Promise<{ evaluation: PaperQualityEvaluation; llmUsed: boolean; costUsd?: number }> {
+): Promise<{ evaluation: PaperQualityEvaluation; llmUsed: boolean; costUsd?: number; usage?: LLMCompletionUsage }> {
   const timeoutMs = opts?.timeoutMs ?? 30_000;
 
   try {
@@ -246,7 +246,8 @@ export async function runLLMPaperQualityEvaluation(
         minimum_gate_ceiling: input.minimumGate.ceiling_type
       },
       llmUsed: true,
-      costUsd: completion.usage?.costUsd
+      costUsd: completion.usage?.costUsd,
+      usage: completion.usage
     };
   } catch {
     return { evaluation: buildFallbackEvaluation(input), llmUsed: false };

src/core/analysis/scientificWriting.ts

Lines changed: 28 additions & 10 deletions
@@ -1,6 +1,6 @@
 import YAML from "yaml";
 
-import type { PaperProfileConfig } from "../../types.js";
+import type { PaperProfileConfig, ResolvedPaperProfileConfig } from "../../types.js";
 import type { ObjectiveMetricEvaluation, ObjectiveMetricProfile } from "../objectiveMetric.js";
 import type { ConstraintProfile } from "../runConstraints.js";
 import type {
@@ -216,6 +216,9 @@ export interface SectionBudgetEntry {
 export interface PageBudgetManagerReport {
   venue_style: string;
   column_count: 1 | 2;
+  target_main_pages: number;
+  minimum_main_pages: number;
+  /** @deprecated Compatibility alias for minimum_main_pages. */
   main_page_limit: number;
   references_counted: boolean;
   appendix_allowed: boolean;
@@ -398,6 +401,8 @@ export interface WritePaperGateDecision {
 const DEFAULT_PAPER_PROFILE: PaperProfileConfig = {
   venue_style: "acl_long",
   column_count: 2,
+  target_main_pages: 8,
+  minimum_main_pages: 8,
   main_page_limit: 8,
   references_counted: false,
   appendix_allowed: true,
@@ -415,7 +420,7 @@ const DEFAULT_PAPER_PROFILE: PaperProfileConfig = {
 export function resolvePaperProfile(
   profile: Partial<PaperProfileConfig> | undefined,
   constraintProfile?: ConstraintProfile
-): PaperProfileConfig {
+): ResolvedPaperProfileConfig {
   const preferAppendixFor = Array.isArray(profile?.prefer_appendix_for)
     ? profile?.prefer_appendix_for
         .filter((item): item is string => typeof item === "string" && item.trim().length > 0)
@@ -430,17 +435,28 @@ export function resolvePaperProfile(
         ? "acl_short"
         : "acl_long"
       : DEFAULT_PAPER_PROFILE.venue_style);
-  const inferredPageLimit =
+  const legacyMainPageLimit =
     typeof profile?.main_page_limit === "number" && Number.isFinite(profile.main_page_limit)
       ? Math.max(1, Math.round(profile.main_page_limit))
-      : /\bshort\b/iu.test(lengthHint)
-        ? 4
-        : DEFAULT_PAPER_PROFILE.main_page_limit;
+      : undefined;
+  const inferredTargetMainPages =
+    typeof profile?.target_main_pages === "number" && Number.isFinite(profile.target_main_pages)
+      ? Math.max(1, Math.round(profile.target_main_pages))
+      : legacyMainPageLimit
+        ?? (/\bshort\b/iu.test(lengthHint) ? 4 : (DEFAULT_PAPER_PROFILE.target_main_pages || 8));
+  const inferredMinimumMainPages =
+    typeof profile?.minimum_main_pages === "number" && Number.isFinite(profile.minimum_main_pages)
+      ? Math.max(1, Math.round(profile.minimum_main_pages))
+      : legacyMainPageLimit
+        ?? inferredTargetMainPages;
 
   return {
     venue_style: inferredVenueStyle,
+    target_venue_style: cleanString(profile?.target_venue_style) || DEFAULT_PAPER_PROFILE.target_venue_style,
     column_count: profile?.column_count === 1 ? 1 : DEFAULT_PAPER_PROFILE.column_count,
-    main_page_limit: inferredPageLimit,
+    target_main_pages: inferredTargetMainPages,
+    minimum_main_pages: inferredMinimumMainPages,
+    main_page_limit: legacyMainPageLimit ?? inferredMinimumMainPages,
     references_counted:
       typeof profile?.references_counted === "boolean"
         ? profile.references_counted
@@ -847,7 +863,7 @@ export function pageBudgetManager(input: {
 }): PageBudgetManagerReport {
   const profile = resolvePaperProfile(input.profile);
   const estimatedWordsPerPage = profile.estimated_words_per_page || 420;
-  const targetMainWords = profile.main_page_limit * estimatedWordsPerPage;
+  const targetMainWords = profile.target_main_pages * estimatedWordsPerPage;
   const minimumMainWords = Math.round(targetMainWords * 0.62);
   const maximumMainWords = Math.round(targetMainWords * 1.15);
   const estimatedMainWords = estimateDraftWords(input.draft.sections);
@@ -877,7 +893,7 @@ export function pageBudgetManager(input: {
   const warnings: string[] = [];
   if (estimatedMainWords < Math.round(targetMainWords * 0.55)) {
     warnings.push(
-      `Estimated main-body length (${estimatedMainWords} words) is far below the ${profile.main_page_limit}-page target budget.`
+      `Estimated main-body length (${estimatedMainWords} words) is far below the ${profile.target_main_pages}-page target budget.`
     );
   } else if (estimatedMainWords < minimumMainWords) {
     warnings.push(
@@ -894,6 +910,8 @@ export function pageBudgetManager(input: {
   return {
     venue_style: profile.venue_style,
     column_count: profile.column_count,
+    target_main_pages: profile.target_main_pages,
+    minimum_main_pages: profile.minimum_main_pages,
     main_page_limit: profile.main_page_limit,
     references_counted: profile.references_counted,
     appendix_allowed: profile.appendix_allowed,
@@ -3068,7 +3086,7 @@ export function buildScientificValidationArtifact(input: ScientificDraftResult):
       finding: input.evidence_diagnostics.blocked_by_evidence_insufficiency ? "unverifiable" : "repairable",
       message:
         input.page_budget.warnings[0]
-        || `Main-body length remains below the venue-aware ${input.page_budget.main_page_limit}-page target.`,
+        || `Main-body length remains below the venue-aware ${input.page_budget.target_main_pages}-page target.`,
       details: input.page_budget.warnings.slice(1),
       thin_sections: input.evidence_diagnostics.thin_sections,
       missing_evidence_categories: input.evidence_diagnostics.missing_evidence_categories,
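
The word-budget arithmetic in `pageBudgetManager` can be isolated as a worked example. The multipliers (0.62, 1.15, 0.55) and the 420 words-per-page fallback are taken from the hunks in this file; the `wordBudget` function, the `WordBudget` type, and its field names are hypothetical and exist only for this sketch.

```typescript
// Hypothetical container for the derived word counts.
interface WordBudget {
  targetMainWords: number;
  minimumMainWords: number;
  maximumMainWords: number;
  farBelowThreshold: number;
}

// Derive word budgets from the target page count, using the constants shown in the diff.
function wordBudget(targetMainPages: number, estimatedWordsPerPage: number = 420): WordBudget {
  const targetMainWords = targetMainPages * estimatedWordsPerPage;
  return {
    targetMainWords,
    minimumMainWords: Math.round(targetMainWords * 0.62),  // below this: "below minimum" warning
    maximumMainWords: Math.round(targetMainWords * 1.15),  // upper bound of the writing budget
    farBelowThreshold: Math.round(targetMainWords * 0.55)  // below this: "far below target" warning
  };
}

const defaultBudget = wordBudget(8); // default target of 8 pages at 420 words per page
```

With the default target of 8 pages this gives a 3360-word target, a 2083-word minimum, a 3864-word maximum, and a 1848-word "far below" threshold. Because the commit switches the multiplier base from `main_page_limit` to `target_main_pages`, these numbers now follow the target rather than the floor.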

src/core/doctor.ts

Lines changed: 14 additions & 2 deletions
@@ -37,6 +37,8 @@ interface RunsFileSnapshot {
 
 interface CompiledPageValidationSnapshot {
   status?: string;
+  minimum_main_pages?: number;
+  target_main_pages?: number;
   main_page_limit?: number;
   compiled_pdf_page_count?: number | null;
   message?: string;
@@ -234,13 +236,23 @@ async function loadLatestPaperPageBudgetCheck(workspaceRoot: string): Promise<Do
   const status = validation.status === "pass" ? "pass" : validation.status === "warn" ? "warn" : "fail";
   const pageCount =
     typeof validation.compiled_pdf_page_count === "number" ? String(validation.compiled_pdf_page_count) : "unknown";
-  const limit = typeof validation.main_page_limit === "number" ? String(validation.main_page_limit) : "unknown";
+  const minimumMainPages = typeof validation.minimum_main_pages === "number"
+    ? String(validation.minimum_main_pages)
+    : typeof validation.main_page_limit === "number"
+      ? String(validation.main_page_limit)
+      : "unknown";
+  const targetMainPages = typeof validation.target_main_pages === "number"
+    ? String(validation.target_main_pages)
+    : typeof validation.main_page_limit === "number"
+      ? String(validation.main_page_limit)
+      : "unknown";
   return {
     name: "paper-page-budget",
     ok: status === "pass",
     detail:
       `Latest compiled paper page-budget check for run ${latestRun.id}: ${status}. ` +
-      `pages=${pageCount}, main_page_limit=${limit}. ${validation.message || ""}`.trim()
+      `pages=${pageCount}, minimum_main_pages=${minimumMainPages}, target_main_pages=${targetMainPages}. ` +
+      `${validation.message || ""}`.trim()
   };
 }
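
The fallback and formatting logic in this hunk can be sketched as a standalone function. The `CompiledPageValidationSnapshot` fields come from the diff; `displayPages` and `formatPageBudgetDetail` are hypothetical names for this sketch. One detail worth noting: in the hunk as displayed, `.trim()` binds only to the final template literal, so an empty `message` would leave a trailing space after `target_main_pages=…. `. The sketch parenthesizes the concatenation so the trim covers the whole string; whether that matches the author's intent is an assumption.

```typescript
// Field names as shown in the src/core/doctor.ts hunk.
interface CompiledPageValidationSnapshot {
  status?: string;
  minimum_main_pages?: number;
  target_main_pages?: number;
  main_page_limit?: number;
  compiled_pdf_page_count?: number | null;
  message?: string;
}

// Hypothetical helper: prefer the new field, fall back to the legacy alias, else "unknown".
function displayPages(primary: number | undefined, legacy: number | undefined): string {
  if (typeof primary === "number") return String(primary);
  if (typeof legacy === "number") return String(legacy);
  return "unknown";
}

// Hypothetical formatter mirroring the detail string built in the hunk.
function formatPageBudgetDetail(runId: string, validation: CompiledPageValidationSnapshot): string {
  const status = validation.status === "pass" ? "pass" : validation.status === "warn" ? "warn" : "fail";
  const pageCount =
    typeof validation.compiled_pdf_page_count === "number" ? String(validation.compiled_pdf_page_count) : "unknown";
  const minimumMainPages = displayPages(validation.minimum_main_pages, validation.main_page_limit);
  const targetMainPages = displayPages(validation.target_main_pages, validation.main_page_limit);
  // Parentheses make trim() apply to the full concatenation, not just the last literal.
  return (
    `Latest compiled paper page-budget check for run ${runId}: ${status}. ` +
    `pages=${pageCount}, minimum_main_pages=${minimumMainPages}, target_main_pages=${targetMainPages}. ` +
    `${validation.message || ""}`
  ).trim();
}
```

Reading the legacy `main_page_limit` as a fallback for both values lets this check report on validation artifacts written before the new fields existed.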
