fix: account for skipped papers in novelty index validation #42

Open
ruizhengu wants to merge 1 commit into AgentAlphaAGI:main from ruizhengu:main

@ruizhengu

Description

This PR fixes the issue: #41

It fixes a bug where validate_novelty_index (in /Idea2Paper/Paper-KG-Pipeline/src/idea2paper/infra/index_preflight.py) raised a RuntimeError whenever any papers were skipped during the novelty index build.

Previously, the preflight check strictly required paper_count == index_count. However, the build_novelty_index.py script legitimately skips papers (e.g., when API generation fails or paper content is missing) and records them under the skipped key in index_manifest.json (e.g., /Idea2Paper/Paper-KG-Pipeline/output/novelty_index__voyage-4/index_manifest.json).

This change updates the validation logic to check that index_count + skipped == paper_count, allowing the pipeline to proceed correctly even when some papers are skipped.

Changes Made

  • Updated validate_novelty_index in src/idea2paper/infra/index_preflight.py to calculate the expected count by adding index_count and skipped (defaulting to 0 if not present).
  • Replaced the strict index_count == paper_count check with expected == paper_count.
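
The updated check can be sketched roughly as follows. This is a minimal illustration, not the actual code in index_preflight.py: the function name `check_manifest_counts` is hypothetical, and it assumes the manifest is already loaded as a dict in which `paper_count`, `index_count`, and `skipped` are plain integers.

```python
def check_manifest_counts(manifest: dict) -> None:
    """Raise RuntimeError unless indexed plus skipped papers account for every paper.

    Hypothetical helper: assumes `paper_count`, `index_count`, and `skipped`
    are integer fields of the manifest dict.
    """
    paper_count = int(manifest["paper_count"])
    index_count = int(manifest["index_count"])
    # Older manifests may not have a `skipped` key, so default to 0.
    skipped = int(manifest.get("skipped", 0) or 0)

    expected = index_count + skipped
    if expected != paper_count:
        raise RuntimeError(
            "Novelty index build failed or incomplete: "
            f"index_count={index_count} + skipped={skipped} "
            f"!= paper_count={paper_count}"
        )
```

Defaulting `skipped` to 0 keeps the check equivalent to the old strict comparison for manifests that do not have the key.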

Steps to Test

  1. Run a novelty index build where some papers are skipped (e.g., mock a failure or use a dataset with empty text).
  2. The index_manifest.json (e.g., /Idea2Paper/Paper-KG-Pipeline/output/novelty_index__voyage-4/index_manifest.json) will report index_count < paper_count and a non-zero skipped value.
  3. Run python Paper-KG-Pipeline/scripts/idea2story_pipeline.py "test idea".
  4. Before: Pipeline crashes with RuntimeError: Novelty index build failed or incomplete.
  5. After: Preflight check passes successfully.
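
If a real build with skipped papers is hard to arrange, the before/after behaviour of the count comparison can be reproduced in isolation. The sketch below is only an assumption-laden stand-in: it writes a hand-made manifest containing just the three count fields (a real index_manifest.json may contain more), and `strict_check`, `relaxed_check`, and `write_manifest` are hypothetical helpers, not functions from the repository. It does not exercise the pipeline itself.

```python
import json
import tempfile
from pathlib import Path


def strict_check(manifest: dict) -> bool:
    # Old behaviour: every paper must be indexed.
    return manifest["index_count"] == manifest["paper_count"]


def relaxed_check(manifest: dict) -> bool:
    # New behaviour: indexed + skipped must account for every paper.
    return manifest["index_count"] + manifest.get("skipped", 0) == manifest["paper_count"]


def write_manifest(directory: Path, paper_count: int, index_count: int, skipped: int) -> Path:
    # Write a minimal, hand-made manifest with only the count fields.
    path = directory / "index_manifest.json"
    payload = {"paper_count": paper_count, "index_count": index_count, "skipped": skipped}
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = write_manifest(Path(tmp), paper_count=100, index_count=97, skipped=3)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    old_ok = strict_check(manifest)
    new_ok = relaxed_check(manifest)

print(old_ok, new_ok)  # prints: False True
```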
