@morganizzzm (Collaborator)
This PR introduces the first working version of a commentary scoring system that uses an OpenAI model to automatically evaluate how well a Jewish commentary explains the base texts it cites.

🔍 Core Functionality
CommentaryScorer class:

  • Accepts commentary text and a dictionary of cited base texts.
  • Uses GPT-4o-mini via function-calling to assign a binary 0/1 explanation score to each citation.
  • Generates a rationale for each score, always starting with "Explained spans" (see the usage sketch after this list).
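
A minimal usage sketch, assuming a keyword-based constructor and a `score` method; the actual signatures in openai_commentary_scorer.py may differ, and the output fields shown are the ones listed under Output Format below.

```python
# Hypothetical usage -- the constructor signature and the `score` method name
# are assumptions, not the exact API introduced in this PR.
from commentary_scoring.openai_commentary_scorer import CommentaryScorer  # assumed import path

scorer = CommentaryScorer(model="gpt-4o-mini")  # reads OPENAI_API_KEY from the environment

result = scorer.score(
    commentary_ref="Rashi on Genesis 1:1:1",  # identifier echoed back in the output
    commentary_text="Rashi explains that 'bereshit' teaches for whose sake the world was created ...",
    cited_texts={
        "Genesis 1:1": "In the beginning God created the heaven and the earth.",
        "Genesis Rabbah 1:1": "...",
    },
)
print(result.ref_scores)          # e.g. {"Genesis 1:1": 1, "Genesis Rabbah 1:1": 0}
print(result.scores_explanation)  # rationales, each starting with "Explained spans"
```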

Prompt Engineering:

  • Prompt instructs the model to assess interpretive depth, partial explanations, and inherited sources.
  • Model outputs structured JSON conforming to a predefined schema (a schema sketch follows this list).
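
The schema itself is not reproduced in this description; the sketch below shows one plausible function-calling tool definition with per-citation integer scores bounded to 0/1, matching the "minimum: 0, maximum: 1" change noted in the commits. The function and property names are assumptions.

```python
# Sketch of a function-calling tool definition; only the 0/1 bounds per cited
# key reflect the PR -- names and structure here are assumptions.
def build_scoring_tool(cited_refs: list[str]) -> dict:
    per_ref_properties = {
        ref: {"type": "integer", "minimum": 0, "maximum": 1}
        for ref in cited_refs
    }
    return {
        "type": "function",
        "function": {
            "name": "report_citation_scores",  # hypothetical name
            "description": "Binary explanation score and rationale per cited base text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ref_scores": {
                        "type": "object",
                        "properties": per_ref_properties,
                        "required": cited_refs,
                    },
                    "scores_explanation": {"type": "string"},
                },
                "required": ["ref_scores", "scores_explanation"],
            },
        },
    }
```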

⚙️ Additional Components

  • text_utils.py: Utilities for HTML stripping and recursive text flattening (see the sketch after this list).
  • tasks.py: Celery task wrapper for async commentary scoring.
  • README.md: Contains algorithm details and package structure overview.
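
A rough sketch of the text utilities, assuming the function names `strip_html` and `flatten_text`; the real implementations in text_utils.py may differ.

```python
import re
from typing import Union

def strip_html(text: str) -> str:
    """Remove HTML tags and collapse whitespace (simplified sketch)."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()

def flatten_text(value: Union[str, list, dict]) -> str:
    """Recursively flatten nested lists/dicts of text segments into one string."""
    if isinstance(value, str):
        return strip_html(value)
    if isinstance(value, dict):
        return " ".join(flatten_text(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return " ".join(flatten_text(v) for v in value)
    return ""
```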

✅ Output Format
Standardized output includes (a dataclass sketch follows this list):

  • Citation-wise binary score (ref_scores).
  • Explanation strings (scores_explanation).
  • Commentary identifier (commentary_ref).
  • ISO8601 timestamp (processed_datetime).
  • Request status and error message if applicable.
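
The field names above come from this list; the types and defaults in the sketch below are assumptions about how the dataclass in sefaria_llm_interface/commentary_scoring might be declared.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

@dataclass
class CommentaryScoringOutput:
    # Field names follow the PR description; types and defaults are assumptions.
    commentary_ref: str
    ref_scores: Dict[str, int]          # citation -> 0/1
    scores_explanation: Dict[str, str]  # citation -> rationale starting with "Explained spans"
    processed_datetime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    request_status: str = "success"
    request_status_message: Optional[str] = None
```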

⚠️ Limitations

  • Does not yet support chunking for long commentaries.
  • Empirical testing suggests chunking may not be strictly necessary.

- Add sefaria_llm_interface/commentary_scoring package with input/output dataclasses
- Add commentary_scoring app with OpenAI-powered scoring functionality
- Implement CommentaryScorer class for evaluating how well commentaries explain cited texts
- Add Celery task integration for async commentary processing (see the task sketch after the note below)
- Include text processing utilities for HTML stripping and content flattening
- Update Celery autodiscovery to include commentary_scoring tasks
- Add debugging fields request_status and request_status_message to CommentaryScoringOutput
- Update CommentaryScorer to return a CommentaryScoringOutput instead of a dictionary; this change also affects commentary_scoring.py

NOTE: for now, imports from sefaria-llm-interface are local rather than package-style, since the version containing the necessary files has not yet been released.
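
As referenced above, a sketch of what the Celery wrapper in tasks.py could look like; the task name, import paths, and serialization via asdict are assumptions.

```python
# Sketch only -- the import paths, task name, and return-value serialization
# are assumptions, not the exact code in tasks.py.
from dataclasses import asdict

from celery import shared_task

from .openai_commentary_scorer import CommentaryScorer  # assumed relative import

@shared_task(name="commentary_scoring.score_commentary")  # hypothetical task name
def score_commentary(commentary_ref: str, commentary_text: str, cited_texts: dict) -> dict:
    scorer = CommentaryScorer(model="gpt-4o-mini")
    output = scorer.score(
        commentary_ref=commentary_ref,
        commentary_text=commentary_text,
        cited_texts=cited_texts,
    )
    # Celery results should be JSON-serializable, so the dataclass is converted.
    return asdict(output)
```
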
…ling (0/1)

- Replace 0–4 ExplanationLevel with binary ExplainsFlag {0: NOT_EXPLAINED, 1: EXPLAINED}
- Clamp/validate scores to 0/1 in _validate_level (see the sketch below)
- Update function-calling JSON schema to minimum: 0, maximum: 1 per cited key
- Rewrite prompt to policy:
-- Return 1 if the commentary provides any substantive interpretation of any part of the citation (incl. methodological/kabbalistic reads)
-- Return 0 if citation is decorative/prooftext/only paraphrased
-- If A is cited only via B and C adds no new interpretation of A beyond B → 0
-- Partial coverage still counts as 1
- Explanations: ask model to begin each rationale with Explained spans: '<phrase1>'; ... then 1–2 sentence justification (no schema change)
- Logging: report explained X/Y (Z%) instead of average 0–4

--
BREAKING BEHAVIOR: numeric scale semantics changed from graded (0–4) to binary (0/1).
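
A sketch of the binary flag and clamping described above: ExplainsFlag and _validate_level are named in the commit, while the clamp logic and the summary-logging helper are assumptions.

```python
from enum import IntEnum

class ExplainsFlag(IntEnum):
    NOT_EXPLAINED = 0
    EXPLAINED = 1

def _validate_level(raw_score) -> int:
    """Coerce a model-returned score to a valid binary flag (sketch)."""
    try:
        score = int(raw_score)
    except (TypeError, ValueError):
        return ExplainsFlag.NOT_EXPLAINED
    return max(ExplainsFlag.NOT_EXPLAINED, min(ExplainsFlag.EXPLAINED, score))

def _log_summary(ref_scores: dict) -> str:
    """Report 'explained X/Y (Z%)' instead of an average over a 0-4 scale."""
    explained = sum(1 for s in ref_scores.values() if s == ExplainsFlag.EXPLAINED)
    total = len(ref_scores)
    pct = 100 * explained / total if total else 0
    return f"explained {explained}/{total} ({pct:.0f}%)"
```
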
- added README with an explanation of the code
- removed unnecessary imports from commentary_scoring_input.py and commentary_scoring_output.py
- in openai_commentary_scorer.py, changed the sefaria-llm-interface imports from local-folder imports to package imports; added comments to some functions; removed unnecessary spaces in function definitions and added spaces after commas; same for text_utils.py
- added textwrap.dedent to the prompt definition (see the sketch below)
- in tasks.py, changed the sefaria-llm-interface imports from local-folder imports to package imports
- updated the commentary_scoring __init__ from local to package imports
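
As referenced above, a sketch of a dedented prompt encoding the policy from these commit notes; the actual wording in openai_commentary_scorer.py will differ.

```python
import textwrap

# Illustrative only -- this mirrors the scoring policy described in the commit
# notes, not the exact prompt text shipped in the PR.
SCORING_PROMPT = textwrap.dedent("""\
    You are scoring how well a commentary explains each cited base text.
    For each cited key, return 1 if the commentary offers any substantive
    interpretation of any part of the citation (including methodological or
    kabbalistic readings); partial coverage still counts as 1.
    Return 0 if the citation is decorative, used only as a prooftext, or merely
    paraphrased, or if text A is cited only via B and the commentary adds no new
    interpretation of A beyond B.
    Begin each rationale with: Explained spans: '<phrase1>'; ... followed by a
    one- to two-sentence justification.
""")
```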