@morganizzzm (Collaborator)
This PR introduces the first working version of a commentary scoring system that uses an OpenAI model to automatically evaluate how well a Jewish commentary explains the base texts it cites.

🔍 Core Functionality
CommentaryScorer class:

  • Accepts commentary text and a dictionary of cited base texts.
  • Uses GPT-4o-mini via function-calling to assign a binary 0/1 explanation score to each citation.
  • Generates a rationale for each score, always starting with "Explained spans" (see the usage sketch after this list).
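
A minimal usage sketch, assuming a keyword-based constructor and a `score` method; the actual signatures in openai_commentary_scorer.py may differ, and the output fields shown are the ones listed under Output Format below.

```python
# Hypothetical usage -- the constructor signature and the `score` method name
# are assumptions, not the exact API introduced in this PR.
from commentary_scoring.openai_commentary_scorer import CommentaryScorer  # assumed import path

scorer = CommentaryScorer(model="gpt-4o-mini")  # reads OPENAI_API_KEY from the environment

result = scorer.score(
    commentary_ref="Rashi on Genesis 1:1:1",  # identifier echoed back in the output
    commentary_text="Rashi explains that 'bereshit' teaches for whose sake the world was created ...",
    cited_texts={
        "Genesis 1:1": "In the beginning God created the heaven and the earth.",
        "Genesis Rabbah 1:1": "...",
    },
)
print(result.ref_scores)          # e.g. {"Genesis 1:1": 1, "Genesis Rabbah 1:1": 0}
print(result.scores_explanation)  # rationales, each starting with "Explained spans"
```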

Prompt Engineering:

  • Prompt instructs the model to assess interpretive depth, partial explanations, and inherited sources.
  • Model outputs structured JSON conforming to a predefined schema (a schema sketch follows this list).
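
The schema itself is not reproduced in this description; the sketch below shows one plausible function-calling tool definition with per-citation integer scores bounded to 0/1, matching the "minimum: 0, maximum: 1" change noted in the commits. The function and property names are assumptions.

```python
# Sketch of a function-calling tool definition; only the 0/1 bounds per cited
# key reflect the PR -- names and structure here are assumptions.
def build_scoring_tool(cited_refs: list[str]) -> dict:
    per_ref_properties = {
        ref: {"type": "integer", "minimum": 0, "maximum": 1}
        for ref in cited_refs
    }
    return {
        "type": "function",
        "function": {
            "name": "report_citation_scores",  # hypothetical name
            "description": "Binary explanation score and rationale per cited base text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ref_scores": {
                        "type": "object",
                        "properties": per_ref_properties,
                        "required": cited_refs,
                    },
                    "scores_explanation": {"type": "string"},
                },
                "required": ["ref_scores", "scores_explanation"],
            },
        },
    }
```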

⚙️ Additional Components

  • text_utils.py: Utilities for HTML stripping and recursive text flattening (see the sketch after this list).
  • tasks.py: Celery task wrapper for async commentary scoring.
  • README.md: Contains algorithm details and package structure overview.
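
A rough sketch of the text utilities, assuming the function names `strip_html` and `flatten_text`; the real implementations in text_utils.py may differ.

```python
import re
from typing import Union

def strip_html(text: str) -> str:
    """Remove HTML tags and collapse whitespace (simplified sketch)."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()

def flatten_text(value: Union[str, list, dict]) -> str:
    """Recursively flatten nested lists/dicts of text segments into one string."""
    if isinstance(value, str):
        return strip_html(value)
    if isinstance(value, dict):
        return " ".join(flatten_text(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return " ".join(flatten_text(v) for v in value)
    return ""
```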

✅ Output Format
Standardized output includes (a dataclass sketch follows this list):

  • Citation-wise binary score (ref_scores).
  • Explanation strings (scores_explanation).
  • Commentary identifier (commentary_ref).
  • ISO8601 timestamp (processed_datetime).
  • Request status and error message if applicable.
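
The field names above come from this list; the types and defaults in the sketch below are assumptions about how the dataclass in sefaria_llm_interface/commentary_scoring might be declared.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

@dataclass
class CommentaryScoringOutput:
    # Field names follow the PR description; types and defaults are assumptions.
    commentary_ref: str
    ref_scores: Dict[str, int]          # citation -> 0/1
    scores_explanation: Dict[str, str]  # citation -> rationale starting with "Explained spans"
    processed_datetime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    request_status: str = "success"
    request_status_message: Optional[str] = None
```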

⚠️ Limitations

  • Does not yet support chunking for long commentaries.
  • Empirical testing suggests chunking may not be strictly necessary.

- Add sefaria_llm_interface/commentary_scoring package with input/output dataclasses
- Add commentary_scoring app with OpenAI-powered scoring functionality
- Implement CommentaryScorer class for evaluating how well commentaries explain cited texts
- Add Celery task integration for async commentary processing (see the task sketch after the note below)
- Include text processing utilities for HTML stripping and content flattening
- Update Celery autodiscovery to include commentary_scoring tasks
- Add debugging fields request_status and request_status_message to CommentaryScoringOutput
- Update CommentaryScorer to return a CommentaryScoringOutput instead of a dictionary; this change also affects commentary_scoring.py

NOTE: for now, imports from sefaria-llm-interface are local rather than package-style, since the version containing the necessary files has not yet been released.
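
As referenced above, a sketch of what the Celery wrapper in tasks.py could look like; the task name, import paths, and serialization via asdict are assumptions.

```python
# Sketch only -- the import paths, task name, and return-value serialization
# are assumptions, not the exact code in tasks.py.
from dataclasses import asdict

from celery import shared_task

from .openai_commentary_scorer import CommentaryScorer  # assumed relative import

@shared_task(name="commentary_scoring.score_commentary")  # hypothetical task name
def score_commentary(commentary_ref: str, commentary_text: str, cited_texts: dict) -> dict:
    scorer = CommentaryScorer(model="gpt-4o-mini")
    output = scorer.score(
        commentary_ref=commentary_ref,
        commentary_text=commentary_text,
        cited_texts=cited_texts,
    )
    # Celery results should be JSON-serializable, so the dataclass is converted.
    return asdict(output)
```
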
…ling (0/1)

- Replace 0–4 ExplanationLevel with binary ExplainsFlag {0: NOT_EXPLAINED, 1: EXPLAINED}
- Clamp/validate scores to 0/1 in _validate_level (see the sketch below)
- Update function-calling JSON schema to minimum: 0, maximum: 1 per cited key
- Rewrite prompt to policy:
-- Return 1 if the commentary provides any substantive interpretation of any part of the citation (incl. methodological/kabbalistic reads)
-- Return 0 if citation is decorative/prooftext/only paraphrased
-- If A is cited only via B and C adds no new interpretation of A beyond B → 0
-- Partial coverage still counts as 1
- Explanations: ask model to begin each rationale with Explained spans: '<phrase1>'; ... then 1–2 sentence justification (no schema change)
- Logging: report explained X/Y (Z%) instead of average 0–4

--
BREAKING BEHAVIOR: numeric scale semantics changed from graded (0–4) to binary (0/1).
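
A sketch of the binary flag and clamping described above: ExplainsFlag and _validate_level are named in the commit, while the clamp logic and the summary-logging helper are assumptions.

```python
from enum import IntEnum

class ExplainsFlag(IntEnum):
    NOT_EXPLAINED = 0
    EXPLAINED = 1

def _validate_level(raw_score) -> int:
    """Coerce a model-returned score to a valid binary flag (sketch)."""
    try:
        score = int(raw_score)
    except (TypeError, ValueError):
        return ExplainsFlag.NOT_EXPLAINED
    return max(ExplainsFlag.NOT_EXPLAINED, min(ExplainsFlag.EXPLAINED, score))

def _log_summary(ref_scores: dict) -> str:
    """Report 'explained X/Y (Z%)' instead of an average over a 0-4 scale."""
    explained = sum(1 for s in ref_scores.values() if s == ExplainsFlag.EXPLAINED)
    total = len(ref_scores)
    pct = 100 * explained / total if total else 0
    return f"explained {explained}/{total} ({pct:.0f}%)"
```
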
- added README with an explanation of the code
- removed unnecessary imports from commentary_scoring_input.py and commentary_scoring_output.py
- in openai_commentary_scorer.py, changed the sefaria-llm-interface imports from local-folder imports to package imports; added comments to some functions; removed unnecessary spaces in function definitions and added spaces after commas; same for text_utils.py
- added textwrap.dedent to the prompt definition (see the sketch below)
- in tasks.py, changed the sefaria-llm-interface imports from local-folder imports to package imports
- updated the commentary_scoring __init__ from local to package imports
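
As referenced above, a sketch of a dedented prompt encoding the policy from these commit notes; the actual wording in openai_commentary_scorer.py will differ.

```python
import textwrap

# Illustrative only -- this mirrors the scoring policy described in the commit
# notes, not the exact prompt text shipped in the PR.
SCORING_PROMPT = textwrap.dedent("""\
    You are scoring how well a commentary explains each cited base text.
    For each cited key, return 1 if the commentary offers any substantive
    interpretation of any part of the citation (including methodological or
    kabbalistic readings); partial coverage still counts as 1.
    Return 0 if the citation is decorative, used only as a prooftext, or merely
    paraphrased, or if text A is cited only via B and the commentary adds no new
    interpretation of A beyond B.
    Begin each rationale with: Explained spans: '<phrase1>'; ... followed by a
    one- to two-sentence justification.
""")
```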