
Conversation

@morganizzzm (Collaborator)

  • Add SheetScorer class for scoring biblical reference discussion levels (0-4 scale), title interest, and creativity assessment with multi-language Hebrew/English support. Add resilient LLM processing with OpenAI function calling, recursive fallback handling, intelligent content chunking, and token management with smart truncation and LLM-based summarization for large sheets
  • Add Celery integration with SheetScoringInput/SheetScoringOutput dataclasses in sheet_scoring module and score_sheet_task
  • Register sheet_scoring package in Celery autodiscovery and add test script for validation

morganizzzm and others added 12 commits July 31, 2025 14:25
…-based scoring

- Add SheetScorer class for scoring biblical reference discussion levels (0-4 scale), title interest, and creativity assessment with multi-language Hebrew/English support. Add resilient LLM processing with OpenAI function calling, recursive fallback handling, intelligent content chunking, and token management with smart truncation and LLM-based summarization for large sheets
- Add Celery integration with SheetScoringInput/SheetScoringOutput dataclasses in scoring_io module and score_sheet_task
- Register sheet_scoring package in Celery autodiscovery and add test script for validation

This enables automated analysis of Jewish study sheets to score reference discussion
quality and title engagement, supporting content curation workflows.
…de review

- Replace manual multiline f-strings with textwrap.dedent for prompt builders
- Simplify _invoke_llm_with_function by using getattr on additional_kwargs
- Rename SheetScoringOutput.processed_at → processed_datetime
- Move datetime→ISO conversion into __post_init__ instead of custom __init__
- Expand and clarify comments on configuration constants
- Switch from relative to absolute imports after adding LLM to PYTHONPATH
feat: Add SheetScorer tool for analyzing Jewish study sheets with LLM…
…nsistent with topic_prompt and other llm apps

scoring_io_input.py -> sheet_scoring_input.py
scoring_io_output.py -> sheet_scoring_output.py
package name scoring_io -> sheet_scoring

consequently renamed all the inputs
added this by mistake in one of the commits
@morganizzzm morganizzzm requested a review from nsantacruz August 11, 2025 11:13
return ""
text = TAG_RE.sub("", raw)
text = html.unescape(text)
text = re.sub(r"\s+\n", "\n", text) # trim spaces before newlines
Contributor

If we don't need tabs, I think this is better:

return '\n'.join([' '.join(line.split()) for line in text.split('\n')])

I think it should be faster, though I may be wrong, and it may be less readable (it seems readable to me, but I'm not sure everyone will agree).
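For concreteness, here is a self-contained sketch of both approaches. Note a behavioral difference: the split/join variant also collapses runs of spaces inside a line, which the original `\s+\n` regex does not. The function names and the `TAG_RE` definition are illustrative, not taken from the PR.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # assumed definition; strips HTML tags

def strip_html_regex(raw: str) -> str:
    """Variant shown in the diff: only trims spaces before newlines."""
    if not raw:
        return ""
    text = TAG_RE.sub("", raw)
    text = html.unescape(text)
    return re.sub(r"\s+\n", "\n", text)

def strip_html_splitjoin(raw: str) -> str:
    """Reviewer's variant: collapses all whitespace runs within each line."""
    if not raw:
        return ""
    text = html.unescape(TAG_RE.sub("", raw))
    return "\n".join(" ".join(line.split()) for line in text.split("\n"))
```

For example, `strip_html_splitjoin("<p>a   b </p>\nc")` yields `"a b\nc"`, while the regex variant keeps the inner run of spaces and yields `"a   b\nc"`.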

return len(TOKEN_RE.findall(text))


def sheet_to_text_views(
Contributor

this signature is not too long to stay on one line (we usually don't break a def like this)

def sheet_to_text_views(
    sheet: Dict[str, Any],
    default_lang: str = "en",
) -> Tuple[str, str, str, bool, float]:
Contributor

Returning a dict is better: it makes it much clearer what you have in hand, so unless there is a reason for the tuple, please change it.
It could also be a class if you prefer.
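A sketch of the class option the reviewer mentions, replacing the positional 5-tuple with a dataclass. The field names below match the dict keys this function later returns in this PR, but the class itself is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SheetTextViews:
    """Hypothetical named return type replacing Tuple[str, str, str, bool, float]."""
    quotes_only: str        # text of quoted sources only
    no_quotes: str          # user-written text only
    with_quotes: str        # full text, quotes included
    has_original: bool      # whether the sheet contains any original writing
    creativity_score: float

views = SheetTextViews("q", "n", "w", True, 0.5)
```

Callers can then write `views.creativity_score` instead of unpacking index 4 of a tuple.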

default_lang: str = "en",
) -> Tuple[str, str, str, bool, float]:
"""
Build three plain‑text snapshots of a Sefaria sheet **and** compute a
Contributor

why both? can you split it into 2 functions?

Collaborator (Author)

Because all of the outputs are extracted during the same pass over the sheet's attributes; if I split it into two functions, it would take twice as long.

token_margin: int = DEFAULT_MAX_OUTPUT_TOKENS,
max_ref_to_process: int = DEFAULT_MAX_REFS_TO_PROCESS,
chunk_size: int = DEFAULT_CHUNK_SIZE,

Contributor

remove newline

self.LANGUAGE_FIELD: LanguageCode.DEFAULT
}

def _normalize_scores_to_percentages(
Contributor

add a docstring to explain what this function does

• First try the whole list.
• If the model returns < len(refs) scores (or JSON error),
split the list in two and grade each half recursively.
"""
Contributor

I didn't understand what this function is supposed to do, and specifically why it returns two dicts.
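The split-and-retry pattern the quoted docstring describes can be sketched like this. All names are illustrative, and the `grade` callback stands in for the LLM call; the real method apparently returns two dicts, which this single-dict sketch does not model.

```python
from typing import Callable, Dict, List, Optional

def score_refs_with_fallback(
    refs: List[str],
    grade: Callable[[List[str]], Optional[Dict[str, int]]],
) -> Dict[str, int]:
    """Try grading the whole list; on a short or failed response,
    split the list in two and grade each half recursively."""
    if not refs:
        return {}
    scores = grade(refs)  # may return None on a JSON error
    if scores is not None and len(scores) == len(refs):
        return scores
    if len(refs) == 1:
        return {}  # give up on a single unscorable ref
    mid = len(refs) // 2
    left = score_refs_with_fallback(refs[:mid], grade)
    right = score_refs_with_fallback(refs[mid:], grade)
    return {**left, **right}
```

With a grader that fails on lists longer than two items, the recursion bottoms out at halves small enough to succeed and merges the partial results.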

validated_scores = {}
for ref, score in chunk_scores.items():
    validated_scores[ref] = self._validate_score_level(
        score,f"ref_score[{ref}]"
Contributor

missing space after the comma

logger.error(f"Chunk GPT failed: {e}")
return None

def _last_regular_start(self, n: int, chunk: int, overlap: int) -> int:
Contributor

can you change n to a descriptive name?
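A hypothetical sketch of the rename, with assumed semantics; the original body of `_last_regular_start` is not shown on this page, so this only illustrates the naming suggestion.

```python
def last_regular_chunk_start(total_items: int, chunk_size: int, overlap: int) -> int:
    """Start index of the final chunk when windows of chunk_size items
    advance by (chunk_size - overlap) each step. Semantics assumed."""
    step = chunk_size - overlap
    if total_items <= chunk_size:
        return 0
    # smallest multiple of step at which the window reaches the end (ceil division)
    return -(-(total_items - chunk_size) // step) * step
```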

morganizzzm and others added 13 commits August 13, 2025 14:33
…nd unified output

- Updated SheetScoringInput to carry only the data used by SheetScorer: sheet_id, title, sources, expanded_refs
- Refactored SheetScorer to return SheetScoringOutput directly instead of a dict
- Changed score_one_sheet to call process_sheet_by_content with explicit args (title, sources, etc.)
- Added request_status and request_status_message fields to SheetScoringOutput
- Modified sheet_to_text_views to accept title/sources and return a dict with quotes_only, no_quotes, with_quotes, has_original, creativity_score
- Simplified strip_html to only collapse whitespace/newlines, removed HTML entity decoding
- Added create_failure_output for standardized error handling
- Updated README
- Deleted unused field SHEET_ID_FIELD from openai_sheets_scorer.py
- Changed the explanatory comment on the id field in SheetScoringInput
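Based on the fields named in these bullets, the reshaped dataclasses presumably look roughly like this sketch. The input fields and the two request_status fields are taken from the bullets; the remaining output field names and all types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class SheetScoringInput:
    sheet_id: int                    # id of the sheet being scored
    title: str
    sources: List[Dict[str, Any]]
    expanded_refs: List[str]

@dataclass
class SheetScoringOutput:
    sheet_id: int
    ref_scores: Dict[str, int] = field(default_factory=dict)  # assumed field name
    title_score: Optional[int] = None                         # assumed field name
    creativity_score: float = 0.0
    request_status: str = "success"
    request_status_message: str = ""
```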
Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive sheet scoring system that uses LLMs to automatically analyze Jewish study sheets. The system scores biblical reference discussion levels (0-4 scale), evaluates title interest, and calculates creativity metrics based on user-generated content percentage. The implementation includes resilient LLM processing with OpenAI function calling, intelligent chunking for large sheets, and Celery integration for asynchronous task processing.

Key Changes:

  • Implements SheetScorer class with multi-language support (Hebrew/English) for analyzing study sheets via OpenAI GPT models
  • Adds Celery task integration with score_sheet_task and structured input/output dataclasses
  • Reduces Celery worker concurrency from 50 to 4 to accommodate resource-intensive LLM operations
  • Fixes typo in sentencizer prompt ("will" → "with")
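As a rough illustration of the creativity metric described above, a percentage-of-user-generated-content calculation might look like the following; the exact formula is not shown on this page, so this is an assumption.

```python
def creativity_score(original_text: str, quoted_text: str) -> float:
    """Hypothetical sketch: fraction of the sheet that is user-written
    commentary rather than quoted source text."""
    total = len(original_text) + len(quoted_text)
    if total == 0:
        return 0.0
    return len(original_text) / total
```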

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 18 comments.

Show a summary per file

• build/entrypoint.sh: Reduces Celery concurrency from 50 to 4 for LLM workloads
• app/util/sentencizer.py: Fixes typo in system prompt ("will" → "with")
• app/sheet_scoring/text_utils.py: Adds text processing utilities for token counting and sheet content extraction
• app/sheet_scoring/tasks.py: Implements Celery task wrapper for sheet scoring
• app/sheet_scoring/sheet_scoring.py: Provides main API entry point for scoring sheets
• app/sheet_scoring/openai_sheets_scorer.py: Core LLM scoring engine with resilient processing and chunking strategies
• app/sheet_scoring/__init__.py: Package initialization file
• app/sheet_scoring/README.md: Comprehensive documentation for the sheet scoring system
• app/requirements.txt: Updates dependencies: pins langchain versions, adds langchain-openai and httpx, and unpins tiktoken
• app/llm_interface/sefaria_llm_interface/sheet_scoring/sheet_scoring_output.py: Defines output dataclass for sheet scoring results
• app/llm_interface/sefaria_llm_interface/sheet_scoring/sheet_scoring_input.py: Defines input dataclass for sheet scoring requests
• app/llm_interface/sefaria_llm_interface/sheet_scoring/__init__.py: Exports sheet scoring dataclasses
• app/celery_setup/app.py: Registers sheet_scoring package in Celery autodiscovery


Comment on lines +1 to +3
langchain==0.2.1
langchain-core==0.2.2
langchain-openai==0.1.8
Copilot AI Jan 7, 2026

Dependency constraint change: Changed from flexible version langchain[llms]~=0.2.1 to exact version langchain==0.2.1 and split into separate packages (langchain-core, langchain-openai). While this provides more control, ensure this exact version pinning doesn't cause conflicts with other dependencies. The split into specific packages is good practice for reducing unnecessary dependencies.

Suggested change
langchain==0.2.1
langchain-core==0.2.2
langchain-openai==0.1.8
langchain~=0.2.1
langchain-core~=0.2.2
langchain-openai~=0.1.8

Copilot uses AI. Check for mistakes.
Contributor

making versions work took me a day...
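For context on the `==` vs `~=` distinction in the suggestion above: PEP 440's compatible-release operator means `langchain~=0.2.1` accepts any 0.2.x at or above 0.2.1 but rejects 0.3.0. A minimal stdlib-only check illustrating the rule (this is not how pip actually evaluates specifiers):

```python
def satisfies_compatible_release(version: str, spec: str) -> bool:
    """True if `version` satisfies `~=spec` for a three-part spec,
    i.e. version >= spec and version matches spec's major.minor."""
    v = tuple(int(p) for p in version.split("."))
    s = tuple(int(p) for p in spec.split("."))
    return v[:2] == s[:2] and v >= s
```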

Comment on lines 109 to 132
def _create_json_llm(self, api_key: str, model: str) -> ChatOpenAI:
    """Create LLM client for JSON responses."""
    http_client = httpx.Client()
    return ChatOpenAI(
        model=model,
        temperature=0,
        top_p=0,
        frequency_penalty=0,
        presence_penalty=0,
        seed=42,
        api_key=api_key,
        http_client=http_client,
    )

def _create_text_llm(self, api_key: str, model: str) -> ChatOpenAI:
    """Create LLM client for text responses."""
    http_client = httpx.Client()
    return ChatOpenAI(
        model=model,
        temperature=0,
        model_kwargs={"response_format": {"type": "text"}},
        api_key=api_key,
        http_client=http_client,
    )
Copilot AI Jan 7, 2026

Resource leak: httpx.Client() instances are created but never closed. These HTTP clients should be properly closed to avoid resource leaks. Consider using context managers or storing the client as an instance variable with proper cleanup in a destructor, or initialize ChatOpenAI without passing an http_client parameter to let it manage its own client lifecycle.
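One way to address this is to own the client as an instance attribute and close it deterministically. The sketch below shows the pattern with a stdlib stand-in for `httpx.Client` so it runs without dependencies; the class and factory names are illustrative.

```python
class FakeHttpClient:
    """Stand-in for httpx.Client; only models open/close state."""
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

class ScorerHttpClients:
    """Sketch: own the HTTP client as an instance attribute and close it
    deterministically, instead of creating clients that are never closed."""
    def __init__(self, client_factory=FakeHttpClient) -> None:
        self._http_client = client_factory()  # could be shared by both LLMs

    @property
    def http_client(self):
        return self._http_client

    def close(self) -> None:
        self._http_client.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

With the real library, `with ScorerHttpClients(httpx.Client) as clients:` would guarantee cleanup; alternatively, omit the http_client parameter entirely and let ChatOpenAI manage its own client lifecycle, as Copilot suggests.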

Comment on lines +1 to +3
from sefaria_llm_interface.sheet_scoring.sheet_scoring_input import *
from sefaria_llm_interface.sheet_scoring.sheet_scoring_output import *

Copilot AI Jan 7, 2026

Import pollutes the enclosing namespace, as the imported module sefaria_llm_interface.sheet_scoring.sheet_scoring_input does not define __all__.

Suggested change
from sefaria_llm_interface.sheet_scoring.sheet_scoring_input import *
from sefaria_llm_interface.sheet_scoring.sheet_scoring_output import *
import sefaria_llm_interface.sheet_scoring.sheet_scoring_input as sheet_scoring_input
import sefaria_llm_interface.sheet_scoring.sheet_scoring_output as sheet_scoring_output

__all__ = []
# Re-export all public (non-underscore) names from the submodules
for _mod in (sheet_scoring_input, sheet_scoring_output):
    for _name in dir(_mod):
        if not _name.startswith("_"):
            globals()[_name] = getattr(_mod, _name)
            __all__.append(_name)

Comment on lines +2 to +3
from sefaria_llm_interface.sheet_scoring.sheet_scoring_output import *

Copilot AI Jan 7, 2026

Import pollutes the enclosing namespace, as the imported module sefaria_llm_interface.sheet_scoring.sheet_scoring_output does not define __all__.

Suggested change
from sefaria_llm_interface.sheet_scoring.sheet_scoring_output import *
import sefaria_llm_interface.sheet_scoring.sheet_scoring_output as _sheet_scoring_output

# Re-export all public names from sheet_scoring_output without using a wildcard import.
for _name, _value in _sheet_scoring_output.__dict__.items():
    if not _name.startswith("_"):
        globals()[_name] = _value

diff-match-patch
dnspython~=2.5.0
tiktoken~=0.4.0
tiktoken
Copilot AI Jan 7, 2026

The tiktoken dependency is now specified without a version, so builds will pull the latest release; a compromised or malicious update could then execute arbitrary code with the application's privileges, including access to secrets and data. This is a concrete supply-chain risk: an attacker who gains control of the PyPI project or its release process can silently ship a backdoored version that gets deployed automatically. Pin tiktoken to a specific, vetted version (and ideally manage upgrades explicitly) so only known-good code is installed.

YishaiGlasner and others added 12 commits January 8, 2026 10:01
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@saengel

saengel commented Jan 8, 2026

@YishaiGlasner still some conflicts that need to be resolved. Also, can you give a bit of context when you have a chance + anything specific you want us to be looking for? Thank you so much!
