Skip to content
This repository was archived by the owner on Feb 26, 2026. It is now read-only.

implement the database layer for session scoring#243

Closed
metlos wants to merge 1 commit intomasterfrom
metlos/scoring-db-changes
Closed

implement the database layer for session scoring#243
metlos wants to merge 1 commit intomasterfrom
metlos/scoring-db-changes

Conversation

@metlos
Copy link
Collaborator

@metlos metlos commented Jan 28, 2026

This lays down the foundational types and db tables for the session scoring according to EP-0028.

Part of the PR stack:

Summary by CodeRabbit

  • New Features

    • Added session scoring capabilities with comprehensive status tracking (pending, in progress, completed, failed, timed out)
    • Introduced scoring API endpoints to create, retrieve, and manage session scores
    • Added detailed scoring analysis including performance metrics and missing tools identification
  • Tests

    • Expanded test coverage for scoring models and repository operations
  • Documentation

    • Completed Phase 1 scoring API design

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Introduces Phase 1 of a session scoring API feature with SessionScore database model, Pydantic API response model, ScoringStatus enum, repository layer providing CRUD operations, Alembic database migration with indexes and constraints, and comprehensive unit tests.

Changes

Cohort / File(s) Summary
API Models
backend/tarsy/models/api_models.py
Adds SessionScoreResponse model with scoring result fields (score_id, session_id, prompt_hash, total_score, score_analysis, etc.); updates Field descriptions across existing models; introduces content validation for ChatMessageRequest with whitespace trimming.
Constants & Enums
backend/tarsy/models/constants.py
Adds ScoringStatus enum with members PENDING, IN_PROGRESS, COMPLETED, FAILED, TIMED_OUT; includes active_values() and values() classmethods.
Database Models
backend/tarsy/models/db_models.py
Adds SessionScore SQLModel with score tracking fields, foreign key to alert_sessions with CASCADE, multiple indexes (session_id, prompt_hash, total_score, status, combined), and partial unique index for active statuses; registers after_create event listener for index creation.
Repository Layer
backend/tarsy/repositories/session_score_repository.py
Introduces SessionScoreRepository with CRUD methods: create_session_score (with IntegrityError handling for duplicate active scores), get_score_by_id, get_scores_for_session, get_latest_score_for_session, get_active_score_for_session, and update_score_status for status transitions.
Database Migration
backend/alembic/versions/20260202_1721_776dca9d9a2f_add_session_scores_table.py
Alembic migration creating session_scores table with columns for scoring metadata, foreign key constraint with CASCADE delete, and six indexes including partial unique index on (session_id) for active statuses.
Unit Tests
backend/tests/unit/models/test_session_score_models.py, backend/tests/unit/repositories/test_session_score_repository.py
Comprehensive tests for ScoringStatus enum behavior, SessionScore model validation and UUID generation, API model conversion, repository CRUD operations, duplicate active score prevention, and status transition workflows.
Documentation
docs/enhancements/pending/EP-0028-phase-1-scoring-api.md
Marks all six Phase 1 scoring API items as completed: Alembic migration, SQLModel, Pydantic model, ScoringStatus enum, repository, and tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • tarsy-bot#72 — Introduces migration infrastructure and alembic/env.py setup that this PR depends on for database schema management
  • tarsy-bot#78 — Adds ON DELETE CASCADE foreign key constraints and session cleanup logic related to alert session data lifecycle and database schema changes
  • tarsy-bot#180 — Main PR directly implementing the Phase 1 scoring feature described in EP-0028

Suggested reviewers

  • alexeykazakov

Poem

🐰 A scoring system hops into place,
With enums and models at steady pace,
Repositories hold the scores so true,
Migrations cascade what the rabbits pursue! ✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the primary change: implementing the database layer for session scoring, which is confirmed by the addition of SessionScore DB model, migration script, repository, and related API models.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into master

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch metlos/scoring-db-changes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/enhancements/pending/EP-0028-phase-1-scoring-api.md (1)

3-5: Update EP status/phase to match Phase 1 completion.

The header still says Status: Pending and Phase: Design, but Phase 1 tasks are checked off in the Implementation Plan. Please update these fields to reflect current implementation state (e.g., “In Progress/Implemented” and “Phase 1: Implementation”).
As per coding guidelines: "docs/enhancements/**/*.md: Check EP (Enhancement Proposal) documents for consistency with codebase implementation, validating ... documentation accuracy, and evolution tracking".

🤖 Fix all issues with AI agents
In `@backend/tarsy/models/db_models.py`:
- Around line 330-415: The class in db_models.py is declared as SessionScore but
the repository imports SessionScoreDB, causing an ImportError; fix by either
renaming the model class to SessionScoreDB or adding an alias assignment
(SessionScoreDB = SessionScore) after the class definition so the symbol
SessionScoreDB resolves, and if you rename ensure any event listeners or
references that expect SessionScoreDB are updated to the new name; target the
SessionScore class and the import in session_score_repository.py when making the
change.

In `@backend/tests/unit/models/test_session_score_models.py`:
- Around line 10-131: Add the pytest unit marker to these pure model tests by
decorating either the test classes TestScoringStatusEnum, TestSessionScoreDB,
and TestSessionScoreAPIModel or each test method (e.g., test_active_values,
test_create_session_score_with_required_fields, test_db_to_api_model_conversion,
test_current_prompt_used_defaults_to_false) with `@pytest.mark.unit`; do not add
or require fixtures like isolated_test_settings or test_database_session since
these tests only instantiate/validate models (SessionScore,
SessionScoreResponse, ScoringStatus) and perform no I/O.

In `@docs/enhancements/pending/EP-0028-phase-1-scoring-api.md`:
- Around line 288-301: The EP text must be made consistent with the repo: update
the doc to reference the actual SQLModel class SessionScore and the API model
SessionScoreResponse (not SessionScoreDB/SessionScore), and either add the
missing ScoringStatus.terminal_values() helper to the ScoringStatus enum
(returning [ScoringStatus.COMPLETED, ScoringStatus.FAILED]) or change the doc to
reference the existing active_values()/values() usage; likewise reconcile
current_prompt_used behavior by either implementing the hash comparison (store
prompt_hash on SessionScore, compute current hardcoded prompts hash at startup
and compare, defaulting to 0 in phase 1) or explicitly document that
current_prompt_used defaults to False in Phase 1. Ensure the doc describes the
implemented symbols SessionScore, SessionScoreResponse, ScoringStatus (with
terminal_values() if added), and the prompt_hash/current_prompt_used behavior
consistently.
🧹 Nitpick comments (4)
backend/tarsy/repositories/session_score_repository.py (2)

83-86: Consider optimizing get_latest_score_for_session with LIMIT 1.

Currently fetches all scores for a session just to return the first one. For sessions with many scoring attempts, this is inefficient.

♻️ Proposed optimization
 def get_latest_score_for_session(self, session_id: str) -> Optional[SessionScoreDB]:
     """Get the most recent scoring attempt for a session."""
-    scores = self.get_scores_for_session(session_id)
-    return scores[0] if scores else None
+    statement = (
+        select(SessionScoreDB)
+        .where(SessionScoreDB.session_id == session_id)
+        .order_by(SessionScoreDB.scored_at_us.desc())
+        .limit(1)
+    )
+    return self.session.exec(statement).first()

125-138: Inconsistent None-checking pattern.

total_score correctly uses if total_score is not None: but other optional fields use truthy checks (if completed_at_us:). While unlikely to cause issues with microsecond timestamps, using consistent patterns improves maintainability.

♻️ Use explicit None checks for consistency
-        if completed_at_us:
+        if completed_at_us is not None:
             score.completed_at_us = completed_at_us

-        if error_message:
+        if error_message is not None:
             score.error_message = error_message

         if total_score is not None:
             score.total_score = total_score

-        if score_analysis:
+        if score_analysis is not None:
             score.score_analysis = score_analysis

-        if missing_tools_analysis:
+        if missing_tools_analysis is not None:
             score.missing_tools_analysis = missing_tools_analysis
backend/tests/unit/repositories/test_session_score_repository.py (2)

4-4: Unused import: IntegrityError.

The IntegrityError import is not used in the tests since the repository catches it internally and raises ValueError.

♻️ Remove unused import
 import pytest
-from sqlalchemy.exc import IntegrityError
 from sqlmodel import Session, SQLModel, create_engine

13-14: Consider adding @pytest.mark.unit marker.

Per coding guidelines, unit tests should be marked with @pytest.mark.unit for proper test categorization.

♻️ Add test marker
+@pytest.mark.unit
 class TestSessionScoreRepository:
     """Test SessionScoreRepository CRUD operations."""

@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 97.58065% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.78%. Comparing base (57f4ed4) to head (e57f753).

Files with missing lines Patch % Lines
...end/tarsy/repositories/session_score_repository.py 94.11% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #243      +/-   ##
==========================================
- Coverage   83.79%   83.78%   -0.02%     
==========================================
  Files         143      144       +1     
  Lines       13543    13639      +96     
==========================================
+ Hits        11349    11428      +79     
- Misses       2194     2211      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@backend/tarsy/models/db_models.py`:
- Around line 418-433: Update the event listener
create_session_scores_partial_index to add full type annotations (e.g., annotate
target as sqlalchemy.Table or TableClause, connection as
sqlalchemy.engine.Connection, kw as Mapping[str, Any], and return type as None),
rename unused parameters by prefixing them with an underscore (e.g., _target and
_kw) to silence Ruff, and update the top comment to refer to "SessionScore"
instead of "SessionScoreDB"; keep the listener attached to
SessionScore.__table__ and preserve the CREATE UNIQUE INDEX SQL as-is.

In `@backend/tarsy/repositories/session_score_repository.py`:
- Around line 103-140: In update_score_status, validate the incoming status
against ScoringStatus.values() before assigning to score.status (return/log and
abort if invalid), and change all truthy checks for optional fields to explicit
None checks so valid falsy values (e.g., 0 or empty string when intended) are
preserved: replace "if completed_at_us:", "if error_message:", "if
score_analysis:", "if missing_tools_analysis:" with "if ... is not None:" and
keep the existing calls to get_score_by_id and update_score to locate and
persist the SessionScore updates.
- Around line 24-33: Add explicit type annotations to the constructor: declare
return type `-> None`, annotate `self.session: Session` and `self.base_repo:
BaseRepository[SessionScore]` so static checkers recognize members; then add a
short comment/docstring in `__init__` clarifying sync vs async intent (reference
`EventRepository`/`AsyncSession`) — either state this repo is intentionally sync
(and should be run in a threadpool when used from async endpoints) or convert
the repository to use `AsyncSession` and async methods to match
`EventRepository`; update `BaseRepository` usage accordingly if switching to
async.
- Around line 34-60: The create_session_score in session_score_repository
currently treats all IntegrityError as duplicate-active errors; update it to
inspect the IntegrityError (e.g., e.orig.pgcode for Postgres (23505) and/or the
constraint name in e.orig or the error message for SQLite) and only translate to
the ValueError when the violation matches the partial unique-active constraint
(match the constraint name or unique violation code/message); for all other
integrity failures (FK violations, etc.) rollback and re-raise the original
IntegrityError so real data issues surface. Ensure you still call
self.session.rollback(), check the error details from the caught IntegrityError
before raising the ValueError for SessionScore.session_id duplicates, and
otherwise re-raise the IntegrityError from create_session_score (and document
the check near base_repo.create and create_session_score).
🧹 Nitpick comments (1)
backend/tarsy/models/db_models.py (1)

330-415: Confirm whether SessionScore should emit timeline/audit events.

If scoring records are meant to appear in the session audit trail, please wire this model into the existing timeline logging mechanism (consistent with other session-management models). If not, please confirm that omission is intentional.

Based on learnings: "Database models must use SQLModel-based implementation with timeline logging for session management and history tracking".

Copy link
Collaborator

@alexeykazakov alexeykazakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good!
I'm not sure if we want to add the timed out status now though (for the consistency sake with other agent/session states - and better error displaying).
But please do look at the coderabitai comments too!

@metlos metlos mentioned this pull request Jan 28, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/tarsy/repositories/session_score_repository.py`:
- Around line 63-75: In get_scores_for_session, don't treat falsy values as
absent and allow invalid statuses: change the guard to "if status is not None"
and validate the value against the ScoringStatus enum (e.g.,
ScoringStatus.values() or ScoringStatus.__members__) before applying the filter
on SessionScore.status; if the provided status is not in the allowed set raise a
ValueError (or a clear custom exception) so callers get immediate feedback
instead of silently ignoring bad input.
🧹 Nitpick comments (1)
backend/tarsy/repositories/session_score_repository.py (1)

80-83: Avoid loading all scores just to fetch the latest.

Line 82-83 pulls the entire list; using limit(1) avoids unnecessary I/O for long histories.

Suggested refactor
-        scores = self.get_scores_for_session(session_id)
-        return scores[0] if scores else None
+        statement = (
+            select(SessionScore)
+            .where(SessionScore.session_id == session_id)
+            .order_by(SessionScore.scored_at_us.desc())
+            .limit(1)
+        )
+        return self.session.exec(statement).first()

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@backend/tests/unit/repositories/test_session_score_repository.py`:
- Around line 13-33: Add the unit test marker and type hints: annotate the
TestSessionScoreRepository tests/fixtures with `@pytest.mark.unit` (apply to the
class or each test function) and add return type hints for the fixtures
in_memory_engine -> Engine, db_session -> Session, and repository ->
SessionScoreRepository so signatures for functions in TestSessionScoreRepository
(in_memory_engine, db_session, repository) include explicit return types while
keeping their current behavior.
- Around line 16-47: Replace the custom in_memory_engine and db_session fixtures
with the project's standard fixtures by removing in_memory_engine and db_session
and using the shared test_database_session (and include isolated_test_settings
where required) in the repository and sample_alert_session fixtures; update
repository to return SessionScoreRepository(test_database_session) and change
sample_alert_session to add and commit the AlertSession using
test_database_session so the tests use the standard
isolated_test_settings/test_database_session setup instead of a bespoke
in-memory engine.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dashboard/src/components/AlertListItem.tsx`:
- Around line 72-115: The summary Popover is only opened by mouse hover; add
keyboard and ARIA support by wiring focus/blur and keyboard handlers on the Chip
(add onFocus to call handleSummaryMouseEnter with the event target, onBlur to
call handleSummaryMouseLeave, and onKeyDown to open/close on Enter/Space) and
mark the Chip with aria-haspopup="dialog" and aria-controls pointing to the
Popover id; give the Popover an id (e.g., summary-popover-id) and keep it
controlled by summaryPopoverOpen/summaryAnchorEl and
onClose={handleSummaryMouseLeave} so keyboard users can open via focus/Enter and
Close via blur/Escape.
🧹 Nitpick comments (1)
backend/tests/unit/repositories/test_history_repository.py (1)

2758-2803: Consider moving this test to a more appropriate test class.

This test validates final_analysis_summary functionality but is placed in TestHistoryRepositoryChatMessageCount, which focuses on chat message count behavior. For better organization and discoverability, consider moving it to:

  • TestHistoryRepository class (where the related test_get_alert_sessions_includes_final_analysis_summary is located), or
  • A new dedicated test class like TestSessionOverviewFields

The test implementation itself is correct and well-written.

Comment on lines +72 to +115
<Chip
label="Summary"
size="small"
variant="outlined"
color="primary"
onMouseEnter={handleSummaryMouseEnter}
onMouseLeave={handleSummaryMouseLeave}
onClick={(e) => e.stopPropagation()} // Prevent row click
sx={(theme) => ({
cursor: 'pointer',
height: 24,
fontSize: '0.75rem',
fontWeight: 500,
transition: 'all 0.2s ease-in-out',
'&:hover': {
backgroundColor: `${theme.palette.grey[700]} !important`,
color: `${theme.palette.common.white} !important`,
borderColor: `${theme.palette.grey[700]} !important`,
},
})}
/>
<Popover
sx={{ pointerEvents: 'none' }}
open={summaryPopoverOpen}
anchorEl={summaryAnchorEl}
anchorOrigin={{ vertical: 'top', horizontal: 'left' }}
transformOrigin={{ vertical: 'bottom', horizontal: 'left' }}
onClose={handleSummaryMouseLeave}
disableRestoreFocus
>
<Card sx={{ maxWidth: 500, p: 2.5, boxShadow: 3 }}>
<Box sx={{ display: 'flex', alignItems: 'center', gap: 1, mb: 1.5 }}>
<Summarize color="primary" />
<Typography variant="subtitle1" sx={{ fontWeight: 600, color: 'primary.main' }}>
Executive Summary
</Typography>
</Box>
<Divider sx={{ mb: 1.5 }} />
<Box sx={executiveSummaryMarkdownStyles}>
<ReactMarkdown>
{session.final_analysis_summary}
</ReactMarkdown>
</Box>
</Card>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

fd -t f "AlertListItem.tsx" dashboard/

Repository: codeready-toolchain/tarsy-bot

Length of output: 115


🏁 Script executed:

wc -l dashboard/src/components/AlertListItem.tsx

Repository: codeready-toolchain/tarsy-bot

Length of output: 119


🏁 Script executed:

sed -n '1,100p' dashboard/src/components/AlertListItem.tsx | head -100

Repository: codeready-toolchain/tarsy-bot

Length of output: 4017


🏁 Script executed:

sed -n '100,150p' dashboard/src/components/AlertListItem.tsx

Repository: codeready-toolchain/tarsy-bot

Length of output: 1920


🏁 Script executed:

sed -n '150,227p' dashboard/src/components/AlertListItem.tsx

Repository: codeready-toolchain/tarsy-bot

Length of output: 2672


Add keyboard access to the summary popover trigger.

The summary popover only opens on mouse hover, preventing keyboard users from accessing it. Add focus/blur handlers to the Chip and wire ARIA attributes to the popover.

💡 Minimal accessibility fix
@@
-  const handleSummaryMouseEnter = (e: React.MouseEvent<HTMLElement>) => {
+  const handleSummaryMouseEnter = (e: React.MouseEvent<HTMLElement>) => {
     setSummaryAnchorEl(e.currentTarget);
   };
 
+  const handleSummaryFocus = (e: React.FocusEvent<HTMLElement>) => {
+    setSummaryAnchorEl(e.currentTarget);
+  };
@@
-  const summaryPopoverOpen = Boolean(summaryAnchorEl);
+  const summaryPopoverOpen = Boolean(summaryAnchorEl);
+  const summaryPopoverId = summaryPopoverOpen ? 'summary-popover' : undefined;
@@
               <Chip
                 label="Summary"
@@
                 onMouseEnter={handleSummaryMouseEnter}
                 onMouseLeave={handleSummaryMouseLeave}
+                onFocus={handleSummaryFocus}
+                onBlur={handleSummaryMouseLeave}
+                aria-describedby={summaryPopoverId}
                 onClick={(e) => e.stopPropagation()} // Prevent row click
@@
-              <Popover
+              <Popover
+                id={summaryPopoverId}
                 sx={{ pointerEvents: 'none' }}
🤖 Prompt for AI Agents
In `@dashboard/src/components/AlertListItem.tsx` around lines 72 - 115, The
summary Popover is only opened by mouse hover; add keyboard and ARIA support by
wiring focus/blur and keyboard handlers on the Chip (add onFocus to call
handleSummaryMouseEnter with the event target, onBlur to call
handleSummaryMouseLeave, and onKeyDown to open/close on Enter/Space) and mark
the Chip with aria-haspopup="dialog" and aria-controls pointing to the Popover
id; give the Popover an id (e.g., summary-popover-id) and keep it controlled by
summaryPopoverOpen/summaryAnchorEl and onClose={handleSummaryMouseLeave} so
keyboard users can open via focus/Enter and Close via blur/Escape.

@metlos metlos force-pushed the metlos/scoring-db-changes branch from f8830e7 to 4e28533 Compare February 13, 2026 14:15
@metlos
Copy link
Collaborator Author

metlos commented Feb 13, 2026

I rebased the logic on top of latest master. There were no changes but the number of merges was getting a bit ridiculous and the local history looked like mess.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/tarsy/models/api_models.py`:
- Around line 158-161: The status Field's description is missing the "timed_out"
variant; update the Field description for the status variable so it matches the
ScoringStatus enum (include timed_out alongside
pending|in_progress|completed|failed). Locate the status: str = Field(...)
declaration and amend the description string to list all enum values including
"timed_out" (using the exact enum token TIMED_OUT from ScoringStatus as
reference).
🧹 Nitpick comments (4)
backend/tarsy/models/constants.py (2)

187-190: active_values() returns enum instances, not strings — inconsistent with AlertSessionStatus.active_values().

AlertSessionStatus.active_values() (Line 46) returns List[str], but ScoringStatus.active_values() returns list[ScoringStatus] (enum instances). The method name active_values implies string values. This works in the repository's .in_() clause due to SQLAlchemy's enum handling, but it's a naming/contract inconsistency that will confuse callers expecting strings.

Consider either renaming to get_active_statuses() (matching AlertSessionStatus.get_active_statuses()) or returning list[str]:

Option A: Rename to match AlertSessionStatus pattern
     `@classmethod`
-    def active_values(cls) -> list['ScoringStatus']:
-        """Active status enum instances (scoring in progress)."""
+    def get_active_statuses(cls) -> list['ScoringStatus']:
+        """Statuses that indicate scoring is still in progress."""
         return [cls.PENDING, cls.IN_PROGRESS]
+
+    `@classmethod`
+    def active_values(cls) -> List[str]:
+        """Active status values as strings."""
+        return [status.value for status in cls.get_active_statuses()]

192-195: Minor: Mixed list (lowercase) and List (typing) annotation style.

Line 188 uses list['ScoringStatus'] while Line 193 uses List[str]. Both work in modern Python, but pick one style for consistency within the same class. AlertSessionStatus uses List throughout.

backend/alembic/versions/20260202_1721_776dca9d9a2f_add_session_scores_table.py (1)

82-86: Consider adding IF NOT EXISTS to the partial unique index creation.

The event listener in db_models.py (Line 437) uses CREATE UNIQUE INDEX IF NOT EXISTS, but this migration uses CREATE UNIQUE INDEX without it. While the table existence check on Line 34 protects against this during normal migration flow, adding IF NOT EXISTS would make it more resilient against edge cases (e.g., partial migration failures that created the index but not the table guard state).

Suggested change
         op.execute("""
-            CREATE UNIQUE INDEX ix_session_scores_unique_active
+            CREATE UNIQUE INDEX IF NOT EXISTS ix_session_scores_unique_active
             ON session_scores(session_id)
             WHERE status IN ('pending', 'in_progress')
         """)
backend/tarsy/models/db_models.py (1)

432-441: Prefix unused event listener parameters with _ to silence Ruff ARG001 warnings.

Ruff flags target and kw as unused (ARG001). These are required by SQLAlchemy's event listener signature but are not used in the function body.

Suggested fix
 `@event.listens_for`(SessionScore.__table__, "after_create")
-def create_session_scores_partial_index(target: Table, connection: Connection, **kw: Any) -> None:
+def create_session_scores_partial_index(_target: Table, connection: Connection, **_kw: Any) -> None:
     """Create partial unique index on session_scores after table creation."""

Comment on lines +158 to +161
# Async status fields
status: str = Field(
description="Scoring status (pending|in_progress|completed|failed)"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Status description omits timed_out which exists in ScoringStatus enum.

The field description lists (pending|in_progress|completed|failed) but ScoringStatus also defines TIMED_OUT = "timed_out". Update the description to include it:

     status: str = Field(
-        description="Scoring status (pending|in_progress|completed|failed)"
+        description="Scoring status (pending|in_progress|completed|failed|timed_out)"
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Async status fields
status: str = Field(
description="Scoring status (pending|in_progress|completed|failed)"
)
# Async status fields
status: str = Field(
description="Scoring status (pending|in_progress|completed|failed|timed_out)"
)
🤖 Prompt for AI Agents
In `@backend/tarsy/models/api_models.py` around lines 158 - 161, The status
Field's description is missing the "timed_out" variant; update the Field
description for the status variable so it matches the ScoringStatus enum
(include timed_out alongside pending|in_progress|completed|failed). Locate the
status: str = Field(...) declaration and amend the description string to list
all enum values including "timed_out" (using the exact enum token TIMED_OUT from
ScoringStatus as reference).

@metlos metlos force-pushed the metlos/scoring-db-changes branch from 4e28533 to e57f753 Compare February 13, 2026 15:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants