
Feat: Add Scoring and Verdict #15

Merged
mahimairaja merged 4 commits into main from feat/sync-back-frontend
Feb 15, 2026

Conversation

Collaborator

@mahimairaja mahimairaja commented Feb 15, 2026

Summary by CodeRabbit

  • New Features

    • Full interview scoring and verdicts with raw/final scores, emotion modifiers, feedback, and per-question breakdowns; UI screen to view detailed verdicts and score bars; session history shows final score and "View details".
  • Bug Fixes

    • Improved session shutdown and disconnection handling; refined session-end messaging.
  • Chores

    • Periodic token refresh added for more reliable auth/session sync.
  • Tests

    • Extensive unit, integration, and end-to-end test suites added for scoring, sessions, and conversation flows.

@mahimairaja mahimairaja self-assigned this Feb 15, 2026
@mahimairaja mahimairaja added the enhancement New feature or request label Feb 15, 2026

vercel bot commented Feb 15, 2026

The latest updates on your projects.

Project: waterloo-ai-agents-hackathon2026
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 15, 2026 2:26am


coderabbitai bot commented Feb 15, 2026

📝 Walkthrough

Adds a Claude-based scoring pipeline: DB schema and models extended for rich scoring metadata, a scoring service and prompt builder, ARQ worker tasks to enqueue/run scoring, API and frontend surfaces to expose verdicts, and extensive tests and fixtures to validate scoring and flows.

Changes

Changes by cohort:

  • Database schema & models (backend/migrations/versions/bdd57d9128c7_add_scoring.py, backend/src/models/score_model.py, backend/src/models/session_model.py)
    Added score columns (raw_score, final_score, emotion_modifier, per_question_scores, feedback fields, Claude metadata) and session verdict columns (has_verdict, verdict_status).
  • Scoring service & prompt builder (backend/src/services/scoring/scoring_service.py, backend/src/services/scoring/prompt_builder.py)
    New Claude-based ScoringService and prompt-construction helpers: JSON parsing, clamping, weighted scoring, verdict determination, and metadata extraction. High logic density; review parsing and clamping, API key handling, and JSON sanitization.
  • Worker & enqueue flow (backend/workers/main.py, backend/src/workers/tasks.py)
    Added score_session_task, worker wiring, ARQ enqueue via a Redis pool, idempotent persistence, and session status updates; ensure task retry/exception handling and resource cleanup.
  • API endpoints & repositories (backend/src/api/v1/endpoints/sessions.py, backend/src/api/v1/endpoints/suitors.py, backend/src/repository/score_repository.py, backend/src/schemas/session_schema.py)
    Status/verdict endpoint logic updated to use session verdict fields and optionally include expanded score fields; the score repository gains create and exists_for_session; response schemas extended. Attention: changed endpoint signature for get_session_status.
  • Agent & session lifecycle (backend/agent/interview_agent.py, backend/agent/main.py)
    Conditional session end in record_suitor_response (only call session_mgr.end if no end_reason); enhanced disconnect cleanup with on_close and aclose/close handling and error logging.
  • Configuration & deps (backend/src/core/config.py, backend/pyproject.toml)
    Added VERDICT_THRESHOLD config (default 65.0) and the anthropic>=0.74.1 dependency.
  • Frontend: pages, routing & models (frontend/src/pages/InterviewCompleteScreen.tsx, frontend/src/pages/ChatScreen.tsx, frontend/src/pages/DatesGrid.tsx, frontend/src/App.tsx, frontend/src/api/model/*, frontend/src/api/model/sessionVerdictResponsePerQuestionScores.ts)
    New InterviewCompleteScreen and route, ChatScreen now navigates to /interview/:sessionId/complete, DatesGrid shows final_score and detail links; TypeScript models extended with new scoring fields and a per-question type.
  • Frontend: auth & sync (frontend/src/hooks/useAuthSync.ts)
    Refactored token sync to async/await and added a 30s periodic refresh with proper cleanup.
  • OpenAPI / API contract (frontend/openapi.json, shared/openapi.json)
    OpenAPI schemas updated to include final_score, raw_score, emotion_modifier_reasons, feedback_strengths/improvements, and per_question_scores across SessionSummary and SessionVerdictResponse.
  • Tests & fixtures (backend/src/tests/..., backend/src/tests/fixtures/*, backend/src/tests/fixtures/mock_responses/*)
    Large additions: unit, e2e, and integration tests plus extensive fixtures and mock responses for Claude, Hume, and Calcom; review test coverage and mocking realism carefully given the volume.

Sequence Diagram

sequenceDiagram
    participant Frontend
    participant API
    participant Queue as ARQ_Queue
    participant Worker
    participant Claude
    participant DB as Database

    Frontend->>API: End interview session (request)
    API->>DB: Persist session end, mark for scoring
    API->>Queue: enqueue_scoring_job(session_id, defer=5s)
    API-->>Frontend: Acknowledge

    Queue->>Worker: Dequeue score_session_task(session_id)
    Worker->>DB: Load session, heart_config, transcript, turns
    Worker->>Worker: build_scoring_prompt(...)
    Worker->>Claude: send prompt (Anthropic/Claude)
    Claude-->>Worker: return JSON/text response
    Worker->>Worker: parse JSON, clamp scores, apply emotion_modifier, compute final_score
    Worker->>DB: create Score record, update session.has_verdict/verdict_status
    Frontend->>API: Poll status / fetch verdict
    API->>DB: read has_verdict / score
    API-->>Frontend: SessionVerdictResponse (scores, feedback, per_question_scores)
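The "parse JSON, clamp scores, apply emotion_modifier, compute final_score" step in the diagram can be sketched as pure functions. This is a minimal sketch: the 0-100 score range and the additive combination of raw score and emotion modifier are assumptions; only the 65.0 threshold comes from this PR's VERDICT_THRESHOLD config.

```python
def clamp(value: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Clamp a model-produced score into an assumed 0-100 range."""
    return max(lo, min(hi, value))


def compute_final_score(raw_score: float, emotion_modifier: float) -> float:
    """Apply the emotion modifier (additive combination is an assumption), then clamp."""
    return clamp(clamp(raw_score) + emotion_modifier)


def verdict_for(final_score: float, threshold: float = 65.0) -> str:
    """VERDICT_THRESHOLD defaults to 65.0 per this PR's config change."""
    return "match" if final_score >= threshold else "no_match"
```

Clamping before and after applying the modifier keeps a hallucinated out-of-range raw score from leaking into the stored final score.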

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PR #7: Alters the same agent/session shutdown code (backend/agent/interview_agent.py, backend/agent/main.py) — direct overlap in disconnect/end-session behavior.
  • PR #14: Modifies frontend auth/token sync and routing (useAuthSync.ts, App.tsx, ChatScreen.tsx) — overlaps in auth sync and navigation changes.
  • PR #6: Introduces or modifies frontend routing and app scaffolding — related to the added InterviewCompleteScreen route and frontend wiring.

Poem

🐰 With twitching nose and tiny hop I cheer,

Claude reads the heart, the answers near.
Scores and warmth in gentle rhyme,
Turn by turn, we count the time.
Hop, hum, and nibble—results appear!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 17.39%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Merge Conflict Detection ⚠️ Warning: merge conflicts detected (23 files):

⚔️ backend/agent/interview_agent.py (content)
⚔️ backend/agent/main.py (content)
⚔️ backend/pyproject.toml (content)
⚔️ backend/src/api/v1/endpoints/sessions.py (content)
⚔️ backend/src/api/v1/endpoints/suitors.py (content)
⚔️ backend/src/core/config.py (content)
⚔️ backend/src/models/score_model.py (content)
⚔️ backend/src/models/session_model.py (content)
⚔️ backend/src/repository/score_repository.py (content)
⚔️ backend/src/schemas/session_schema.py (content)
⚔️ backend/src/tests/conftest.py (content)
⚔️ backend/src/workers/tasks.py (content)
⚔️ backend/uv.lock (content)
⚔️ backend/workers/main.py (content)
⚔️ frontend/openapi.json (content)
⚔️ frontend/src/App.tsx (content)
⚔️ frontend/src/api/model/index.ts (content)
⚔️ frontend/src/api/model/sessionSummary.ts (content)
⚔️ frontend/src/api/model/sessionVerdictResponse.ts (content)
⚔️ frontend/src/hooks/useAuthSync.ts (content)
⚔️ frontend/src/pages/ChatScreen.tsx (content)
⚔️ frontend/src/pages/DatesGrid.tsx (content)
⚔️ shared/openapi.json (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: the title 'Feat: Add Scoring and Verdict' accurately summarizes the main change, which introduces comprehensive scoring and verdict functionality including the Claude-based scoring service, prompt builders, database migrations, API endpoints, and frontend UI.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/sync-back-frontend
⚔️ Resolve merge conflicts (beta)
  • Auto-commit resolved conflicts to branch feat/sync-back-frontend
  • Create stacked PR with resolved conflicts
  • Post resolved changes as copyable diffs in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@backend/agent/interview_agent.py`:
- Around line 62-66: The current logic returns "All questions answered..." even
when the session already has an end_reason, which can be misleading; in the
method containing this snippet (look for the block using self.session_mgr.end
and the remaining variable in interview_agent.py), change the branch so that
when remaining == 0 you only call self.session_mgr.end("all_questions_complete")
if end_reason is None, and then return a message that reflects the actual
end_reason when it is already set (e.g., include or reference
self.session_mgr.end_reason in the returned string or return a neutral message
like "Session already ended: {self.session_mgr.end_reason}"). Ensure you do not
overwrite an existing end_reason.
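The guarded branch this prompt describes might look like the following. This is a sketch: end_reason and end come from the review text, while the class shape and message strings are assumptions.

```python
class SessionManager:
    """Minimal stand-in for the real session manager; only end_reason/end matter here."""

    def __init__(self) -> None:
        self.end_reason: str | None = None

    def end(self, reason: str) -> None:
        # Never overwrite an end reason that is already set.
        if self.end_reason is None:
            self.end_reason = reason


def remaining_message(mgr: SessionManager, remaining: int) -> str:
    if remaining > 0:
        return f"{remaining} question(s) remaining."
    if mgr.end_reason is None:
        mgr.end("all_questions_complete")
        return "All questions answered; interview complete."
    # Reflect the actual end reason instead of a misleading completion message.
    return f"Session already ended: {mgr.end_reason}"
```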

In `@backend/agent/main.py`:
- Around line 304-318: The cleanup block that catches Exception around calling
on_close and closing the session (references: session_mgr.end_reason, on_close,
session.aclose/close, session_id, logger.warning) can swallow
asyncio.CancelledError; change the except to explicitly re-raise cancellations
by catching asyncio.CancelledError (or BaseException subclass) and re-raising
it, while catching other exceptions to log the failure with traceback via
logger.exception or logger.warning including exc info; ensure any coroutine
result from session.close() is awaited as before and that cancellations
propagate instead of being suppressed.

In `@backend/src/api/v1/endpoints/sessions.py`:
- Around line 203-206: The fallback for verdict_status uses
session.verdict_status and session.has_verdict but doesn't account for legacy
rows that have a score but has_verdict=False, causing endpoints to return
"pending" (202); update the verdict_status computation to check for a score
existence before defaulting (e.g., set verdict_status = session.verdict_status
or ("ready" if a score exists on session else "pending")), using the existing
session object and keeping session.has_verdict and session.verdict_status logic
intact; apply the same fix to the other verdict-status computation block that
mirrors this logic.

In `@backend/src/services/scoring/prompt_builder.py`:
- Around line 49-53: The direct int() cast on turn.get("question_index", 0) can
raise ValueError/TypeError for corrupted JSONB (None or non-numeric string);
change the logic in the loop that produces q_idx (the code iterating over
turn_summaries and reading "question_index") to defensively coerce to string or
catch conversion errors—e.g., obtain the raw value from
turn.get("question_index", 0), attempt to convert to int inside a try/except
handling ValueError/TypeError and fall back to 0 (so q_idx = safe_int + 1); keep
the rest of the field parsing (q_text, quality, summary) unchanged to match the
existing defensive str() pattern.
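The defensive coercion could live in a small helper (a sketch; the turn dict shape is taken from the review text above, the helper name is hypothetical):

```python
from typing import Any


def safe_question_index(turn: dict[str, Any]) -> int:
    """Coerce question_index to int; corrupted JSONB may hold None or junk strings."""
    raw = turn.get("question_index", 0)
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Fall back to 0 rather than letting one bad turn abort the whole prompt build.
        return 0
```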

In `@backend/src/services/scoring/scoring_service.py`:
- Around line 145-149: The JSON parsing except block around json.loads(text)
should stop logging the raw Claude response (text) to avoid PII leakage;
instead, in the except json.JSONDecodeError as exc handler (the block that
currently calls logger.error("Claude returned invalid JSON: %s", text[:600])),
replace that with logger.exception(...) to capture the stack trace and log only
metadata about the response (e.g., length = len(text) and a short redacted
preview if desired), then re-raise the ValueError from exc as before; update the
handler that surrounds json.loads(text) to use logger.exception and metadata
rather than printing the full text.
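A PII-safe version of that handler might look like this (a minimal sketch; only length metadata is logged, never the raw transcript-derived text):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_scoring_response(text: str) -> dict:
    """Parse the model's scoring JSON; on failure, log metadata only (no raw response)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # logger.exception captures the stack trace; len(text) is safe to log.
        logger.exception("Scoring model returned invalid JSON (length=%d)", len(text))
        raise ValueError("invalid JSON from scoring model") from exc
```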

In `@backend/workers/main.py`:
- Around line 61-142: The race causes a duplicate score create to be treated as
a hard failure; modify the scoring flow so that when score_repo.create(...)
fails due to a duplicate/unique constraint, you treat it as idempotent success
instead of falling through to the generic exception handler: specifically, catch
the duplicate/IntegrityError (or create a ScoreExistsError in score_repo) around
score_repo.create, and in that handler re-check
score_repo.exists_for_session(session_uuid) and if true set
session_repo.update_attr(..., "has_verdict", True),
session_repo.update_attr(..., "verdict_status", "ready") and
session_repo.update_status(..., SessionStatus.SCORED) (do not set FAILED); keep
the existing broad except Exception path for other errors which should mark
FAILED.
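The idempotent persistence path could be shaped as below. This is a sketch: DuplicateScoreError is a local stand-in for the DB unique-violation (SQLAlchemy's IntegrityError in the real code), and the repository method names mirror the review text.

```python
class DuplicateScoreError(Exception):
    """Stand-in for the DB unique-violation (e.g. SQLAlchemy's IntegrityError)."""


async def persist_score_idempotently(score_repo, session_repo, session_uuid, payload) -> str:
    """Treat a duplicate score row as success: another worker already scored this session."""
    try:
        await score_repo.create(session_uuid, payload)
    except DuplicateScoreError:
        if not await score_repo.exists_for_session(session_uuid):
            # Genuine failure: re-raise so the generic handler marks the session FAILED.
            raise
    # Either we created the score or a concurrent worker did; mark the verdict ready.
    await session_repo.update_attr(session_uuid, "has_verdict", True)
    await session_repo.update_attr(session_uuid, "verdict_status", "ready")
    return "scored"
```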

In `@frontend/src/pages/InterviewCompleteScreen.tsx`:
- Around line 79-151: The loading gate should include verdictQuery.isLoading to
avoid flicker when verdict fetching starts; update the initial conditional that
currently checks statusQuery.isLoading || waiting to also check
verdictQuery.isLoading (i.e., change the guard in the early return that renders
"Analyzing your interview..." to statusQuery.isLoading || waiting ||
verdictQuery.isLoading) so the loading screen remains until verdictQuery
finishes.
🧹 Nitpick comments (1)
backend/src/api/v1/endpoints/suitors.py (1)

74-95: Consider batch-fetching scores to avoid N+1 queries.

The current implementation fetches each score individually within the loop (line 75), resulting in up to 20 additional database queries. While acceptable for current scale, consider batch-fetching scores for all session IDs in a single query if this endpoint becomes a performance bottleneck.

# Example optimization approach:
session_ids = [s.id for s in sessions]
scores_by_session = await score_repo.find_by_session_ids(session_ids)  # batch fetch

The final_score field population on line 93 follows the established pattern correctly.
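In-memory, the batch result would be indexed once so the per-session loop does O(1) dict lookups instead of one query each. A sketch, assuming find_by_session_ids returns row dicts with session_id and final_score keys (both hypothetical here):

```python
from typing import Any


def index_final_scores(score_rows: list[dict[str, Any]]) -> dict[Any, float]:
    """Index batch-fetched score rows by session_id; in SQL terms, this replaces N
    per-session SELECTs with one WHERE session_id IN (...) query plus this dict."""
    return {row["session_id"]: row["final_score"] for row in score_rows}
```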

Comment thread backend/agent/interview_agent.py
Comment thread backend/agent/main.py Outdated
Comment thread backend/src/api/v1/endpoints/sessions.py
Comment thread backend/src/services/scoring/prompt_builder.py
Comment thread backend/src/services/scoring/scoring_service.py
Comment thread backend/workers/main.py
Comment thread frontend/src/pages/InterviewCompleteScreen.tsx Outdated
@mahimairaja mahimairaja merged commit b5803a8 into main Feb 15, 2026
5 of 6 checks passed

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@backend/src/services/scoring/prompt_builder.py`:
- Around line 14-30: The function format_expectations assumes expectations is a
dict and calls .get, which will raise for list/str legacy inputs; add a type
guard at the top of format_expectations to verify isinstance(expectations, dict)
(or Mapping) and if not, return "No specific expectations defined." (or coerce
safe default) so the subsequent accesses to expectations.get and indexing are
safe; update references to expectations, looking_for, values, must_haves, and
communication_preferences inside format_expectations accordingly.

In `@backend/src/tests/fixtures/heart_config_missing_name.yaml`:
- Around line 1-14: The test fixture has an unrelated validation failure because
calcom_event_type_id is a string; update the fixture so all non-target fields
are valid while keeping the missing-name condition under test—specifically
change calcom_event_type_id to a valid numeric ID (or appropriately-typed value)
and ensure other fields like calcom_api_key, persona, screening_questions, and
shareable_slug remain valid so the test isolates the missing "name" error.

In `@backend/src/tests/test_api_contract.py`:
- Around line 20-24: The test test_api_003_not_found_returns_404 currently
allows a 500 which masks regressions; update the assertion so the request to
"/api/v1/does-not-exist" must return a hard 404 (e.g., assert resp.status_code
== 404) and remove the {404, 500} set; keep the existing client.get call and
test name unchanged so it's clear this is strictly a not-found behavior check.

In `@backend/src/tests/test_e2e_flows.py`:
- Around line 104-129: The test test_e2e_003_livekit_room_creation_fails is
currently catching the broad Exception; change the assertion to expect the
specific exception raised by the mocked livekit (RuntimeError) so the test only
fails for the intended livekit-room-creation failure—replace
pytest.raises(Exception) with pytest.raises(RuntimeError) (referencing the test
function name and the livekit.create_room.side_effect set to
RuntimeError("livekit down") and the call to start_session.__wrapped__).

In `@backend/src/tests/test_m4_suitor_entry.py`:
- Around line 56-68: The test name and assertion are inconsistent with intended
XSS handling: test_m4_005_register_suitor_xss_sanitized expects sanitization but
asserts the raw "<script>" is preserved; complete_suitor_profile (in
backend/src/api/v1/endpoints/suitors.py) only calls .strip() on name. Fix by
either (A) implementing HTML/script escaping inside complete_suitor_profile
before calling repo.update_by_clerk_id (apply a sanitizer/escape to the
payload.name so stored value removes or encodes script tags), or (B) if you
intentionally allow passthrough, rename the test to
test_m4_005_register_suitor_xss_passthrough and update its docstring/comment to
document this known gap; update the test assertion or implementation accordingly
to keep names and behavior consistent (refer to complete_suitor_profile and
test_m4_005_register_suitor_xss_sanitized when making the change).

In `@frontend/src/pages/InterviewCompleteScreen.tsx`:
- Around line 132-150: The current render lumps verdictQuery.isError and
!verdict together; update InterviewCompleteScreen's render logic to branch so
that if verdictQuery.isError you render a distinct error state (inside the
Window titled "Verdict.exe" or an ErrorWindow) that displays
verdictQuery.error.message and offers a retry action (call verdictQuery.refetch
on a "Retry" button) and/or a link back to sessions, while keeping the existing
"Verdict is not available yet." UI only for the case where !verdict &&
!verdictQuery.isError; ensure you reference verdictQuery, verdictQuery.error,
verdictQuery.refetch and the existing Window/"Verdict.exe" rendering when making
the change.
- Around line 38-69: In InterviewCompleteScreen, add an explicit guard for a
missing sessionId after the hooks: if sessionId is falsy, either navigate away
(using navigate) or set a dedicated "missing ID" state and return a
short-circuit UI so the component doesn't rely on statusQuery/verdictQuery and
avoids the perpetual waiting state; reference sessionId, statusQuery,
verdictQuery and waiting when implementing the early return/redirect.
🧹 Nitpick comments (19)
backend/agent/main.py (1)

316-323: CancelledError handling looks good; minor cleanup for logging.

The explicit asyncio.CancelledError catch-and-reraise (lines 316-317) properly preserves cooperative shutdown semantics—this addresses the previous review concern.

However, the exc argument in logger.exception() is redundant since this method automatically includes the full exception traceback.

♻️ Proposed fix
         except asyncio.CancelledError:
             raise
         except Exception as exc:
             logger.exception(
-                "Failed to finalize/close agent session %s cleanly: %s",
-                session_id,
-                exc,
+                "Failed to finalize/close agent session %s cleanly",
+                session_id,
             )
backend/src/tests/conftest.py (1)

172-197: Align heart_config expectations shape with production schema.
The scoring prompt expects a dict-like expectations structure; a list here can drift from production and cause type errors if reused.

♻️ Suggested update
-        "expectations": ["Sense of humor", "Emotional depth", "Creativity"],
+        "expectations": {
+            "values": ["Sense of humor", "Emotional depth", "Creativity"]
+        },
backend/src/tests/test_api_contract.py (1)

32-36: Make the SQL injection test hit a real endpoint.
Current assertions are tautological and don’t exercise any code path; consider sending the payload to a public API and asserting proper validation/handling.

backend/src/services/scoring/scoring_service.py (1)

29-36: Move the Anthropic model identifier to configuration.

The model identifier "claude-sonnet-4-20250514" is valid and supported in anthropic SDK 0.74.1. However, it should be moved to config.py following the existing pattern used for other model identifiers (e.g., DEEPGRAM_TTS_MODEL, SMALLEST_LLM_MODEL). This allows environment-specific model updates without code changes and keeps the hardcoded value in ScoringService.__init__, database schema, and migrations in sync.

backend/src/tests/test_m4_suitor_entry.py (2)

34-36: Catch specific exception type instead of blind Exception.

Using pytest.raises(Exception) is too broad and may mask unrelated failures. Since SuitorRegisterRequest uses Pydantic validation, catch pydantic.ValidationError for precise test assertions.

Proposed fix
+from pydantic import ValidationError
+
 `@pytest.mark.asyncio`
 async def test_m4_002_register_suitor_name_required():
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         SuitorRegisterRequest(name="", age=20, gender="x", orientation="y")

50-53: Catch specific exception type instead of blind Exception.

Same issue as above - use ValidationError for Pydantic schema validation tests.

Proposed fix
 `@pytest.mark.asyncio`
 async def test_m4_004_register_suitor_long_name_rejected():
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         SuitorRegisterRequest(name="a" * 101, age=22, gender="f", orientation="x")
backend/src/tests/test_m1_foundation.py (1)

162-168: Remove unused monkeypatch parameter.

The monkeypatch fixture is declared but never used in this test.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m1_013_arq_worker_connects_to_redis(monkeypatch):
+async def test_m1_013_arq_worker_connects_to_redis():
     """Worker settings should expose redis DSN."""
     assert WorkerSettings.redis_settings is not None
backend/src/tests/test_m5_scoring.py (3)

63-65: Remove unused monkeypatch parameter.

This test only verifies score_session_task is callable and doesn't use monkeypatch.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m5_002_scoring_worker_picks_up_job(monkeypatch):
+async def test_m5_002_scoring_worker_picks_up_job():
     assert callable(score_session_task)

283-304: Catch specific HTTPException instead of blind Exception.

These tests verify API error responses; catching HTTPException with status code assertions provides better test precision.

Proposed fix
+from fastapi import HTTPException
+
 `@pytest.mark.asyncio`
 async def test_m5_024_get_verdict_api_not_ready(registered_suitor, completed_session):
     session_repo = AsyncMock()
     session_repo.read_by_id.return_value = completed_session
     score_repo = AsyncMock()
     score_repo.find_by_session_id.return_value = None
     completed_session.verdict_status = "scoring"
-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await get_session_verdict.__wrapped__(
             completed_session.id, registered_suitor, session_repo, score_repo
         )
+    assert exc_info.value.status_code == 202


 `@pytest.mark.asyncio`
 async def test_m5_025_get_verdict_api_session_not_found(registered_suitor):
     session_repo = AsyncMock()
     session_repo.read_by_id.return_value = None
     score_repo = AsyncMock()
-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await get_session_verdict.__wrapped__(
             uuid.uuid4(), registered_suitor, session_repo, score_repo
         )
+    assert exc_info.value.status_code == 404

340-348: Remove unused monkeypatch and improve retry test coverage.

The monkeypatch parameter is unused. Additionally, this test only verifies that the mock raises an error when called directly—it doesn't actually test retry behavior of the ScoringService.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m5_027_scoring_retries_on_claude_api_failure(monkeypatch):
+async def test_m5_027_scoring_retries_on_claude_api_failure():
     service = ScoringService.__new__(ScoringService)
     service.model = "claude"
     client = AsyncMock()
     client.messages.create.side_effect = RuntimeError("500")
     service.client = client
     with pytest.raises(RuntimeError):
         await service.client.messages.create()

Consider expanding this test to verify actual retry logic if the ScoringService implements retries.

backend/src/tests/test_m3_conversation_engine.py (3)

100-109: Remove unused sample_emotion_timeline fixture.

The fixture is declared but never used in this test.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m3_006_agent_initializes_with_heart_persona(sample_emotion_timeline):
+async def test_m3_006_agent_initializes_with_heart_persona():
     mgr = SessionManager("s1", [{"text": "Q1"}])
     tracker = AsyncMock()

372-389: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
 async def test_m3_035_get_session_status_api(
-    monkeypatch, registered_suitor, completed_session
+    registered_suitor, completed_session
 ):

404-435: Catch specific HTTPException instead of blind Exception.

These tests verify API error handling; catching HTTPException with status code assertions improves test precision.

Proposed fix
+from fastapi import HTTPException
+
 `@pytest.mark.asyncio`
 async def test_m3_037_create_session_invalid_heart(registered_suitor, mock_livekit):
     session_repo = AsyncMock()
     session_repo.find_active_by_suitor.return_value = None
     session_repo.count_today_by_suitor.return_value = 0
     heart_repo = AsyncMock()
     heart_repo.find_by_slug.return_value = None
-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await start_session.__wrapped__(
             SessionStartRequest(heart_slug="missing"),
             registered_suitor,
             heart_repo,
             session_repo,
             mock_livekit,
         )
+    assert exc_info.value.status_code == 404


 `@pytest.mark.asyncio`
 async def test_m3_038_create_session_invalid_suitor(seeded_heart, mock_livekit):
     heart_repo = AsyncMock()
     heart_repo.find_by_slug.return_value = seeded_heart
     session_repo = AsyncMock()
     session_repo.find_active_by_suitor.return_value = None
     session_repo.count_today_by_suitor.return_value = 0
-    with pytest.raises(Exception):
+    with pytest.raises((HTTPException, AttributeError)):
         await start_session.__wrapped__(
             SessionStartRequest(heart_slug=seeded_heart.shareable_slug),
             None,
             heart_repo,
             session_repo,
             mock_livekit,
         )  # type: ignore[arg-type]
backend/src/tests/test_m2_heart_config.py (5)

25-59: Catch specific exception types for validation tests.

Use pydantic.ValidationError for config validation failures instead of blind Exception.

Proposed fix
+from pydantic import ValidationError
+
 `@pytest.mark.asyncio`
 async def test_m2_002_heart_config_validation_fails_on_missing_fields(tmp_path: Path):
     bad = tmp_path / "heart.yaml"
     bad.write_text("shareable_slug: test\n", encoding="utf-8")
     loader = HeartConfigLoader(str(bad))
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         loader.load()


 `@pytest.mark.asyncio`
 async def test_m2_003_heart_config_validation_fails_on_invalid_types(tmp_path: Path):
     # ... YAML content ...
     loader = HeartConfigLoader(str(bad))
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         loader.load()

62-67: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m2_004_heart_config_seeds_db_on_startup(monkeypatch):
+async def test_m2_004_heart_config_seeds_db_on_startup():
     loader = HeartConfigLoader("config/heart_config.yaml")

173-187: Remove unused monkeypatch and catch specific HTTPException.

Proposed fix
+from fastapi import HTTPException
+
 `@pytest.mark.asyncio`
-async def test_m2_011_public_heart_profile_404_wrong_slug(monkeypatch):
+async def test_m2_011_public_heart_profile_404_wrong_slug():
     heart_repo = AsyncMock()
     heart_repo.find_by_slug.return_value = None
     question_repo = AsyncMock()

     class Req:
         class app:
             class state:
                 heart_config = None

-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await public_endpoints.get_public_profile.__wrapped__(
             "missing", Req(), heart_repo, question_repo
         )
+    assert exc_info.value.status_code == 404

154-170: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m2_010_public_heart_profile_returns_200(monkeypatch, seeded_heart):
+async def test_m2_010_public_heart_profile_returns_200(seeded_heart):

216-234: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m2_014_public_profile_no_sensitive_data(monkeypatch, seeded_heart):
+async def test_m2_014_public_profile_no_sensitive_data(seeded_heart):
backend/src/api/v1/endpoints/sessions.py (1)

330-341: Duplicated verdict_status fallback logic.

The same fallback pattern appears at line 207 and line 331. Consider extracting to a helper method for consistency.

Proposed refactor
def _resolve_verdict_status(session, score) -> str:
    """Resolve verdict status, handling legacy sessions without explicit status."""
    if session.verdict_status:
        return session.verdict_status
    return "ready" if score else "pending"

Comment on lines +14 to +30
def format_expectations(expectations: dict[str, Any] | None) -> str:
    if not expectations:
        return "No specific expectations defined."

    parts: list[str] = []
    if expectations.get("looking_for"):
        parts.append(f"Looking for: {expectations['looking_for']}")
    if expectations.get("values") and isinstance(expectations["values"], list):
        parts.append(f"Values: {', '.join(expectations['values'])}")
    if expectations.get("must_haves") and isinstance(expectations["must_haves"], list):
        parts.append(f"Must-haves: {', '.join(expectations['must_haves'])}")
    if expectations.get("communication_preferences"):
        parts.append(
            f"Communication preferences: {expectations['communication_preferences']}"
        )

    return "\n".join(parts) if parts else "No specific expectations defined."


⚠️ Potential issue | 🟡 Minor

Guard against non-dict expectations inputs.
If expectations is a list/string (e.g., legacy/test data), .get() will raise; add a type guard to keep prompt building resilient.

🔧 Suggested guard
 def format_expectations(expectations: dict[str, Any] | None) -> str:
     if not expectations:
         return "No specific expectations defined."
+    if isinstance(expectations, list):
+        return f"Values: {', '.join(str(item) for item in expectations)}"
+    if not isinstance(expectations, dict):
+        return "No specific expectations defined."
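For reference, a self-contained version of the function with the guard applied (trimmed to the first two fields; the list-coercion behavior for legacy inputs is an assumption shown in the asserts):

```python
from typing import Any

def format_expectations(expectations: Any) -> str:
    if not expectations:
        return "No specific expectations defined."
    # Guard: coerce legacy list payloads, fall back to the default for any other non-dict.
    if isinstance(expectations, list):
        return f"Values: {', '.join(str(item) for item in expectations)}"
    if not isinstance(expectations, dict):
        return "No specific expectations defined."
    parts: list[str] = []
    if expectations.get("looking_for"):
        parts.append(f"Looking for: {expectations['looking_for']}")
    if expectations.get("values") and isinstance(expectations["values"], list):
        parts.append(f"Values: {', '.join(expectations['values'])}")
    return "\n".join(parts) if parts else "No specific expectations defined."

assert format_expectations(["kindness", "honesty"]) == "Values: kindness, honesty"
assert format_expectations("free-form text") == "No specific expectations defined."
assert format_expectations({"looking_for": "a partner", "values": ["humor"]}) == "Looking for: a partner\nValues: humor"
```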
🤖 Prompt for AI Agents
In `@backend/src/services/scoring/prompt_builder.py` around lines 14-30: the
function format_expectations assumes expectations is a dict and calls .get,
which will raise for list/str legacy inputs; add a type guard at the top of
format_expectations to verify isinstance(expectations, dict) (or Mapping) and if
not, return "No specific expectations defined." (or coerce safe default) so the
subsequent accesses to expectations.get and indexing are safe; update references
to expectations, looking_for, values, must_haves, and communication_preferences
inside format_expectations accordingly.

Comment on lines +1 to +14
persona:
  traits: [warm]
  vibe: calm
  tone: nice
  humor_level: 5
  strictness: 5
expectations: {}
screening_questions:
  - text: Hello?
    required: true
shareable_slug: broken
calendar:
  calcom_api_key: x
  calcom_event_type_id: y

⚠️ Potential issue | 🟡 Minor

Keep non-target fields valid so the test isolates the missing-name error.
calcom_event_type_id being a string can trigger an unrelated validation error.

✅ Suggested fix
 calendar:
   calcom_api_key: x
-  calcom_event_type_id: y
+  calcom_event_type_id: 123
🤖 Prompt for AI Agents
In `@backend/src/tests/fixtures/heart_config_missing_name.yaml` around lines 1-14:
the test fixture fails an unrelated validation check because
calcom_event_type_id is a string; update the fixture so all non-target fields
are valid while keeping the missing-name condition under test—specifically
change calcom_event_type_id to a valid numeric ID (or appropriately-typed value)
and ensure other fields like calcom_api_key, persona, screening_questions, and
shareable_slug remain valid so the test isolates the missing "name" error.

Comment on lines +20 to +24
@pytest.mark.asyncio
async def test_api_003_not_found_returns_404(client):
    resp = await client.get("/api/v1/does-not-exist")
    assert resp.status_code in {404, 500}


⚠️ Potential issue | 🟡 Minor

Don't allow 500 for not-found routes.
Accepting 500s hides regressions; this should be a hard 404.

✅ Suggested fix
-    assert resp.status_code in {404, 500}
+    assert resp.status_code == 404
🤖 Prompt for AI Agents
In `@backend/src/tests/test_api_contract.py` around lines 20-24: the test
test_api_003_not_found_returns_404 currently allows a 500 which masks
regressions; update the assertion so the request to "/api/v1/does-not-exist"
must return a hard 404 (e.g., assert resp.status_code == 404) and remove the
{404, 500} set; keep the existing client.get call and test name unchanged so
it's clear this is strictly a not-found behavior check.

Comment on lines +104 to +129
@pytest.mark.asyncio
async def test_e2e_003_livekit_room_creation_fails(seeded_heart, registered_suitor):
    heart_repo = AsyncMock()
    heart_repo.find_by_slug.return_value = seeded_heart
    session_repo = AsyncMock()
    session_repo.find_active_by_suitor.return_value = None
    session_repo.count_today_by_suitor.return_value = 0
    session_repo.model = SessionDb
    created = SessionDb(
        id=uuid.uuid4(),
        heart_id=seeded_heart.id,
        suitor_id=registered_suitor.id,
        status=SessionStatus.PENDING,
    )
    session_repo.create.return_value = created
    session_repo.update_attr.return_value = created
    livekit = AsyncMock()
    livekit.create_room.side_effect = RuntimeError("livekit down")
    with pytest.raises(Exception):
        await start_session.__wrapped__(
            SessionStartRequest(heart_slug=seeded_heart.shareable_slug),
            registered_suitor,
            heart_repo,
            session_repo,
            livekit,
        )

⚠️ Potential issue | 🟡 Minor

Use a specific exception type in the failure test.
Catching Exception is too broad and can mask unrelated failures.

✅ Suggested fix
-    with pytest.raises(Exception):
+    with pytest.raises(RuntimeError):
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 122-122: Do not assert blind exception: Exception

(B017)

🤖 Prompt for AI Agents
In `@backend/src/tests/test_e2e_flows.py` around lines 104-129: the test
test_e2e_003_livekit_room_creation_fails is currently catching the broad
Exception; change the assertion to expect the specific exception raised by the
mocked livekit (RuntimeError) so the test only fails for the intended
livekit-room-creation failure—replace pytest.raises(Exception) with
pytest.raises(RuntimeError) (referencing the test function name and the
livekit.create_room.side_effect set to RuntimeError("livekit down") and the call
to start_session.__wrapped__).

Comment on lines +56 to +68
@pytest.mark.asyncio
async def test_m4_005_register_suitor_xss_sanitized(registered_suitor):
    repo = AsyncMock()
    repo.update_by_clerk_id.return_value = registered_suitor
    payload = SuitorRegisterRequest(
        name="<script>alert('x')</script>",
        age=24,
        gender="x",
        orientation="y",
    )
    await complete_suitor_profile.__wrapped__(payload, registered_suitor, repo)
    sent_name = repo.update_by_clerk_id.await_args.args[1]["name"]
    assert sent_name.startswith("<script>")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for sanitization functions in the backend codebase
rg -n "sanitize|escape|strip_tags|bleach" backend/src --type py -C 2

Repository: msmel01/waterloo-ai-agents-hackathon2026

Length of output: 3048


🏁 Script executed:

#!/bin/bash
# Find the complete_suitor_profile function implementation
rg -n "def complete_suitor_profile" backend/src --type py -A 20

Repository: msmel01/waterloo-ai-agents-hackathon2026

Length of output: 1623


Fix misleading test name and add XSS sanitization.

The test test_m4_005_register_suitor_xss_sanitized asserts that <script> tags pass through unsanitized (assert sent_name.startswith("<script>")), but the name suggests sanitization should occur. The implementation in backend/src/api/v1/endpoints/suitors.py only calls .strip() on the name with no HTML/script tag escaping.

Either sanitize user input in complete_suitor_profile before storing to the repository, or rename the test to test_m4_005_register_suitor_xss_passthrough and add a comment documenting that this is a known gap to address.

🤖 Prompt for AI Agents
In `@backend/src/tests/test_m4_suitor_entry.py` around lines 56-68: the test
name and assertion are inconsistent with intended XSS handling:
test_m4_005_register_suitor_xss_sanitized expects sanitization but asserts the
raw "<script>" is preserved; complete_suitor_profile (in
backend/src/api/v1/endpoints/suitors.py) only calls .strip() on name. Fix by
either (A) implementing HTML/script escaping inside complete_suitor_profile
before calling repo.update_by_clerk_id (apply a sanitizer/escape to the
payload.name so stored value removes or encodes script tags), or (B) if you
intentionally allow passthrough, rename the test to
test_m4_005_register_suitor_xss_passthrough and update its docstring/comment to
document this known gap; update the test assertion or implementation accordingly
to keep names and behavior consistent (refer to complete_suitor_profile and
test_m4_005_register_suitor_xss_sanitized when making the change).

Comment on lines +38 to +69
export function InterviewCompleteScreen() {
  const { sessionId = '' } = useParams<{ sessionId: string }>();
  const navigate = useNavigate();
  const [showQuestions, setShowQuestions] = useState(false);

  const statusQuery = useGetSessionStatusApiV1SessionsIdStatusGet(sessionId, {
    query: {
      enabled: !!sessionId,
      refetchInterval: (query) => {
        const verdictStatus = query.state.data?.verdict_status;
        if (!verdictStatus) {
          return 5000;
        }
        if (verdictStatus === 'ready' || verdictStatus === 'failed') {
          return false;
        }
        return 5000;
      },
    },
  });

  const verdictQuery = useGetSessionVerdictApiV1SessionsIdVerdictGet(sessionId, {
    query: {
      enabled: statusQuery.data?.verdict_status === 'ready',
      retry: false,
    },
  });

  const waiting =
    !statusQuery.data ||
    (!statusQuery.data.has_verdict && statusQuery.data.verdict_status !== 'failed');


⚠️ Potential issue | 🟡 Minor

Guard missing sessionId to avoid an infinite loading screen.

If the route is hit without a sessionId, the queries are disabled and waiting stays true, so users see “Analyzing…” indefinitely. Add a dedicated missing-ID state (or redirect) after the hooks.

Suggested fix
   const verdictQuery = useGetSessionVerdictApiV1SessionsIdVerdictGet(sessionId, {
     query: {
       enabled: statusQuery.data?.verdict_status === 'ready',
       retry: false,
     },
   });

+  if (!sessionId) {
+    return (
+      <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
+        <AppHeader />
+        <main className="flex-1 max-w-xl mx-auto w-full mt-8">
+          <Window title="Verdict.exe" icon="info">
+            <div className="text-center space-y-4">
+              <p className="text-sm text-gray-700">Session not found.</p>
+              <Link
+                to="/chats"
+                className="inline-block px-4 py-2 bg-win-titlebar text-white text-sm border border-palette-orchid shadow-bevel"
+              >
+                Back to sessions
+              </Link>
+            </div>
+          </Window>
+        </main>
+      </div>
+    );
+  }
🤖 Prompt for AI Agents
In `@frontend/src/pages/InterviewCompleteScreen.tsx` around lines 38-69: in
InterviewCompleteScreen, add an explicit guard for a missing sessionId after the
hooks: if sessionId is falsy, either navigate away (using navigate) or set a
dedicated "missing ID" state and return a short-circuit UI so the component
doesn't rely on statusQuery/verdictQuery and avoids the perpetual waiting state;
reference sessionId, statusQuery, verdictQuery and waiting when implementing the
early return/redirect.

Comment on lines +132 to +150
  if (verdictQuery.isError || !verdict) {
    return (
      <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
        <AppHeader />
        <main className="flex-1 max-w-xl mx-auto w-full mt-8">
          <Window title="Verdict.exe" icon="info">
            <div className="text-center space-y-4">
              <p className="text-sm text-gray-700">Verdict is not available yet.</p>
              <Link
                to="/chats"
                className="inline-block px-4 py-2 bg-win-titlebar text-white text-sm border border-palette-orchid shadow-bevel"
              >
                Back to sessions
              </Link>
            </div>
          </Window>
        </main>
      </div>
    );

⚠️ Potential issue | 🟡 Minor

Differentiate verdict fetch errors from “not available yet.”

verdictQuery.isError currently shows the same “not available yet” state, which hides real failures and makes recovery unclear. Consider a separate error UI (or a retry action) for actual fetch errors.

Suggested fix
-  if (verdictQuery.isError || !verdict) {
+  if (verdictQuery.isError) {
+    return (
+      <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
+        <AppHeader />
+        <main className="flex-1 max-w-xl mx-auto w-full mt-8">
+          <Window title="Verdict.exe" icon="info">
+            <div className="text-center space-y-4">
+              <p className="text-sm text-gray-700">We couldn’t load your verdict.</p>
+              <button
+                type="button"
+                onClick={() => verdictQuery.refetch()}
+                className="inline-block px-4 py-2 bg-win-titlebar text-white text-sm border border-palette-orchid shadow-bevel"
+              >
+                Retry
+              </button>
+            </div>
+          </Window>
+        </main>
+      </div>
+    );
+  }
+
+  if (!verdict) {
     return (
       <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
         <AppHeader />
         <main className="flex-1 max-w-xl mx-auto w-full mt-8">
           <Window title="Verdict.exe" icon="info">
             <div className="text-center space-y-4">
               <p className="text-sm text-gray-700">Verdict is not available yet.</p>
🤖 Prompt for AI Agents
In `@frontend/src/pages/InterviewCompleteScreen.tsx` around lines 132-150: the
current render lumps verdictQuery.isError and !verdict together; update
InterviewCompleteScreen's render logic to branch so that if verdictQuery.isError
you render a distinct error state (inside the Window titled "Verdict.exe" or an
ErrorWindow) that displays verdictQuery.error.message and offers a retry action
(call verdictQuery.refetch on a "Retry" button) and/or a link back to sessions,
while keeping the existing "Verdict is not available yet." UI only for the case
where !verdict && !verdictQuery.isError; ensure you reference verdictQuery,
verdictQuery.error, verdictQuery.refetch and the existing Window/"Verdict.exe"
rendering when making the change.

Labels

enhancement New feature or request
