
Feat: Add Scoring and Verdict #15

Merged
mahimairaja merged 4 commits into main from feat/sync-back-frontend
Feb 15, 2026

Conversation

Collaborator

@mahimairaja mahimairaja commented Feb 15, 2026

Summary by CodeRabbit

  • New Features

    • Full interview scoring and verdicts with raw/final scores, emotion modifiers, feedback, and per-question breakdowns; UI screen to view detailed verdicts and score bars; session history shows final score and "View details".
  • Bug Fixes

    • Improved session shutdown and disconnection handling; refined session-end messaging.
  • Chores

    • Periodic token refresh added for more reliable auth/session sync.
  • Tests

    • Extensive unit, integration, and end-to-end test suites added for scoring, sessions, and conversation flows.

@mahimairaja mahimairaja self-assigned this Feb 15, 2026
@mahimairaja mahimairaja added the enhancement New feature or request label Feb 15, 2026

vercel bot commented Feb 15, 2026

The latest updates on your projects.

Project: waterloo-ai-agents-hackathon2026
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 15, 2026 2:26am


coderabbitai bot commented Feb 15, 2026

📝 Walkthrough

Adds a Claude-based scoring pipeline: DB schema and models extended for rich scoring metadata, a scoring service and prompt builder, ARQ worker tasks to enqueue/run scoring, API and frontend surfaces to expose verdicts, and extensive tests and fixtures to validate scoring and flows.

Changes

Changes by cohort:

  • Database schema & models (backend/migrations/versions/bdd57d9128c7_add_scoring.py, backend/src/models/score_model.py, backend/src/models/session_model.py)
    Added score columns (raw_score, final_score, emotion_modifier, per_question_scores, feedback fields, Claude metadata) and session verdict columns (has_verdict, verdict_status).
  • Scoring service & prompt builder (backend/src/services/scoring/scoring_service.py, backend/src/services/scoring/prompt_builder.py)
    New Claude-based ScoringService and prompt-construction helpers: JSON parsing, clamping, weighted scoring, verdict determination, and metadata extraction. High logic density; review parsing and clamping, API key handling, and JSON sanitization.
  • Worker & enqueue flow (backend/workers/main.py, backend/src/workers/tasks.py)
    Added score_session_task, worker wiring, ARQ enqueue via a Redis pool, idempotent persistence, and session status updates; ensure task retry/exception handling and resource cleanup.
  • API endpoints & repositories (backend/src/api/v1/endpoints/sessions.py, backend/src/api/v1/endpoints/suitors.py, backend/src/repository/score_repository.py, backend/src/schemas/session_schema.py)
    Status/verdict endpoint logic updated to use session verdict fields and optionally include expanded score fields; the score repository gains create and exists_for_session; response schemas extended. Attention: changed endpoint signature for get_session_status.
  • Agent & session lifecycle (backend/agent/interview_agent.py, backend/agent/main.py)
    Conditional session end in record_suitor_response (only call session_mgr.end if no end_reason); enhanced disconnect cleanup with on_close and aclose/close handling and error logging.
  • Configuration & deps (backend/src/core/config.py, backend/pyproject.toml)
    Added VERDICT_THRESHOLD config (default 65.0) and the anthropic>=0.74.1 dependency.
  • Frontend: pages, routing & models (frontend/src/pages/InterviewCompleteScreen.tsx, frontend/src/pages/ChatScreen.tsx, frontend/src/pages/DatesGrid.tsx, frontend/src/App.tsx, frontend/src/api/model/*, frontend/src/api/model/sessionVerdictResponsePerQuestionScores.ts)
    New InterviewCompleteScreen and route, ChatScreen now navigates to /interview/:sessionId/complete, DatesGrid shows final_score and detail links; TypeScript models extended with new scoring fields and a per-question type.
  • Frontend: auth & sync (frontend/src/hooks/useAuthSync.ts)
    Refactored token sync to async/await and added a 30s periodic refresh with proper cleanup.
  • OpenAPI / API contract (frontend/openapi.json, shared/openapi.json)
    OpenAPI schemas updated to include final_score, raw_score, emotion_modifier_reasons, feedback_strengths/improvements, and per_question_scores across SessionSummary and SessionVerdictResponse.
  • Tests & fixtures (backend/src/tests/..., backend/src/tests/fixtures/*, backend/src/tests/fixtures/mock_responses/*)
    Large additions: unit, e2e, and integration tests plus extensive fixtures and mock responses for Claude, Hume, and Calcom; review test coverage and mocking realism carefully given the volume.

Sequence Diagram

sequenceDiagram
    participant Frontend
    participant API
    participant Queue as ARQ_Queue
    participant Worker
    participant Claude
    participant DB as Database

    Frontend->>API: End interview session (request)
    API->>DB: Persist session end, mark for scoring
    API->>Queue: enqueue_scoring_job(session_id, defer=5s)
    API-->>Frontend: Acknowledge

    Queue->>Worker: Dequeue score_session_task(session_id)
    Worker->>DB: Load session, heart_config, transcript, turns
    Worker->>Worker: build_scoring_prompt(...)
    Worker->>Claude: send prompt (Anthropic/Claude)
    Claude-->>Worker: return JSON/text response
    Worker->>Worker: parse JSON, clamp scores, apply emotion_modifier, compute final_score
    Worker->>DB: create Score record, update session.has_verdict/verdict_status
    Frontend->>API: Poll status / fetch verdict
    API->>DB: read has_verdict / score
    API-->>Frontend: SessionVerdictResponse (scores, feedback, per_question_scores)
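The "parse JSON, clamp scores, apply emotion_modifier, compute final_score" step in the diagram can be sketched as pure functions. This is a minimal sketch: the 0-100 score range and the additive combination of raw score and emotion modifier are assumptions; only the 65.0 threshold comes from this PR's VERDICT_THRESHOLD config.

```python
def clamp(value: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Clamp a model-produced score into an assumed 0-100 range."""
    return max(lo, min(hi, value))


def compute_final_score(raw_score: float, emotion_modifier: float) -> float:
    """Apply the emotion modifier (additive combination is an assumption), then clamp."""
    return clamp(clamp(raw_score) + emotion_modifier)


def verdict_for(final_score: float, threshold: float = 65.0) -> str:
    """VERDICT_THRESHOLD defaults to 65.0 per this PR's config change."""
    return "match" if final_score >= threshold else "no_match"
```

Clamping before and after applying the modifier keeps a hallucinated out-of-range raw score from leaking into the stored final score.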

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PR #7: Alters the same agent/session shutdown code (backend/agent/interview_agent.py, backend/agent/main.py) — direct overlap in disconnect/end-session behavior.
  • PR #14: Modifies frontend auth/token sync and routing (useAuthSync.ts, App.tsx, ChatScreen.tsx) — overlaps in auth sync and navigation changes.
  • PR #6: Introduces or modifies frontend routing and app scaffolding — related to the added InterviewCompleteScreen route and frontend wiring.

Poem

🐰 With twitching nose and tiny hop I cheer,

Claude reads the heart, the answers near.
Scores and warmth in gentle rhyme,
Turn by turn, we count the time.
Hop, hum, and nibble—results appear!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 17.39%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Merge Conflict Detection ⚠️ Warning: merge conflicts detected (23 files):

⚔️ backend/agent/interview_agent.py (content)
⚔️ backend/agent/main.py (content)
⚔️ backend/pyproject.toml (content)
⚔️ backend/src/api/v1/endpoints/sessions.py (content)
⚔️ backend/src/api/v1/endpoints/suitors.py (content)
⚔️ backend/src/core/config.py (content)
⚔️ backend/src/models/score_model.py (content)
⚔️ backend/src/models/session_model.py (content)
⚔️ backend/src/repository/score_repository.py (content)
⚔️ backend/src/schemas/session_schema.py (content)
⚔️ backend/src/tests/conftest.py (content)
⚔️ backend/src/workers/tasks.py (content)
⚔️ backend/uv.lock (content)
⚔️ backend/workers/main.py (content)
⚔️ frontend/openapi.json (content)
⚔️ frontend/src/App.tsx (content)
⚔️ frontend/src/api/model/index.ts (content)
⚔️ frontend/src/api/model/sessionSummary.ts (content)
⚔️ frontend/src/api/model/sessionVerdictResponse.ts (content)
⚔️ frontend/src/hooks/useAuthSync.ts (content)
⚔️ frontend/src/pages/ChatScreen.tsx (content)
⚔️ frontend/src/pages/DatesGrid.tsx (content)
⚔️ shared/openapi.json (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: the title 'Feat: Add Scoring and Verdict' accurately summarizes the main change, which introduces comprehensive scoring and verdict functionality including the Claude-based scoring service, prompt builders, database migrations, API endpoints, and frontend UI.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/sync-back-frontend
⚔️ Resolve merge conflicts (beta)
  • Auto-commit resolved conflicts to branch feat/sync-back-frontend
  • Create stacked PR with resolved conflicts
  • Post resolved changes as copyable diffs in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@backend/agent/interview_agent.py`:
- Around line 62-66: The current logic returns "All questions answered..." even
when the session already has an end_reason, which can be misleading; in the
method containing this snippet (look for the block using self.session_mgr.end
and the remaining variable in interview_agent.py), change the branch so that
when remaining == 0 you only call self.session_mgr.end("all_questions_complete")
if end_reason is None, and then return a message that reflects the actual
end_reason when it is already set (e.g., include or reference
self.session_mgr.end_reason in the returned string or return a neutral message
like "Session already ended: {self.session_mgr.end_reason}"). Ensure you do not
overwrite an existing end_reason.
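The guarded branch this prompt describes might look like the following. This is a sketch: end_reason and end come from the review text, while the class shape and message strings are assumptions.

```python
class SessionManager:
    """Minimal stand-in for the real session manager; only end_reason/end matter here."""

    def __init__(self) -> None:
        self.end_reason: str | None = None

    def end(self, reason: str) -> None:
        # Never overwrite an end reason that is already set.
        if self.end_reason is None:
            self.end_reason = reason


def remaining_message(mgr: SessionManager, remaining: int) -> str:
    if remaining > 0:
        return f"{remaining} question(s) remaining."
    if mgr.end_reason is None:
        mgr.end("all_questions_complete")
        return "All questions answered; interview complete."
    # Reflect the actual end reason instead of a misleading completion message.
    return f"Session already ended: {mgr.end_reason}"
```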

In `@backend/agent/main.py`:
- Around line 304-318: The cleanup block that catches Exception around calling
on_close and closing the session (references: session_mgr.end_reason, on_close,
session.aclose/close, session_id, logger.warning) can swallow
asyncio.CancelledError; change the except to explicitly re-raise cancellations
by catching asyncio.CancelledError (or BaseException subclass) and re-raising
it, while catching other exceptions to log the failure with traceback via
logger.exception or logger.warning including exc info; ensure any coroutine
result from session.close() is awaited as before and that cancellations
propagate instead of being suppressed.

In `@backend/src/api/v1/endpoints/sessions.py`:
- Around line 203-206: The fallback for verdict_status uses
session.verdict_status and session.has_verdict but doesn't account for legacy
rows that have a score but has_verdict=False, causing endpoints to return
"pending" (202); update the verdict_status computation to check for a score
existence before defaulting (e.g., set verdict_status = session.verdict_status
or ("ready" if a score exists on session else "pending")), using the existing
session object and keeping session.has_verdict and session.verdict_status logic
intact; apply the same fix to the other verdict-status computation block that
mirrors this logic.

In `@backend/src/services/scoring/prompt_builder.py`:
- Around line 49-53: The direct int() cast on turn.get("question_index", 0) can
raise ValueError/TypeError for corrupted JSONB (None or non-numeric string);
change the logic in the loop that produces q_idx (the code iterating over
turn_summaries and reading "question_index") to defensively coerce to string or
catch conversion errors—e.g., obtain the raw value from
turn.get("question_index", 0), attempt to convert to int inside a try/except
handling ValueError/TypeError and fall back to 0 (so q_idx = safe_int + 1); keep
the rest of the field parsing (q_text, quality, summary) unchanged to match the
existing defensive str() pattern.
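The defensive coercion could live in a small helper (a sketch; the turn dict shape is taken from the review text above, the helper name is hypothetical):

```python
from typing import Any


def safe_question_index(turn: dict[str, Any]) -> int:
    """Coerce question_index to int; corrupted JSONB may hold None or junk strings."""
    raw = turn.get("question_index", 0)
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Fall back to 0 rather than letting one bad turn abort the whole prompt build.
        return 0
```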

In `@backend/src/services/scoring/scoring_service.py`:
- Around line 145-149: The JSON parsing except block around json.loads(text)
should stop logging the raw Claude response (text) to avoid PII leakage;
instead, in the except json.JSONDecodeError as exc handler (the block that
currently calls logger.error("Claude returned invalid JSON: %s", text[:600])),
replace that with logger.exception(...) to capture the stack trace and log only
metadata about the response (e.g., length = len(text) and a short redacted
preview if desired), then re-raise the ValueError from exc as before; update the
handler that surrounds json.loads(text) to use logger.exception and metadata
rather than printing the full text.
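A PII-safe version of that handler might look like this (a minimal sketch; only length metadata is logged, never the raw transcript-derived text):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_scoring_response(text: str) -> dict:
    """Parse the model's scoring JSON; on failure, log metadata only (no raw response)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # logger.exception captures the stack trace; len(text) is safe to log.
        logger.exception("Scoring model returned invalid JSON (length=%d)", len(text))
        raise ValueError("invalid JSON from scoring model") from exc
```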

In `@backend/workers/main.py`:
- Around line 61-142: The race causes a duplicate score create to be treated as
a hard failure; modify the scoring flow so that when score_repo.create(...)
fails due to a duplicate/unique constraint, you treat it as idempotent success
instead of falling through to the generic exception handler: specifically, catch
the duplicate/IntegrityError (or create a ScoreExistsError in score_repo) around
score_repo.create, and in that handler re-check
score_repo.exists_for_session(session_uuid) and if true set
session_repo.update_attr(..., "has_verdict", True),
session_repo.update_attr(..., "verdict_status", "ready") and
session_repo.update_status(..., SessionStatus.SCORED) (do not set FAILED); keep
the existing broad except Exception path for other errors which should mark
FAILED.
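The idempotent persistence path could be shaped as below. This is a sketch: DuplicateScoreError is a local stand-in for the DB unique-violation (SQLAlchemy's IntegrityError in the real code), and the repository method names mirror the review text.

```python
class DuplicateScoreError(Exception):
    """Stand-in for the DB unique-violation (e.g. SQLAlchemy's IntegrityError)."""


async def persist_score_idempotently(score_repo, session_repo, session_uuid, payload) -> str:
    """Treat a duplicate score row as success: another worker already scored this session."""
    try:
        await score_repo.create(session_uuid, payload)
    except DuplicateScoreError:
        if not await score_repo.exists_for_session(session_uuid):
            # Genuine failure: re-raise so the generic handler marks the session FAILED.
            raise
    # Either we created the score or a concurrent worker did; mark the verdict ready.
    await session_repo.update_attr(session_uuid, "has_verdict", True)
    await session_repo.update_attr(session_uuid, "verdict_status", "ready")
    return "scored"
```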

In `@frontend/src/pages/InterviewCompleteScreen.tsx`:
- Around line 79-151: The loading gate should include verdictQuery.isLoading to
avoid flicker when verdict fetching starts; update the initial conditional that
currently checks statusQuery.isLoading || waiting to also check
verdictQuery.isLoading (i.e., change the guard in the early return that renders
"Analyzing your interview..." to statusQuery.isLoading || waiting ||
verdictQuery.isLoading) so the loading screen remains until verdictQuery
finishes.
🧹 Nitpick comments (1)
backend/src/api/v1/endpoints/suitors.py (1)

74-95: Consider batch-fetching scores to avoid N+1 queries.

The current implementation fetches each score individually within the loop (line 75), resulting in up to 20 additional database queries. While acceptable for current scale, consider batch-fetching scores for all session IDs in a single query if this endpoint becomes a performance bottleneck.

# Example optimization approach:
session_ids = [s.id for s in sessions]
scores_by_session = await score_repo.find_by_session_ids(session_ids)  # batch fetch

The final_score field population on line 93 follows the established pattern correctly.
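In-memory, the batch result would be indexed once so the per-session loop does O(1) dict lookups instead of one query each. A sketch, assuming find_by_session_ids returns row dicts with session_id and final_score keys (both hypothetical here):

```python
from typing import Any


def index_final_scores(score_rows: list[dict[str, Any]]) -> dict[Any, float]:
    """Index batch-fetched score rows by session_id; in SQL terms, this replaces N
    per-session SELECTs with one WHERE session_id IN (...) query plus this dict."""
    return {row["session_id"]: row["final_score"] for row in score_rows}
```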

Comment thread backend/agent/interview_agent.py
Comment thread backend/agent/main.py Outdated
Comment thread backend/src/api/v1/endpoints/sessions.py
Comment thread backend/src/services/scoring/prompt_builder.py
Comment thread backend/src/services/scoring/scoring_service.py
Comment thread backend/workers/main.py
Comment thread frontend/src/pages/InterviewCompleteScreen.tsx Outdated
@mahimairaja mahimairaja merged commit b5803a8 into main Feb 15, 2026
5 of 6 checks passed

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@backend/src/services/scoring/prompt_builder.py`:
- Around line 14-30: The function format_expectations assumes expectations is a
dict and calls .get, which will raise for list/str legacy inputs; add a type
guard at the top of format_expectations to verify isinstance(expectations, dict)
(or Mapping) and if not, return "No specific expectations defined." (or coerce
safe default) so the subsequent accesses to expectations.get and indexing are
safe; update references to expectations, looking_for, values, must_haves, and
communication_preferences inside format_expectations accordingly.

In `@backend/src/tests/fixtures/heart_config_missing_name.yaml`:
- Around line 1-14: The test fixture has an unrelated validation failure because
calcom_event_type_id is a string; update the fixture so all non-target fields
are valid while keeping the missing-name condition under test—specifically
change calcom_event_type_id to a valid numeric ID (or appropriately-typed value)
and ensure other fields like calcom_api_key, persona, screening_questions, and
shareable_slug remain valid so the test isolates the missing "name" error.

In `@backend/src/tests/test_api_contract.py`:
- Around line 20-24: The test test_api_003_not_found_returns_404 currently
allows a 500 which masks regressions; update the assertion so the request to
"/api/v1/does-not-exist" must return a hard 404 (e.g., assert resp.status_code
== 404) and remove the {404, 500} set; keep the existing client.get call and
test name unchanged so it's clear this is strictly a not-found behavior check.

In `@backend/src/tests/test_e2e_flows.py`:
- Around line 104-129: The test test_e2e_003_livekit_room_creation_fails is
currently catching the broad Exception; change the assertion to expect the
specific exception raised by the mocked livekit (RuntimeError) so the test only
fails for the intended livekit-room-creation failure—replace
pytest.raises(Exception) with pytest.raises(RuntimeError) (referencing the test
function name and the livekit.create_room.side_effect set to
RuntimeError("livekit down") and the call to start_session.__wrapped__).

In `@backend/src/tests/test_m4_suitor_entry.py`:
- Around line 56-68: The test name and assertion are inconsistent with intended
XSS handling: test_m4_005_register_suitor_xss_sanitized expects sanitization but
asserts the raw "<script>" is preserved; complete_suitor_profile (in
backend/src/api/v1/endpoints/suitors.py) only calls .strip() on name. Fix by
either (A) implementing HTML/script escaping inside complete_suitor_profile
before calling repo.update_by_clerk_id (apply a sanitizer/escape to the
payload.name so stored value removes or encodes script tags), or (B) if you
intentionally allow passthrough, rename the test to
test_m4_005_register_suitor_xss_passthrough and update its docstring/comment to
document this known gap; update the test assertion or implementation accordingly
to keep names and behavior consistent (refer to complete_suitor_profile and
test_m4_005_register_suitor_xss_sanitized when making the change).

In `@frontend/src/pages/InterviewCompleteScreen.tsx`:
- Around line 132-150: The current render lumps verdictQuery.isError and
!verdict together; update InterviewCompleteScreen's render logic to branch so
that if verdictQuery.isError you render a distinct error state (inside the
Window titled "Verdict.exe" or an ErrorWindow) that displays
verdictQuery.error.message and offers a retry action (call verdictQuery.refetch
on a "Retry" button) and/or a link back to sessions, while keeping the existing
"Verdict is not available yet." UI only for the case where !verdict &&
!verdictQuery.isError; ensure you reference verdictQuery, verdictQuery.error,
verdictQuery.refetch and the existing Window/"Verdict.exe" rendering when making
the change.
- Around line 38-69: In InterviewCompleteScreen, add an explicit guard for a
missing sessionId after the hooks: if sessionId is falsy, either navigate away
(using navigate) or set a dedicated "missing ID" state and return a
short-circuit UI so the component doesn't rely on statusQuery/verdictQuery and
avoids the perpetual waiting state; reference sessionId, statusQuery,
verdictQuery and waiting when implementing the early return/redirect.
🧹 Nitpick comments (19)
backend/agent/main.py (1)

316-323: CancelledError handling looks good; minor cleanup for logging.

The explicit asyncio.CancelledError catch-and-reraise (lines 316-317) properly preserves cooperative shutdown semantics—this addresses the previous review concern.

However, the exc argument in logger.exception() is redundant since this method automatically includes the full exception traceback.

♻️ Proposed fix
         except asyncio.CancelledError:
             raise
         except Exception as exc:
             logger.exception(
-                "Failed to finalize/close agent session %s cleanly: %s",
-                session_id,
-                exc,
+                "Failed to finalize/close agent session %s cleanly",
+                session_id,
             )
backend/src/tests/conftest.py (1)

172-197: Align heart_config expectations shape with production schema.
The scoring prompt expects a dict-like expectations structure; a list here can drift from production and cause type errors if reused.

♻️ Suggested update
-        "expectations": ["Sense of humor", "Emotional depth", "Creativity"],
+        "expectations": {
+            "values": ["Sense of humor", "Emotional depth", "Creativity"]
+        },
backend/src/tests/test_api_contract.py (1)

32-36: Make the SQL injection test hit a real endpoint.
Current assertions are tautological and don’t exercise any code path; consider sending the payload to a public API and asserting proper validation/handling.

backend/src/services/scoring/scoring_service.py (1)

29-36: Move the Anthropic model identifier to configuration.

The model identifier "claude-sonnet-4-20250514" is valid and supported in anthropic SDK 0.74.1. However, it should be moved to config.py following the existing pattern used for other model identifiers (e.g., DEEPGRAM_TTS_MODEL, SMALLEST_LLM_MODEL). This allows environment-specific model updates without code changes and keeps the hardcoded value in ScoringService.__init__, database schema, and migrations in sync.

backend/src/tests/test_m4_suitor_entry.py (2)

34-36: Catch specific exception type instead of blind Exception.

Using pytest.raises(Exception) is too broad and may mask unrelated failures. Since SuitorRegisterRequest uses Pydantic validation, catch pydantic.ValidationError for precise test assertions.

Proposed fix
+from pydantic import ValidationError
+
 `@pytest.mark.asyncio`
 async def test_m4_002_register_suitor_name_required():
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         SuitorRegisterRequest(name="", age=20, gender="x", orientation="y")

50-53: Catch specific exception type instead of blind Exception.

Same issue as above - use ValidationError for Pydantic schema validation tests.

Proposed fix
 `@pytest.mark.asyncio`
 async def test_m4_004_register_suitor_long_name_rejected():
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         SuitorRegisterRequest(name="a" * 101, age=22, gender="f", orientation="x")
backend/src/tests/test_m1_foundation.py (1)

162-168: Remove unused monkeypatch parameter.

The monkeypatch fixture is declared but never used in this test.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m1_013_arq_worker_connects_to_redis(monkeypatch):
+async def test_m1_013_arq_worker_connects_to_redis():
     """Worker settings should expose redis DSN."""
     assert WorkerSettings.redis_settings is not None
backend/src/tests/test_m5_scoring.py (3)

63-65: Remove unused monkeypatch parameter.

This test only verifies score_session_task is callable and doesn't use monkeypatch.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m5_002_scoring_worker_picks_up_job(monkeypatch):
+async def test_m5_002_scoring_worker_picks_up_job():
     assert callable(score_session_task)

283-304: Catch specific HTTPException instead of blind Exception.

These tests verify API error responses; catching HTTPException with status code assertions provides better test precision.

Proposed fix
+from fastapi import HTTPException
+
 `@pytest.mark.asyncio`
 async def test_m5_024_get_verdict_api_not_ready(registered_suitor, completed_session):
     session_repo = AsyncMock()
     session_repo.read_by_id.return_value = completed_session
     score_repo = AsyncMock()
     score_repo.find_by_session_id.return_value = None
     completed_session.verdict_status = "scoring"
-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await get_session_verdict.__wrapped__(
             completed_session.id, registered_suitor, session_repo, score_repo
         )
+    assert exc_info.value.status_code == 202


 `@pytest.mark.asyncio`
 async def test_m5_025_get_verdict_api_session_not_found(registered_suitor):
     session_repo = AsyncMock()
     session_repo.read_by_id.return_value = None
     score_repo = AsyncMock()
-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await get_session_verdict.__wrapped__(
             uuid.uuid4(), registered_suitor, session_repo, score_repo
         )
+    assert exc_info.value.status_code == 404

340-348: Remove unused monkeypatch and improve retry test coverage.

The monkeypatch parameter is unused. Additionally, this test only verifies that the mock raises an error when called directly—it doesn't actually test retry behavior of the ScoringService.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m5_027_scoring_retries_on_claude_api_failure(monkeypatch):
+async def test_m5_027_scoring_retries_on_claude_api_failure():
     service = ScoringService.__new__(ScoringService)
     service.model = "claude"
     client = AsyncMock()
     client.messages.create.side_effect = RuntimeError("500")
     service.client = client
     with pytest.raises(RuntimeError):
         await service.client.messages.create()

Consider expanding this test to verify actual retry logic if the ScoringService implements retries.

backend/src/tests/test_m3_conversation_engine.py (3)

100-109: Remove unused sample_emotion_timeline fixture.

The fixture is declared but never used in this test.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m3_006_agent_initializes_with_heart_persona(sample_emotion_timeline):
+async def test_m3_006_agent_initializes_with_heart_persona():
     mgr = SessionManager("s1", [{"text": "Q1"}])
     tracker = AsyncMock()

372-389: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
 async def test_m3_035_get_session_status_api(
-    monkeypatch, registered_suitor, completed_session
+    registered_suitor, completed_session
 ):

404-435: Catch specific HTTPException instead of blind Exception.

These tests verify API error handling; catching HTTPException with status code assertions improves test precision.

Proposed fix
+from fastapi import HTTPException
+
 `@pytest.mark.asyncio`
 async def test_m3_037_create_session_invalid_heart(registered_suitor, mock_livekit):
     session_repo = AsyncMock()
     session_repo.find_active_by_suitor.return_value = None
     session_repo.count_today_by_suitor.return_value = 0
     heart_repo = AsyncMock()
     heart_repo.find_by_slug.return_value = None
-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await start_session.__wrapped__(
             SessionStartRequest(heart_slug="missing"),
             registered_suitor,
             heart_repo,
             session_repo,
             mock_livekit,
         )
+    assert exc_info.value.status_code == 404


 `@pytest.mark.asyncio`
 async def test_m3_038_create_session_invalid_suitor(seeded_heart, mock_livekit):
     heart_repo = AsyncMock()
     heart_repo.find_by_slug.return_value = seeded_heart
     session_repo = AsyncMock()
     session_repo.find_active_by_suitor.return_value = None
     session_repo.count_today_by_suitor.return_value = 0
-    with pytest.raises(Exception):
+    with pytest.raises((HTTPException, AttributeError)):
         await start_session.__wrapped__(
             SessionStartRequest(heart_slug=seeded_heart.shareable_slug),
             None,
             heart_repo,
             session_repo,
             mock_livekit,
         )  # type: ignore[arg-type]
backend/src/tests/test_m2_heart_config.py (5)

25-59: Catch specific exception types for validation tests.

Use pydantic.ValidationError for config validation failures instead of blind Exception.

Proposed fix
+from pydantic import ValidationError
+
 `@pytest.mark.asyncio`
 async def test_m2_002_heart_config_validation_fails_on_missing_fields(tmp_path: Path):
     bad = tmp_path / "heart.yaml"
     bad.write_text("shareable_slug: test\n", encoding="utf-8")
     loader = HeartConfigLoader(str(bad))
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         loader.load()


 `@pytest.mark.asyncio`
 async def test_m2_003_heart_config_validation_fails_on_invalid_types(tmp_path: Path):
     # ... YAML content ...
     loader = HeartConfigLoader(str(bad))
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         loader.load()

62-67: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m2_004_heart_config_seeds_db_on_startup(monkeypatch):
+async def test_m2_004_heart_config_seeds_db_on_startup():
     loader = HeartConfigLoader("config/heart_config.yaml")

173-187: Remove unused monkeypatch and catch specific HTTPException.

Proposed fix
+from fastapi import HTTPException
+
 `@pytest.mark.asyncio`
-async def test_m2_011_public_heart_profile_404_wrong_slug(monkeypatch):
+async def test_m2_011_public_heart_profile_404_wrong_slug():
     heart_repo = AsyncMock()
     heart_repo.find_by_slug.return_value = None
     question_repo = AsyncMock()

     class Req:
         class app:
             class state:
                 heart_config = None

-    with pytest.raises(Exception):
+    with pytest.raises(HTTPException) as exc_info:
         await public_endpoints.get_public_profile.__wrapped__(
             "missing", Req(), heart_repo, question_repo
         )
+    assert exc_info.value.status_code == 404

154-170: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m2_010_public_heart_profile_returns_200(monkeypatch, seeded_heart):
+async def test_m2_010_public_heart_profile_returns_200(seeded_heart):

216-234: Remove unused monkeypatch parameter.

Proposed fix
 `@pytest.mark.asyncio`
-async def test_m2_014_public_profile_no_sensitive_data(monkeypatch, seeded_heart):
+async def test_m2_014_public_profile_no_sensitive_data(seeded_heart):
backend/src/api/v1/endpoints/sessions.py (1)

330-341: Duplicated verdict_status fallback logic.

The same fallback pattern appears at line 207 and line 331. Consider extracting to a helper method for consistency.

Proposed refactor
def _resolve_verdict_status(session, score) -> str:
    """Resolve verdict status, handling legacy sessions without explicit status."""
    if session.verdict_status:
        return session.verdict_status
    return "ready" if score else "pending"

Comment on lines +14 to +30
def format_expectations(expectations: dict[str, Any] | None) -> str:
    if not expectations:
        return "No specific expectations defined."

    parts: list[str] = []
    if expectations.get("looking_for"):
        parts.append(f"Looking for: {expectations['looking_for']}")
    if expectations.get("values") and isinstance(expectations["values"], list):
        parts.append(f"Values: {', '.join(expectations['values'])}")
    if expectations.get("must_haves") and isinstance(expectations["must_haves"], list):
        parts.append(f"Must-haves: {', '.join(expectations['must_haves'])}")
    if expectations.get("communication_preferences"):
        parts.append(
            f"Communication preferences: {expectations['communication_preferences']}"
        )

    return "\n".join(parts) if parts else "No specific expectations defined."


⚠️ Potential issue | 🟡 Minor

Guard against non-dict expectations inputs.
If expectations is a list/string (e.g., legacy/test data), .get() will raise; add a type guard to keep prompt building resilient.

🔧 Suggested guard
 def format_expectations(expectations: dict[str, Any] | None) -> str:
     if not expectations:
         return "No specific expectations defined."
+    if isinstance(expectations, list):
+        return f"Values: {', '.join(str(item) for item in expectations)}"
+    if not isinstance(expectations, dict):
+        return "No specific expectations defined."
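For reference, a self-contained version of the function with the guard applied (trimmed to the first two fields; the list-coercion behavior for legacy inputs is an assumption shown in the asserts):

```python
from typing import Any

def format_expectations(expectations: Any) -> str:
    if not expectations:
        return "No specific expectations defined."
    # Guard: coerce legacy list payloads, fall back to the default for any other non-dict.
    if isinstance(expectations, list):
        return f"Values: {', '.join(str(item) for item in expectations)}"
    if not isinstance(expectations, dict):
        return "No specific expectations defined."
    parts: list[str] = []
    if expectations.get("looking_for"):
        parts.append(f"Looking for: {expectations['looking_for']}")
    if expectations.get("values") and isinstance(expectations["values"], list):
        parts.append(f"Values: {', '.join(expectations['values'])}")
    return "\n".join(parts) if parts else "No specific expectations defined."

assert format_expectations(["kindness", "honesty"]) == "Values: kindness, honesty"
assert format_expectations("free-form text") == "No specific expectations defined."
assert format_expectations({"looking_for": "a partner", "values": ["humor"]}) == "Looking for: a partner\nValues: humor"
```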
🤖 Prompt for AI Agents
In `@backend/src/services/scoring/prompt_builder.py` around lines 14-30: the
function format_expectations assumes expectations is a dict and calls .get,
which will raise for list/str legacy inputs; add a type guard at the top of
format_expectations to verify isinstance(expectations, dict) (or Mapping) and if
not, return "No specific expectations defined." (or coerce safe default) so the
subsequent accesses to expectations.get and indexing are safe; update references
to expectations, looking_for, values, must_haves, and communication_preferences
inside format_expectations accordingly.

Comment on lines +1 to +14
persona:
  traits: [warm]
  vibe: calm
  tone: nice
  humor_level: 5
  strictness: 5
expectations: {}
screening_questions:
  - text: Hello?
    required: true
shareable_slug: broken
calendar:
  calcom_api_key: x
  calcom_event_type_id: y

⚠️ Potential issue | 🟡 Minor

Keep non-target fields valid so the test isolates the missing-name error.
calcom_event_type_id being a string can trigger an unrelated validation error.

✅ Suggested fix
 calendar:
   calcom_api_key: x
-  calcom_event_type_id: y
+  calcom_event_type_id: 123
🤖 Prompt for AI Agents
In `@backend/src/tests/fixtures/heart_config_missing_name.yaml` around lines 1-14:
the test fixture fails an unrelated validation check because
calcom_event_type_id is a string; update the fixture so all non-target fields
are valid while keeping the missing-name condition under test—specifically
change calcom_event_type_id to a valid numeric ID (or appropriately-typed value)
and ensure other fields like calcom_api_key, persona, screening_questions, and
shareable_slug remain valid so the test isolates the missing "name" error.

Comment on lines +20 to +24
@pytest.mark.asyncio
async def test_api_003_not_found_returns_404(client):
    resp = await client.get("/api/v1/does-not-exist")
    assert resp.status_code in {404, 500}


⚠️ Potential issue | 🟡 Minor

Don't allow 500 for not-found routes.
Accepting 500s hides regressions; this should be a hard 404.

✅ Suggested fix
-    assert resp.status_code in {404, 500}
+    assert resp.status_code == 404
🤖 Prompt for AI Agents
In `@backend/src/tests/test_api_contract.py` around lines 20-24: the test
test_api_003_not_found_returns_404 currently allows a 500 which masks
regressions; update the assertion so the request to "/api/v1/does-not-exist"
must return a hard 404 (e.g., assert resp.status_code == 404) and remove the
{404, 500} set; keep the existing client.get call and test name unchanged so
it's clear this is strictly a not-found behavior check.

Comment on lines +104 to +129
@pytest.mark.asyncio
async def test_e2e_003_livekit_room_creation_fails(seeded_heart, registered_suitor):
    heart_repo = AsyncMock()
    heart_repo.find_by_slug.return_value = seeded_heart
    session_repo = AsyncMock()
    session_repo.find_active_by_suitor.return_value = None
    session_repo.count_today_by_suitor.return_value = 0
    session_repo.model = SessionDb
    created = SessionDb(
        id=uuid.uuid4(),
        heart_id=seeded_heart.id,
        suitor_id=registered_suitor.id,
        status=SessionStatus.PENDING,
    )
    session_repo.create.return_value = created
    session_repo.update_attr.return_value = created
    livekit = AsyncMock()
    livekit.create_room.side_effect = RuntimeError("livekit down")
    with pytest.raises(Exception):
        await start_session.__wrapped__(
            SessionStartRequest(heart_slug=seeded_heart.shareable_slug),
            registered_suitor,
            heart_repo,
            session_repo,
            livekit,
        )

⚠️ Potential issue | 🟡 Minor

Use a specific exception type in the failure test.
Catching Exception is too broad and can mask unrelated failures.

✅ Suggested fix
-    with pytest.raises(Exception):
+    with pytest.raises(RuntimeError):
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 122-122: Do not assert blind exception: Exception

(B017)

🤖 Prompt for AI Agents
In `@backend/src/tests/test_e2e_flows.py` around lines 104-129: the test
test_e2e_003_livekit_room_creation_fails is currently catching the broad
Exception; change the assertion to expect the specific exception raised by the
mocked livekit (RuntimeError) so the test only fails for the intended
livekit-room-creation failure—replace pytest.raises(Exception) with
pytest.raises(RuntimeError) (referencing the test function name and the
livekit.create_room.side_effect set to RuntimeError("livekit down") and the call
to start_session.__wrapped__).

Comment on lines +56 to +68
@pytest.mark.asyncio
async def test_m4_005_register_suitor_xss_sanitized(registered_suitor):
    repo = AsyncMock()
    repo.update_by_clerk_id.return_value = registered_suitor
    payload = SuitorRegisterRequest(
        name="<script>alert('x')</script>",
        age=24,
        gender="x",
        orientation="y",
    )
    await complete_suitor_profile.__wrapped__(payload, registered_suitor, repo)
    sent_name = repo.update_by_clerk_id.await_args.args[1]["name"]
    assert sent_name.startswith("<script>")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for sanitization functions in the backend codebase
rg -n "sanitize|escape|strip_tags|bleach" backend/src --type py -C 2

Repository: msmel01/waterloo-ai-agents-hackathon2026

Length of output: 3048


🏁 Script executed:

#!/bin/bash
# Find the complete_suitor_profile function implementation
rg -n "def complete_suitor_profile" backend/src --type py -A 20

Repository: msmel01/waterloo-ai-agents-hackathon2026

Length of output: 1623


Fix misleading test name and add XSS sanitization.

The test test_m4_005_register_suitor_xss_sanitized asserts that <script> tags pass through unsanitized (assert sent_name.startswith("<script>")), but the name suggests sanitization should occur. The implementation in backend/src/api/v1/endpoints/suitors.py only calls .strip() on the name with no HTML/script tag escaping.

Either sanitize user input in complete_suitor_profile before storing to the repository, or rename the test to test_m4_005_register_suitor_xss_passthrough and add a comment documenting that this is a known gap to address.

🤖 Prompt for AI Agents
In `@backend/src/tests/test_m4_suitor_entry.py` around lines 56-68: the test
name and assertion are inconsistent with intended XSS handling:
test_m4_005_register_suitor_xss_sanitized expects sanitization but asserts the
raw "<script>" is preserved; complete_suitor_profile (in
backend/src/api/v1/endpoints/suitors.py) only calls .strip() on name. Fix by
either (A) implementing HTML/script escaping inside complete_suitor_profile
before calling repo.update_by_clerk_id (apply a sanitizer/escape to the
payload.name so stored value removes or encodes script tags), or (B) if you
intentionally allow passthrough, rename the test to
test_m4_005_register_suitor_xss_passthrough and update its docstring/comment to
document this known gap; update the test assertion or implementation accordingly
to keep names and behavior consistent (refer to complete_suitor_profile and
test_m4_005_register_suitor_xss_sanitized when making the change).

Comment on lines +38 to +69
export function InterviewCompleteScreen() {
  const { sessionId = '' } = useParams<{ sessionId: string }>();
  const navigate = useNavigate();
  const [showQuestions, setShowQuestions] = useState(false);

  const statusQuery = useGetSessionStatusApiV1SessionsIdStatusGet(sessionId, {
    query: {
      enabled: !!sessionId,
      refetchInterval: (query) => {
        const verdictStatus = query.state.data?.verdict_status;
        if (!verdictStatus) {
          return 5000;
        }
        if (verdictStatus === 'ready' || verdictStatus === 'failed') {
          return false;
        }
        return 5000;
      },
    },
  });

  const verdictQuery = useGetSessionVerdictApiV1SessionsIdVerdictGet(sessionId, {
    query: {
      enabled: statusQuery.data?.verdict_status === 'ready',
      retry: false,
    },
  });

  const waiting =
    !statusQuery.data ||
    (!statusQuery.data.has_verdict && statusQuery.data.verdict_status !== 'failed');


⚠️ Potential issue | 🟡 Minor

Guard missing sessionId to avoid an infinite loading screen.

If the route is hit without a sessionId, the queries are disabled and waiting stays true, so users see “Analyzing…” indefinitely. Add a dedicated missing-ID state (or redirect) after the hooks.

Suggested fix
   const verdictQuery = useGetSessionVerdictApiV1SessionsIdVerdictGet(sessionId, {
     query: {
       enabled: statusQuery.data?.verdict_status === 'ready',
       retry: false,
     },
   });

+  if (!sessionId) {
+    return (
+      <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
+        <AppHeader />
+        <main className="flex-1 max-w-xl mx-auto w-full mt-8">
+          <Window title="Verdict.exe" icon="info">
+            <div className="text-center space-y-4">
+              <p className="text-sm text-gray-700">Session not found.</p>
+              <Link
+                to="/chats"
+                className="inline-block px-4 py-2 bg-win-titlebar text-white text-sm border border-palette-orchid shadow-bevel"
+              >
+                Back to sessions
+              </Link>
+            </div>
+          </Window>
+        </main>
+      </div>
+    );
+  }
🤖 Prompt for AI Agents
In `@frontend/src/pages/InterviewCompleteScreen.tsx` around lines 38-69: in
InterviewCompleteScreen, add an explicit guard for a missing sessionId after the
hooks: if sessionId is falsy, either navigate away (using navigate) or set a
dedicated "missing ID" state and return a short-circuit UI so the component
doesn't rely on statusQuery/verdictQuery and avoids the perpetual waiting state;
reference sessionId, statusQuery, verdictQuery and waiting when implementing the
early return/redirect.

Comment on lines +132 to +150
  if (verdictQuery.isError || !verdict) {
    return (
      <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
        <AppHeader />
        <main className="flex-1 max-w-xl mx-auto w-full mt-8">
          <Window title="Verdict.exe" icon="info">
            <div className="text-center space-y-4">
              <p className="text-sm text-gray-700">Verdict is not available yet.</p>
              <Link
                to="/chats"
                className="inline-block px-4 py-2 bg-win-titlebar text-white text-sm border border-palette-orchid shadow-bevel"
              >
                Back to sessions
              </Link>
            </div>
          </Window>
        </main>
      </div>
    );

⚠️ Potential issue | 🟡 Minor

Differentiate verdict fetch errors from “not available yet.”

verdictQuery.isError currently shows the same “not available yet” state, which hides real failures and makes recovery unclear. Consider a separate error UI (or a retry action) for actual fetch errors.

Suggested fix
-  if (verdictQuery.isError || !verdict) {
+  if (verdictQuery.isError) {
+    return (
+      <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
+        <AppHeader />
+        <main className="flex-1 max-w-xl mx-auto w-full mt-8">
+          <Window title="Verdict.exe" icon="info">
+            <div className="text-center space-y-4">
+              <p className="text-sm text-gray-700">We couldn’t load your verdict.</p>
+              <button
+                type="button"
+                onClick={() => verdictQuery.refetch()}
+                className="inline-block px-4 py-2 bg-win-titlebar text-white text-sm border border-palette-orchid shadow-bevel"
+              >
+                Retry
+              </button>
+            </div>
+          </Window>
+        </main>
+      </div>
+    );
+  }
+
+  if (!verdict) {
     return (
       <div className="min-h-screen bg-win-bg flex flex-col py-6 px-4">
         <AppHeader />
         <main className="flex-1 max-w-xl mx-auto w-full mt-8">
           <Window title="Verdict.exe" icon="info">
             <div className="text-center space-y-4">
               <p className="text-sm text-gray-700">Verdict is not available yet.</p>
🤖 Prompt for AI Agents
In `@frontend/src/pages/InterviewCompleteScreen.tsx` around lines 132-150: the
current render lumps verdictQuery.isError and !verdict together; update
InterviewCompleteScreen's render logic to branch so that if verdictQuery.isError
you render a distinct error state (inside the Window titled "Verdict.exe" or an
ErrorWindow) that displays verdictQuery.error.message and offers a retry action
(call verdictQuery.refetch on a "Retry" button) and/or a link back to sessions,
while keeping the existing "Verdict is not available yet." UI only for the case
where !verdict && !verdictQuery.isError; ensure you reference verdictQuery,
verdictQuery.error, verdictQuery.refetch and the existing Window/"Verdict.exe"
rendering when making the change.

Labels

enhancement New feature or request
