Scoring service by metlos · Pull Request #245 · codeready-toolchain/tarsy-bot

metlos · 2026-01-28T16:37:20Z

This PR stacks on top of #244 and introduces a service that performs the actual conversations with the judge LLMs.

Part of the PR stack:

coderabbitai · 2026-01-28T16:37:28Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


alexeykazakov · 2026-01-28T20:46:02Z

backend/tarsy/agents/prompts/judges.py

 TARSy alert analysis quality. Supports placeholder substitution for:
- {{SESSION_CONVERSATION}}: Full conversation from History Service
- {{ALERT_DATA}}: Original alert data
+- {{ALERT_DATA}}: Original alert data (JSON)


Alert data can be anything, not necessarily JSON. It's just text.
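A minimal sketch of what this comment implies for placeholder substitution, assuming the prompt is a plain string template. The template text and both function names are hypothetical and not taken from `judges.py`; only the two placeholder names come from the docstring above. Alert data is passed through as text and is never parsed as JSON.

```python
import json
import re

# Hypothetical template; the real judge prompt is not reproduced here.
JUDGE_PROMPT_TEMPLATE = (
    "Conversation:\n{{SESSION_CONVERSATION}}\n\n"
    "Alert:\n{{ALERT_DATA}}\n"
)

# Matches only the two placeholders documented in the docstring.
_PLACEHOLDER = re.compile(r"\{\{(SESSION_CONVERSATION|ALERT_DATA)\}\}")


def render_alert_data(alert_data: object) -> str:
    """Return alert data as text without assuming it is JSON."""
    if isinstance(alert_data, str):
        # Plain text (or already-serialized JSON) is passed through untouched.
        return alert_data
    try:
        # Structured data is serialized for readability only.
        return json.dumps(alert_data, indent=2, sort_keys=True)
    except (TypeError, ValueError):
        return str(alert_data)


def fill_judge_prompt(template: str, conversation: str, alert_data: object) -> str:
    """Substitute both placeholders in a single pass.

    A single pass means placeholder-like text inside the conversation
    is not substituted a second time.
    """
    values = {
        "SESSION_CONVERSATION": conversation,
        "ALERT_DATA": render_alert_data(alert_data),
    }
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

For example, `fill_judge_prompt(JUDGE_PROMPT_TEMPLATE, "user: hi", "disk full on node-1")` inserts the alert text verbatim.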

alexeykazakov · 2026-01-28T21:22:20Z

backend/tarsy/services/scoring_service.py

+
+        return client
+
+    async def _retry_database_operation_async(


I see this service duplicates a lot of functionality from the history service.
"History service" is a legacy name and we should probably rename it, but it's essentially our main DB service.
It works like this:
<Regular Service (like AlertService or ChatService)> -> History Service -> Repo(s) -> DB

The history service encapsulates high-level DB logic like initialization, retries, etc. All DB operations funnel through the history service. Let's keep it this way and move all this DB functionality to the history service.

I've created a PR to split the HistoryService into smaller sub-services: #250
So you can add a new sub-service (and update the history service facade) there.
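The layering described in this comment could be sketched roughly as follows. Every name here (`TransientDBError`, `ScoringRepository`, `save_session_score`, the retry parameters) is a hypothetical stand-in rather than the project's actual API. The point is only that the retry logic lives in the history service, while the scoring service calls the facade and never touches a repository.

```python
import asyncio
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientDBError(Exception):
    """Stand-in for whatever retryable error the real DB layer raises."""


class ScoringRepository:
    """Hypothetical repo; a real one would wrap a database session."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save_score(self, session_id: str, score: int) -> dict:
        row = {"session_id": session_id, "score": score}
        self._rows[session_id] = row
        return row


class HistoryService:
    """Facade that owns retry logic so callers never touch repos directly."""

    def __init__(self, repo: ScoringRepository, max_retries: int = 3,
                 base_delay: float = 0.1) -> None:
        self._repo = repo
        self._max_retries = max_retries
        self._base_delay = base_delay

    async def _retry(self, operation: Callable[[], T]) -> Optional[T]:
        # Retry with exponential backoff; give up with None after the last attempt.
        for attempt in range(self._max_retries):
            try:
                return operation()
            except TransientDBError:
                if attempt == self._max_retries - 1:
                    return None
                await asyncio.sleep(self._base_delay * 2 ** attempt)
        return None

    async def save_session_score(self, session_id: str, score: int) -> Optional[dict]:
        return await self._retry(lambda: self._repo.save_score(session_id, score))


class ScoringService:
    """Regular service: delegates all DB work to the history service."""

    def __init__(self, history: HistoryService) -> None:
        self._history = history

    async def record(self, session_id: str, score: int) -> Optional[dict]:
        return await self._history.save_session_score(session_id, score)
```

With this split, `_retry_database_operation_async` would no longer need to exist in `scoring_service.py`.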

alexeykazakov · 2026-01-28T23:15:39Z

backend/tarsy/services/scoring_service.py

+            extract_analysis = True
+            logger.info(f"Executing Turn 1: Score evaluation for score {score_id}")
+            for _ in range(3):  # 3 attempts to get the score
+                conversation = await llm_client.generate_response(


We need to move most of this functionality to the agent package. See the current structure of agents (base agent and extensions) and also the agent controllers.

I would create a scoring agent, very similar to SynthesisAgent, and two controllers for it (very similar to the Synthesis controllers). The rest of the functionality will be delegated to the corresponding LLM clients.

We would also need a new configuration for chains, similar to the Synthesis configuration, where we can define things like the LLM provider, strategy (react, native-thinking), etc.

Basically, the scoring service becomes an orchestrator. It delegates DB work to the history service, agent/LLM work to the agent and controllers, session history building to the history service, prompt building to the prompt builder, etc.
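As a rough illustration of the orchestrator shape suggested here, the sketch below keeps the three-attempt loop from the diff inside a hypothetical `ScoringAgent` and leaves the service with delegation only. All class and method names are assumptions, as is the convention that the judge's score is an integer from 0 to 100 on the last line of the reply. The real SynthesisAgent, controllers, and chain configuration are not modeled.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional, Protocol

# Hypothetical LLM interface: prompt text in, reply text out.
LLMCall = Callable[[str], Awaitable[str]]


class HistoryPort(Protocol):
    """Assumed subset of the history service used by the orchestrator."""

    async def get_session_history(self, session_id: str) -> str: ...
    async def save_score(self, session_id: str, score: Optional[int]) -> None: ...


class ScoringAgent:
    """Hypothetical agent: owns the LLM conversation and score extraction."""

    # Assumption: the score is a bare integer on the last line of the reply.
    _SCORE = re.compile(r"^\s*(\d{1,3})\s*$")

    def __init__(self, llm: LLMCall, attempts: int = 3) -> None:
        self._llm = llm
        self._attempts = attempts

    async def score(self, prompt: str) -> Optional[int]:
        for _ in range(self._attempts):  # mirrors the 3 attempts in the diff
            reply = await self._llm(prompt)
            lines = reply.strip().splitlines()
            match = self._SCORE.match(lines[-1]) if lines else None
            if match and 0 <= int(match.group(1)) <= 100:
                return int(match.group(1))
        return None  # no parsable score after all attempts


class ScoringService:
    """Orchestrator: no DB, prompt, or LLM logic of its own."""

    def __init__(self, history: HistoryPort,
                 build_prompt: Callable[[str], str],
                 agent: ScoringAgent) -> None:
        self._history = history
        self._build_prompt = build_prompt
        self._agent = agent

    async def score_session(self, session_id: str) -> Optional[int]:
        conversation = await self._history.get_session_history(session_id)
        prompt = self._build_prompt(conversation)
        score = await self._agent.score(prompt)
        await self._history.save_score(session_id, score)
        return score
```

Because each collaborator is injected, the service can be exercised in unit tests with a fake history service and a canned LLM callable.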

alexeykazakov · 2026-01-29T04:37:36Z

backend/tarsy/services/scoring_service.py

+            logger.error(f"Failed to get scoring repository: {str(e)}")
+            yield None
+
+    def _format_conversation_messages(


I created a new method in the history service to generate the investigation session history: #249
You can use it as-is now. This is also used by the chat service so we don't have to duplicate it for different services/agents.

implement the database layer for session scoring

b624ebe

This was referenced Jan 28, 2026

REST API controller for the scoring service #246

Closed

implement the database layer for session scoring #243

Closed

introduce judge prompts and simple computed field to the SessionScore model #244

Closed

Scoring UI #247

Closed

Scoring UI #248

Closed

alexeykazakov reviewed Jan 29, 2026

Added the type annotations and marked unit tests as such.

ce90046

metlos force-pushed the metlos/scoring-service branch from b36d6fa to fc8a9a9 Compare January 29, 2026 12:55

metlos force-pushed the metlos/scoring-judge-prompts branch from 949b9ee to 917492e Compare January 29, 2026 12:55

merge master

0fce948

metlos force-pushed the metlos/scoring-judge-prompts branch from 917492e to b3b192e Compare January 29, 2026 12:58

metlos force-pushed the metlos/scoring-service branch from fc8a9a9 to 8a27218 Compare January 29, 2026 12:58

metlos added 2 commits January 29, 2026 17:31

improve the error message when failing to create the session score

5a29fbf

due to some integrity constraint violation.

typesafe scoring status

64b323e

metlos force-pushed the metlos/scoring-judge-prompts branch from b3b192e to f3be3cb Compare January 30, 2026 19:26

metlos force-pushed the metlos/scoring-service branch from 8a27218 to a0cd1d1 Compare January 30, 2026 19:26

merge master

a77e3fd

metlos force-pushed the metlos/scoring-service branch from a0cd1d1 to 22adfda Compare January 30, 2026 22:13

metlos force-pushed the metlos/scoring-judge-prompts branch from f3be3cb to 32f7e7b Compare January 30, 2026 22:13

merge master

b108dab

metlos force-pushed the metlos/scoring-service branch from 22adfda to 10f7fe3 Compare February 2, 2026 12:26

metlos force-pushed the metlos/scoring-judge-prompts branch from 32f7e7b to 0c04a59 Compare February 2, 2026 12:26

simplify the test setup using already existing fixture for the database session

6da0c69

metlos force-pushed the metlos/scoring-judge-prompts branch from 0c04a59 to b887e28 Compare February 2, 2026 13:28

metlos force-pushed the metlos/scoring-service branch from 10f7fe3 to 8acd64b Compare February 2, 2026 13:29

metlos added 2 commits February 2, 2026 21:03

merge master

23767d6

fix the db version ordering

ae5fcfa

metlos force-pushed the metlos/scoring-service branch from 8acd64b to cb61021 Compare February 2, 2026 20:11

metlos force-pushed the metlos/scoring-judge-prompts branch from b887e28 to 0c9b2be Compare February 2, 2026 20:11

metlos added 4 commits February 2, 2026 21:23

add the ScoringStatus.TIMED_OUT variant

e32e588

implement the scoring service

859190c

introduce judge prompts and simple computed field to the SessionScore model

2ddbb44

add support for timeouts in the scoring service

1a0b260

metlos force-pushed the metlos/scoring-judge-prompts branch from 0c9b2be to 2ddbb44 Compare February 3, 2026 10:28

metlos force-pushed the metlos/scoring-service branch from cb61021 to 1a0b260 Compare February 3, 2026 10:28

metlos added 5 commits February 3, 2026 21:50

make the judge prompts more topic-agnostic, focusing more on the rigorousness than a concrete theme

091a66a

merge new version of judge prompts

e40acde

merge master

f8830e7

merge master

90ff4d8

merge master

66bf0dc

metlos force-pushed the metlos/scoring-judge-prompts branch 2 times, most recently from c6c1bf9 to 9a9bdf5 Compare February 13, 2026 15:36

metlos marked this pull request as draft February 13, 2026 15:39

alexeykazakov closed this Feb 26, 2026

metlos wants to merge 19 commits into metlos/scoring-judge-prompts from metlos/scoring-service

2 participants
