This repository was archived by the owner on Feb 26, 2026. It is now read-only.

Scoring service #245

Closed
metlos wants to merge 19 commits into metlos/scoring-judge-prompts from metlos/scoring-service

Conversation

Collaborator

@metlos metlos commented Jan 28, 2026

Contributor

coderabbitai bot commented Jan 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


TARSy alert analysis quality. Supports placeholder substitution for:
- {{SESSION_CONVERSATION}}: Full conversation from History Service
- {{ALERT_DATA}}: Original alert data
- {{ALERT_DATA}}: Original alert data (JSON)
Collaborator

Alert data can be anything, not necessarily JSON. It's just text.
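To illustrate the point, here is a minimal sketch of placeholder substitution that treats alert data as opaque text. The helper name `fill_placeholders` and the sample template are assumptions made for illustration; they are not taken from the PR.

```python
from __future__ import annotations

import re

# Matches markers such as {{ALERT_DATA}} or {{SESSION_CONVERSATION}}.
_PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace {{NAME}} markers in a single pass (hypothetical helper).

    Values are inserted verbatim as text. Unknown names are left untouched.
    """

    def _substitute(match: re.Match[str]) -> str:
        return values.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(_substitute, template)


prompt = fill_placeholders(
    "Alert:\n{{ALERT_DATA}}\n\nConversation:\n{{SESSION_CONVERSATION}}",
    {
        # Free-form text: no JSON parsing or validation is attempted.
        "ALERT_DATA": "disk usage 97% on node-3",
        "SESSION_CONVERSATION": "user: why is the disk full?",
    },
)
```

A single regex pass, rather than chained `str.replace` calls, means that alert text which happens to contain a `{{...}}` marker is not expanded a second time.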


return client

async def _retry_database_operation_async(
Collaborator

I see this service duplicates a lot of functionality from the history service.
"History service" is a legacy name and we should probably rename it, but it's essentially our main DB service.
It works like this:
<Regular Service (like AlertService or ChatService)> -> History Service -> Repo(s) -> DB

The history service encapsulates high-level DB logic like initialization, retry, etc. All DB operations funnel through the history service. Let's keep it this way and move all this DB functionality to the history service.

I've created a PR to split the HistoryService into smaller sub-services: #250
So you can add a new sub-service (and update the history service facade) there.
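The layering described above could look roughly like the sketch below. Every class and method name here (`ScoreRepository`, `save_score`, `get_score`, `record`) is hypothetical, and the in-memory repository merely stands in for real DB access; the only point is that retry logic lives in the history service, while the scoring service never touches the repository directly.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class ScoreRepository:
    """Hypothetical repository; a dict stands in for the database."""

    def __init__(self) -> None:
        self._rows: dict[str, int] = {}

    async def save(self, session_id: str, score: int) -> None:
        self._rows[session_id] = score

    async def get(self, session_id: str) -> Optional[int]:
        return self._rows.get(session_id)


class HistoryService:
    """Single funnel for DB operations; owns the retry policy."""

    def __init__(self, repo: ScoreRepository, max_attempts: int = 3,
                 base_delay: float = 0.0) -> None:
        self._repo = repo
        self._max_attempts = max_attempts
        self._base_delay = base_delay

    async def _retry(self, operation: Callable[[], Awaitable[T]]) -> T:
        last_error: Optional[Exception] = None
        for attempt in range(self._max_attempts):
            try:
                return await operation()
            except ConnectionError as exc:  # treated as transient here
                last_error = exc
                await asyncio.sleep(self._base_delay * 2 ** attempt)
        raise RuntimeError("database operation failed after retries") from last_error

    async def save_score(self, session_id: str, score: int) -> None:
        await self._retry(lambda: self._repo.save(session_id, score))

    async def get_score(self, session_id: str) -> Optional[int]:
        return await self._retry(lambda: self._repo.get(session_id))


class ScoringService:
    """Regular service: talks to the history service only."""

    def __init__(self, history: HistoryService) -> None:
        self._history = history

    async def record(self, session_id: str, score: int) -> Optional[int]:
        await self._history.save_score(session_id, score)
        return await self._history.get_score(session_id)
```

With this shape, a call such as `asyncio.run(ScoringService(HistoryService(ScoreRepository())).record("s1", 80))` goes Service -> History Service -> Repo, and no retry helper needs to be duplicated in the scoring service.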

extract_analysis = True
logger.info(f"Executing Turn 1: Score evaluation for score {score_id}")
for _ in range(3): # 3 attempts to get the score
conversation = await llm_client.generate_response(
Collaborator

We need to move most of this functionality to the agent package. See the current structure of agents (base agent and extensions) and also agent controllers.

I would create a scoring agent, very similar to SynthesisAgent. And two controllers for it (very similar to Synthesis controllers). The rest of the functionality will be delegated to the corresponding llm clients.

We would also need a new configuration for chains, similar to the Synthesis configuration, where we can define things like the LLM provider, the strategy (react, native-thinking), etc.
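As a rough sketch of what such a configuration might carry, the snippet below models just the two settings mentioned above. The class and field names (`ScoringConfig`, `llm_provider`, `strategy`) and the default are assumptions for illustration; the real Synthesis configuration may be shaped differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Strategy(str, Enum):
    """Strategies named in the review comment."""

    REACT = "react"
    NATIVE_THINKING = "native-thinking"


@dataclass(frozen=True)
class ScoringConfig:
    """Hypothetical per-chain scoring settings."""

    llm_provider: str
    strategy: Strategy = Strategy.REACT  # assumed default, not from the PR

    @classmethod
    def from_dict(cls, raw: dict) -> "ScoringConfig":
        # Strategy(...) raises ValueError on an unknown strategy name.
        return cls(
            llm_provider=raw["llm_provider"],
            strategy=Strategy(raw.get("strategy", Strategy.REACT.value)),
        )


config = ScoringConfig.from_dict(
    {"llm_provider": "example-provider", "strategy": "native-thinking"}
)
```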

Basically, the scoring service becomes an orchestrator. It delegates DB work and session history building to the History Service, agent/LLM work to the agent and its controllers, prompt building to the prompt builder, etc.
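One possible shape for that orchestrator is sketched below. The three `Protocol` interfaces and all their method names are hypothetical stand-ins for the history service, the prompt builder, and a scoring agent; none of them is the project's actual API.

```python
from __future__ import annotations

from typing import Protocol


class HistoryPort(Protocol):
    """Assumed slice of the history service used by scoring."""

    async def build_session_history(self, session_id: str) -> str: ...
    async def save_score(self, session_id: str, score: int) -> None: ...


class PromptBuilderPort(Protocol):
    """Assumed prompt builder interface."""

    def build_scoring_prompt(self, history: str) -> str: ...


class ScoringAgentPort(Protocol):
    """Assumed agent interface; LLM calls and retries live behind it."""

    async def score(self, prompt: str) -> int: ...


class ScoringService:
    """Orchestrator only: no DB access, no LLM calls, no prompt text."""

    def __init__(self, history: HistoryPort, prompts: PromptBuilderPort,
                 agent: ScoringAgentPort) -> None:
        self._history = history
        self._prompts = prompts
        self._agent = agent

    async def score_session(self, session_id: str) -> int:
        history = await self._history.build_session_history(session_id)
        prompt = self._prompts.build_scoring_prompt(history)
        score = await self._agent.score(prompt)
        await self._history.save_score(session_id, score)
        return score
```

Because the collaborators are injected, each one can be replaced with a small fake in tests, and the "3 attempts to get the score" loop from the diff would move behind the agent interface.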

logger.error(f"Failed to get scoring repository: {str(e)}")
yield None

def _format_conversation_messages(
Collaborator

I created a new method in the history service to generate the investigation session history: #249
You can use it as-is now. This is also used by the chat service so we don't have to duplicate it for different services/agents.

@metlos metlos force-pushed the metlos/scoring-service branch from b36d6fa to fc8a9a9 Compare January 29, 2026 12:55
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from 949b9ee to 917492e Compare January 29, 2026 12:55
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from 917492e to b3b192e Compare January 29, 2026 12:58
@metlos metlos force-pushed the metlos/scoring-service branch from fc8a9a9 to 8a27218 Compare January 29, 2026 12:58
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from b3b192e to f3be3cb Compare January 30, 2026 19:26
@metlos metlos force-pushed the metlos/scoring-service branch from 8a27218 to a0cd1d1 Compare January 30, 2026 19:26
@metlos metlos force-pushed the metlos/scoring-service branch from a0cd1d1 to 22adfda Compare January 30, 2026 22:13
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from f3be3cb to 32f7e7b Compare January 30, 2026 22:13
@metlos metlos force-pushed the metlos/scoring-service branch from 22adfda to 10f7fe3 Compare February 2, 2026 12:26
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from 32f7e7b to 0c04a59 Compare February 2, 2026 12:26
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from 0c04a59 to b887e28 Compare February 2, 2026 13:28
@metlos metlos force-pushed the metlos/scoring-service branch from 10f7fe3 to 8acd64b Compare February 2, 2026 13:29
@metlos metlos force-pushed the metlos/scoring-service branch from 8acd64b to cb61021 Compare February 2, 2026 20:11
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from b887e28 to 0c9b2be Compare February 2, 2026 20:11
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch from 0c9b2be to 2ddbb44 Compare February 3, 2026 10:28
@metlos metlos force-pushed the metlos/scoring-service branch from cb61021 to 1a0b260 Compare February 3, 2026 10:28
@metlos metlos force-pushed the metlos/scoring-judge-prompts branch 2 times, most recently from c6c1bf9 to 9a9bdf5 Compare February 13, 2026 15:36
@metlos metlos marked this pull request as draft February 13, 2026 15:39
2 participants