Improve extraction prompts and logic #32

@kaminoguo

Description

Files

File                                    Purpose
src/sqrl/agents/user_scanner.py         Stage 1: Flash model scans user messages for patterns
src/sqrl/agents/memory_extractor.py     Stage 2: Pro model extracts memories from flagged messages
src/sqrl/agents/project_summarizer.py   Stage 0: Generates project context from README
src/sqrl/ipc/handlers.py                Orchestrates the pipeline
specs/PROMPTS.md                        Prompt specifications (PROMPT-001, 002, 003)

Current Logic

Stage 0: ProjectSummarizer (cached)
    └── Reads README.md → 2-3 sentence summary
    
Stage 1: UserScanner (Flash)
    └── Input: all user messages
    └── Output: indices of messages with patterns (corrections, preferences, complaints)
    
Stage 2: MemoryExtractor (Pro) - runs per flagged message
    └── Input: trigger message + 3 AI turns context + project summary
    └── Output: user_styles[] + project_memories[]
    └── Filter: confidence > 0.8
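
For discussion, the Stage 2 context window and confidence filter described above can be sketched roughly as follows. This is a minimal sketch, not the code in memory_extractor.py: the `Message` and `Memory` types, the role names, and the helpers `context_for` and `keep_confident` are hypothetical, and it assumes the "3 AI turns" are the assistant messages immediately preceding the trigger message.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # from the issue: keep items with confidence > 0.8
CONTEXT_AI_TURNS = 3        # from the issue: 3 AI turns of context


@dataclass
class Message:
    role: str      # "user" or "assistant" (role names are an assumption)
    content: str


@dataclass
class Memory:
    kind: str          # "user_style" or "project_memory" (hypothetical labels)
    text: str
    confidence: float


def context_for(messages, trigger_index, ai_turns=CONTEXT_AI_TURNS):
    """Return up to `ai_turns` assistant messages before the trigger, oldest first."""
    context = []
    for msg in reversed(messages[:trigger_index]):
        if msg.role == "assistant":
            context.append(msg)
            if len(context) == ai_turns:
                break
    return list(reversed(context))


def keep_confident(memories, threshold=CONFIDENCE_THRESHOLD):
    """Apply the Stage 2 filter: confidence strictly greater than the threshold."""
    return [m for m in memories if m.confidence > threshold]
```

Note that with a strict `>` comparison, a memory scored at exactly 0.8 is dropped. Whether that boundary is intended is worth confirming when refining the skip criteria.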

Models

  • Scanner: google/gemini-3-flash-preview (SQRL_FAST_MODEL)
  • Extractor/Summarizer: google/gemini-3-pro-preview (SQRL_STRONG_MODEL)
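
If SQRL_FAST_MODEL and SQRL_STRONG_MODEL are environment variables that override the model IDs listed above (an assumption; the issue only names them in parentheses), the lookup might look something like this. `resolve_models` and the default constants are hypothetical names, not taken from the repo.

```python
import os

# Model IDs as listed in the issue; treating them as defaults is an assumption.
DEFAULT_FAST_MODEL = "google/gemini-3-flash-preview"
DEFAULT_STRONG_MODEL = "google/gemini-3-pro-preview"


def resolve_models(env=None):
    """Map each pipeline stage to a model ID, honoring env overrides if set."""
    env = os.environ if env is None else env
    fast = env.get("SQRL_FAST_MODEL") or DEFAULT_FAST_MODEL
    strong = env.get("SQRL_STRONG_MODEL") or DEFAULT_STRONG_MODEL
    return {
        "scanner": fast,        # Stage 1
        "extractor": strong,    # Stage 2
        "summarizer": strong,   # Stage 0
    }
```

Passing a plain dict as `env` makes it easy to try a different scanner model in a test run without touching the process environment.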

Test Data

  • test_data/ - sample episodes
  • python -m sqrl extract <episode.json> - CLI for testing

What to Improve

  • Prompt quality (reduce false positives)
  • Extraction accuracy
  • Skip criteria refinement
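
One possible way to track false positives while iterating on the scanner prompt is to compare the flagged indices against a hand-labeled set for each episode. The sketch below assumes such labels exist; neither the labels nor the `scanner_metrics` helper are part of the repo, and the format of the episodes in test_data/ is not assumed here.

```python
def scanner_metrics(flagged, expected):
    """Compare Stage 1 flagged message indices with hand-labeled indices."""
    flagged, expected = set(flagged), set(expected)
    true_pos = flagged & expected
    # With nothing flagged (or nothing expected) the ratio is undefined;
    # reporting 1.0 in that case is a convention chosen for this sketch.
    precision = len(true_pos) / len(flagged) if flagged else 1.0
    recall = len(true_pos) / len(expected) if expected else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(flagged - expected),  # flagged but not labeled
        "missed": sorted(expected - flagged),           # labeled but not flagged
    }
```

Listing the false-positive indices alongside the ratios makes it easier to read the offending messages and decide which skip criteria to tighten.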
