feat: code-level filter for user-only knowledge extraction #9

@brgsk

Context

From #6 — when using smaller/local models (e.g. Qwen), assistant messages can leak into extracted knowledge despite prompt-level instructions to only extract from user messages.

Problem

The extraction prompts instruct the LLM to only extract from >>> USER: lines and ignore ASSISTANT: lines. This works well with capable models but is unreliable with smaller or local models that don't follow instructions as strictly.

Proposal

Add a code-level filter in the extraction pipeline that strips or validates extractions against the actual user message content, so extraction correctness doesn't depend entirely on model instruction-following capability.
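One possible shape for such a filter, as a minimal sketch rather than the project's actual implementation: it assumes the transcript is a plain string in which each user turn sits on a single line starting with `>>> USER:`, and that extractions are plain strings. The names `user_text`, `filter_extractions`, and the `min_overlap` threshold are hypothetical, and token overlap is only one of several ways to check that an extraction is grounded in what the user said.

```python
import re

# Marker named in the issue; the exact transcript layout is an assumption.
USER_PREFIX = ">>> USER:"


def user_text(transcript: str) -> str:
    """Return only the content of lines that start with the user marker.

    Assumes one message per line; continuation lines without the marker
    are dropped along with assistant lines.
    """
    parts = []
    for line in transcript.splitlines():
        stripped = line.strip()
        if stripped.startswith(USER_PREFIX):
            parts.append(stripped[len(USER_PREFIX):].strip())
    return "\n".join(parts)


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for a crude overlap check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_extractions(
    extractions: list[str],
    transcript: str,
    min_overlap: float = 0.5,  # hypothetical threshold, needs tuning
) -> list[str]:
    """Keep extractions whose tokens mostly appear in the user's own lines."""
    allowed = _tokens(user_text(transcript))
    kept = []
    for item in extractions:
        toks = _tokens(item)
        if not toks:
            continue  # nothing to validate, drop it
        if len(toks & allowed) / len(toks) >= min_overlap:
            kept.append(item)
    return kept
```

Passing only `user_text(transcript)` to the model would be the "strip" variant, while `filter_extractions` is the "validate" variant applied after the model responds. The overlap check is lexical, so it can reject legitimate paraphrases and accept assistant claims that reuse the user's wording; a stricter or embedding-based check may be needed in practice.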
