Direct RLHF Feedback Loops for Local Model Fine-Tuning #57

@galic1987

Summary

Generate preference datasets from user interactions that can be used to fine-tune local models via open-source RLHF pipelines.

Problem

When users correct agent mistakes via the CLI, this valuable signal is lost. There's no mechanism to capture chosen vs rejected actions in a format that enables local model improvement over time.

Proposal

Log success/failure trajectories in a format natively digestible by open-source RLHF pipelines (like OpenRLHF, TRL, or Axolotl):

  • Preference dataset generation: When a user corrects an agent action (via /undo, manual file edit, or explicit "that's wrong"), capture the (prompt, chosen_response, rejected_response) triple
  • DPO/RLHF-ready format: Output in standard formats (JSON Lines, Parquet) that Hugging Face datasets can load directly
  • Coding style adaptation: Over time, the preference data captures the user's specific coding style, naming conventions, and architectural preferences
  • Local fine-tuning pipeline: Provide a selfware fine-tune command that runs LoRA/QLoRA fine-tuning using collected preference data
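
As a rough illustration of the (prompt, chosen_response, rejected_response) triple above, here is a minimal std-only Rust sketch that renders one record as a JSON line and appends it to a file. The names `PreferencePair`, `to_json_line`, and `append_pair` are hypothetical and not part of selfware; the `prompt`/`chosen`/`rejected` keys follow the format proposed in this issue, and the demo writes to a temp directory rather than the real feedback path. A real implementation would more likely use a JSON library than hand-rolled escaping.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

/// Hypothetical record for one user correction.
#[derive(Debug, Clone, PartialEq)]
pub struct PreferencePair {
    pub prompt: String,
    pub chosen: String,
    pub rejected: String,
    /// Free-form metadata as (key, value) string pairs, e.g. task or timestamp.
    pub metadata: Vec<(String, String)>,
}

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl PreferencePair {
    /// Render the pair as a single JSON object on one line (no trailing newline).
    pub fn to_json_line(&self) -> String {
        let meta: Vec<String> = self
            .metadata
            .iter()
            .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
            .collect();
        format!(
            "{{\"prompt\":\"{}\",\"chosen\":\"{}\",\"rejected\":\"{}\",\"metadata\":{{{}}}}}",
            json_escape(&self.prompt),
            json_escape(&self.chosen),
            json_escape(&self.rejected),
            meta.join(",")
        )
    }
}

/// Append one pair to a JSONL file, creating parent directories if needed.
pub fn append_pair(path: &Path, pair: &PreferencePair) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", pair.to_json_line())
}

fn main() -> std::io::Result<()> {
    let pair = PreferencePair {
        prompt: "Add error handling to the parse function".to_string(),
        chosen: "fn parse(input: &str) -> Result<Value, ParseError> { ... }".to_string(),
        rejected: "fn parse(input: &str) -> Value { input.parse().unwrap() }".to_string(),
        metadata: vec![("task".to_string(), "error_handling".to_string())],
    };
    // Demo location only; the issue proposes a file under ~/.selfware/feedback/.
    let path = std::env::temp_dir().join("selfware_demo").join("preferences.jsonl");
    append_pair(&path, &pair)?;
    println!("{}", pair.to_json_line());
    Ok(())
}
```

Writing one object per line keeps the log append-only, so a crash mid-session loses at most the record being written.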

Implementation Ideas

  • Hook into the existing audit logger to capture tool call sequences
  • Detect "correction events": user undoes an edit, re-runs with different instructions, or explicitly rejects output
  • Store preference pairs in ~/.selfware/feedback/preferences.jsonl
  • Format: {"prompt": "...", "chosen": "...", "rejected": "...", "metadata": {...}}
  • Integration with Unsloth for efficient local fine-tuning
  • Privacy-first: all data stays local, user controls what gets logged
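
The "correction event" idea above could be prototyped roughly as follows. This is a sketch under assumptions: the `Event`, `Signal`, and `CorrectionPair` types and the `extract_pairs` function are hypothetical and do not reflect the actual audit log or edit history types in the repository. It only emits a pair when the user supplies replacement content, since an undo or rejection on its own gives no "chosen" response; pairing a rejected action with a later agent retry is left out.

```rust
/// Hypothetical session events; the real audit log entries may look different.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// The agent produced `output` in response to `prompt`.
    AgentAction { prompt: String, output: String },
    /// The user undid the most recent agent action (e.g. via /undo).
    Undo,
    /// The user explicitly rejected the output ("that's wrong").
    Reject,
    /// The user supplied the content they actually wanted.
    UserCorrection { content: String },
}

/// How the correction was signalled, kept as metadata on the pair.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Signal {
    Undo,
    Reject,
    ManualEdit,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CorrectionPair {
    pub prompt: String,
    pub chosen: String,
    pub rejected: String,
    pub signal: Signal,
}

/// Scan events in order and pair each corrected agent output with the
/// user's replacement.
pub fn extract_pairs(events: &[Event]) -> Vec<CorrectionPair> {
    let mut pairs = Vec::new();
    // Most recent agent action that has not yet been paired.
    let mut last: Option<(&str, &str)> = None;
    // A correction with no undo/reject before it counts as a manual edit.
    let mut signal = Signal::ManualEdit;

    for event in events {
        match event {
            Event::AgentAction { prompt, output } => {
                last = Some((prompt.as_str(), output.as_str()));
                signal = Signal::ManualEdit;
            }
            Event::Undo => signal = Signal::Undo,
            Event::Reject => signal = Signal::Reject,
            Event::UserCorrection { content } => {
                if let Some((prompt, rejected)) = last.take() {
                    // Skip no-op "corrections" identical to the agent output.
                    if content.as_str() != rejected {
                        pairs.push(CorrectionPair {
                            prompt: prompt.to_string(),
                            chosen: content.clone(),
                            rejected: rejected.to_string(),
                            signal,
                        });
                    }
                }
            }
        }
    }
    pairs
}

fn main() {
    let events = vec![
        Event::AgentAction {
            prompt: "Add error handling to the parse function".to_string(),
            output: "fn parse(input: &str) -> Value { input.parse().unwrap() }".to_string(),
        },
        Event::Undo,
        Event::UserCorrection {
            content: "fn parse(input: &str) -> Result<Value, ParseError> { ... }".to_string(),
        },
    ];
    for pair in extract_pairs(&events) {
        println!("{:?}", pair);
    }
}
```

Recording the signal alongside each pair would let a later filtering step weight explicit rejections differently from silent manual edits, if that turns out to matter.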

Example Output

{"prompt": "Add error handling to the parse function", "chosen": "fn parse(input: &str) -> Result<Value, ParseError> { ... }", "rejected": "fn parse(input: &str) -> Value { input.parse().unwrap() }", "metadata": {"task": "error_handling", "timestamp": "2026-03-09T12:00:00Z"}}

Relevant Code

  • src/safety/audit.rs — JSONL audit logging (similar pattern)
  • src/session/edit_history.rs — undo/redo tracking
  • src/cognitive/episodic.rs — learning from past sessions
  • src/self_healing/ — error learning

Labels: enhancement
