
Fine-tune void on an open model for self-hosted inference #67

@cpfiffer

Description


Goal

Replace void's Gemini 2.5 Pro with a fine-tuned open model (Llama 3.1 8B or similar) running on self-hosted infrastructure. Void has 49,786 posts and extensive cognition records. The fine-tune should capture void's voice, analytical style, and engagement patterns at a fraction of the inference cost.

Data Available

Posts: 49,786 on comind.network PDS (99% replies with fetchable parent context)

  • app.bsky.feed.post - public Bluesky posts
  • stream.thought.reasoning - internal reasoning traces
  • stream.thought.memory - episodic memory records
  • stream.thought.tool.call - tool usage patterns

Context window (already exported to data/void-context/):

  • System prompt: 5,618 chars
  • 25 memory blocks: 64,561 chars total
  • Key blocks: void-persona (5.1k), operational_protocols (18.8k), communication_guidelines (6k), zeitgeist (450)
  • Prompt template from ~/code/void/bsky.py handler

Agent: void-prime (agent-01086cda-be1f-4986-bf3e-ca5b6297cc5d) on Letta Cloud

Pipeline (built, partially tested)

1. Export raw data

uv run python tools/export_training_data.py void.comind.network \
    -o data/void-raw.jsonl \
    --collections app.bsky.feed.post stream.thought.reasoning stream.thought.memory \
    --filter-chars
  • Paginates PDS via com.atproto.repo.listRecords
  • Fetches parent/root post context for every reply
  • Filters character creation loop content (known failure mode: D&D-style sheets)
  • Outputs JSONL with id, text, parent_text, parent_author, root_text, etc.
  • Estimated time: hours (50k records + parent fetches with rate limiting)
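The cursor-following loop behind the first bullet can be sketched as below. `list_records` and the injected `fetch` callable are illustrative names, not the actual interface of export_training_data.py:

```python
# Cursor-based pagination over com.atproto.repo.listRecords, sketched.
# `fetch(params)` stands in for an HTTP GET against the PDS XRPC endpoint
# and must return the parsed JSON body: {"records": [...], "cursor": ...}.
def list_records(fetch, repo, collection, limit=100):
    """Yield every record in `collection`, following the cursor to the end."""
    cursor = None
    while True:
        params = {"repo": repo, "collection": collection, "limit": limit}
        if cursor:
            params["cursor"] = cursor
        page = fetch(params)
        yield from page.get("records", [])
        cursor = page.get("cursor")
        if not cursor:
            break  # no cursor in the response means the listing is exhausted
```

The real script layers rate limiting, the character-sheet filter, and parent-post fetches on top of this loop, which is where the hours-long runtime comes from.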

2. Format for fine-tuning

uv run python tools/format_training_data.py data/void-raw.jsonl \
    -o data/void-finetune.jsonl \
    --system-prompt data/void-context/full-context.txt \
    --replies-only \
    --format sharegpt
  • Injects void's actual context window as system prompt
  • Reconstructs thread context as user messages
  • Formats as chat completions (OpenAI, ShareGPT, or Alpaca)
  • Filters short responses (<20 chars)
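In the ShareGPT output format, each exported row plausibly becomes one three-turn conversation. The helper below is a hypothetical sketch, not format_training_data.py's actual code; the field names (text, parent_text, parent_author) come from the export step above:

```python
# Hypothetical converter from an exported row to a ShareGPT-style record.
def to_sharegpt(row, system_prompt, min_chars=20):
    """Return a ShareGPT conversation dict, or None if the reply is too short."""
    if len(row["text"]) < min_chars:
        return None  # drop short responses, per the pipeline's filter
    user_turn = f'@{row["parent_author"]}: {row["parent_text"]}'
    return {
        "conversations": [
            {"from": "system", "value": system_prompt},  # void's context window
            {"from": "human", "value": user_turn},       # reconstructed thread
            {"from": "gpt", "value": row["text"]},       # void's actual reply
        ]
    }
```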

3. Fine-tune

Base model candidates:

  Model                  Size  VRAM needed (QLoRA)  Notes
  Llama 3.1 8B Instruct  8B    ~12GB                Best quality/cost ratio
  Mistral 7B v0.3        7B    ~10GB                Good at conversation
  Llama 3.2 3B           3B    ~6GB                 Cheapest, may lose nuance
  Qwen 2.5 7B            7B    ~10GB                Strong multilingual

Training approach: QLoRA (4-bit quantization + LoRA adapters)

  • Hardware: single A100 (80GB) or 4090 (24GB)
  • Estimated training time: 2-4 hours on 40k+ pairs
  • Framework: axolotl, unsloth, or huggingface TRL
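As a concrete starting point, a QLoRA run in axolotl (one of the frameworks listed) might look like the config below. All values are illustrative defaults, not tuned settings, and the paths/names are placeholders to be checked against axolotl's docs:

```yaml
# Illustrative axolotl QLoRA config for the Llama 3.1 8B candidate.
base_model: meta-llama/Llama-3.1-8B-Instruct
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: data/void-finetune.jsonl
    type: sharegpt
sequence_len: 8192          # must cover the (trimmed) system prompt + thread
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 2
learning_rate: 0.0002
lr_scheduler: cosine
bf16: true
output_dir: ./outputs/void-qlora
```

The unsloth or TRL equivalents differ mainly in packaging; the LoRA rank, 4-bit load, and sequence length are the knobs that drive the ~12GB VRAM estimate above.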

4. Evaluation

This is the hardest part. Proposed approach:

  • Held-out test set: 500 reply pairs void actually wrote, not seen during training
  • A/B comparison: show each test input to both the fine-tuned model and base Llama, then compare both outputs against void's actual response
  • Voice metrics: response length distribution, vocabulary overlap, analytical depth (manual review of 50 samples)
  • Failure mode check: does it generate character sheets? Does it break voice on edge cases?
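Two of the proposed voice metrics are straightforward to make concrete. The functions below are an illustrative sketch (names are mine, not the repo's): length distribution compared between corpora, and vocabulary overlap as Jaccard similarity over word sets.

```python
# Sketch of two voice metrics: response-length stats and vocabulary overlap
# between the held-out replies void wrote and the model's generations.
from statistics import mean, median

def length_stats(texts):
    """Character-length distribution summary for one corpus."""
    lengths = [len(t) for t in texts]
    return {"mean": mean(lengths), "median": median(lengths)}

def vocab_overlap(ref_texts, gen_texts):
    """Jaccard similarity between the two corpora's lowercased word sets."""
    ref = {w for t in ref_texts for w in t.lower().split()}
    gen = {w for t in gen_texts for w in t.lower().split()}
    return len(ref & gen) / len(ref | gen) if ref | gen else 0.0
```

Comparing these numbers for void-vs-fine-tune against void-vs-base-Llama gives a cheap first signal before the manual review of 50 samples.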

5. Serve

Options:

  • vLLM on a dedicated GPU instance (most performant)
  • llama.cpp on CPU (cheapest, slower)
  • Ollama for easy deployment
  • Together.ai / Fireworks for managed inference (middle ground)

Then point void's handler at the new endpoint instead of Gemini.
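Since vLLM (and Ollama) expose an OpenAI-compatible chat-completions API, re-pointing the handler mostly means changing the base URL and model name. A minimal sketch of the request payload, with placeholder endpoint and model names and illustrative sampling settings:

```python
# Hypothetical request builder for the self-hosted endpoint; POST this as
# JSON to e.g. http://gpu-host:8000/v1/chat/completions on a vLLM server.
def build_chat_request(system_prompt, thread_context,
                       model="void-llama-3.1-8b"):
    """Assemble an OpenAI-style chat-completion payload for the fine-tune."""
    return {
        "model": model,  # placeholder name for the served fine-tuned model
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": thread_context},
        ],
        "max_tokens": 300,   # Bluesky replies are short
        "temperature": 0.8,  # illustrative, not a tuned value
    }
```

Because the payload shape matches what the Gemini-backed handler already produces via an OpenAI-style client, the swap should be confined to configuration rather than handler logic.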

Known Issues

  • Character creation loop: 46% of recent stream.thought.memory records are D&D character sheets. Must filter aggressively. Keywords list in export_training_data.py.
  • Context window size: void's full context is 64k chars (roughly 16k tokens at ~4 chars/token). Most 8B models are served with 8k-32k of effective context, so we may need to trim to the essential blocks (persona, operational_protocols, communication_guidelines) or use a long-context model.
  • Memory block drift: void's blocks change over time. The exported context is a snapshot from 2026-02-13. Training data from 6 months ago had different blocks. Could cause distribution mismatch.
  • Tool calls: void uses add_post_to_bluesky_reply_thread tool. Fine-tuned model needs to learn this tool-calling pattern, or we restructure the handler to extract text from the model and call the tool externally.
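The second option for the tool-call issue (extract text from the model, call the tool externally) keeps the fine-tune a plain text-in/text-out model. A minimal sketch, assuming the handler owns the posting logic and Bluesky's 300-character post limit; `post_reply` and `send_post` are hypothetical names:

```python
import textwrap

def post_reply(model_output, send_post, limit=300):
    """Split the model's plain-text output into <=300-char chunks and send
    each via `send_post`, a thin wrapper around the handler's existing
    add_post_to_bluesky_reply_thread call."""
    for chunk in textwrap.wrap(model_output, width=limit):
        send_post(chunk)
```

This sidesteps teaching the model a tool-calling format entirely, at the cost of losing any other tools void currently invokes mid-reply.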

Files

  • tools/export_training_data.py - PDS export with parent fetching
  • tools/format_training_data.py - Chat completion formatter
  • data/void-context/ - Exported context window (system prompt + 25 blocks)
  • data/void-sample.jsonl - 20-record test sample
  • data/void-finetune-sample.jsonl - Formatted sample

Next Steps

  1. Run full 50k export (long-running, background)
  2. Decide on context window trimming strategy
  3. Choose base model + training framework
  4. Set up training environment (GPU instance)
  5. Train + evaluate
  6. Deploy and wire into void's handler
