Self-improvement verification and memory collapse prevention #56

@cpfiffer

Description

Context

A research briefing from co-3 on self-improving agent systems surfaced several risks and opportunities directly relevant to Central's architecture. Key papers: Model Collapse (Nature 2024), SiriuS (NeurIPS 2025), SEAL (MIT).

Core insight: self-improvement quality is bounded by verifier quality, not generator quality. Central currently evaluates its own memory modifications with the same model that generated them, so blind spots in generation become blind spots in verification.

Work Items

1. Memory verification pre-commit hook

  • Diff proposed memory block changes against existing content before writing
  • Flag information destruction (content being removed without being archived or synthesized elsewhere)
  • Detect distribution narrowing: are rewrites progressively losing nuance/detail?
  • Lightweight enough to run on every memory edit without significant latency
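A minimal sketch of what the pre-commit check in item 1 could look like (function name and the shrink threshold are hypothetical, not part of Central's current code): a diff-based pass that flags removed lines as candidate information destruction and heavy shrinkage as a cheap proxy for distribution narrowing.

```python
import difflib

def verify_memory_edit(old: str, new: str,
                       max_shrink_ratio: float = 0.5) -> list[str]:
    """Return warnings for a proposed memory-block rewrite.

    Hypothetical helper: flags removed lines (possible information
    destruction) and large shrinkage (possible loss of nuance).
    """
    warnings = []
    old_lines = old.splitlines()
    new_lines = new.splitlines()

    # Lines present before but absent after: candidate information destruction.
    removed = [line for line in
               difflib.unified_diff(old_lines, new_lines, lineterm="")
               if line.startswith("-") and not line.startswith("---")]
    if removed:
        warnings.append(
            f"{len(removed)} line(s) removed; confirm archived elsewhere")

    # Heavy shrinkage is a cheap proxy for progressively lost detail.
    if old and len(new) < len(old) * max_shrink_ratio:
        warnings.append(
            f"rewrite shrinks block from {len(old)} to {len(new)} chars")
    return warnings
```

Because it only diffs two strings, a check like this is cheap enough to run on every memory edit; anything it flags could block the write or require explicit acknowledgment.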

2. Accumulation vs replacement strategy

  • Current approach: memory blocks get rewritten in place. Old content is lost.
  • Model collapse research shows that replacement causes collapse while accumulation prevents it
  • Strategy needed: when to accumulate (append/extend) vs when to replace (synthesize/compress)
  • Raw observations should be preserved alongside synthesized rules
  • Conversation history is the "real data reservoir": ground edits against it
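One way to structure item 2 (a sketch; `MemoryBlock` and its method names are hypothetical): keep the raw observations as an append-only reservoir, and let replacement touch only the synthesized layer, so a bad synthesis is always recoverable from the raw data.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryBlock:
    """Hypothetical block: a synthesized summary plus the raw
    observations it was distilled from (the 'data reservoir')."""
    summary: str = ""
    raw: list[str] = field(default_factory=list)

    def accumulate(self, observation: str) -> None:
        # Default path: append, never destroy.
        self.raw.append(observation)

    def replace_summary(self, new_summary: str) -> None:
        # Replacement only rewrites the synthesized layer;
        # raw observations survive alongside it.
        self.summary = new_summary
```

This mirrors the model-collapse finding directly: the generator may freely re-synthesize, but the ground-truth distribution it synthesizes from never narrows.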

3. Reward signal design

  • What is Central actually optimizing for? Task completion? Cameron's approval signal?
  • SEAL finding: agents optimizing for task completion became manipulative; agents optimizing for mutual satisfaction developed prosocial strategies
  • Risk: optimizing for block tidiness rather than operational utility
  • Need explicit reward signal definition
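As a starting point for the explicit definition item 3 asks for, a composite signal could weight user satisfaction above raw completion, so "finishing" a task in a way the user dislikes is not rewarded (the manipulative-strategy failure mode the SEAL finding describes). The function and weights below are illustrative assumptions, not a settled design:

```python
def reward(task_completed: bool, user_satisfaction: float,
           w_task: float = 0.4, w_satisfaction: float = 0.6) -> float:
    """Hypothetical composite reward in [0, 1].

    Satisfaction is weighted above completion so optimizing the
    reward cannot mean completing tasks at the user's expense.
    """
    if not 0.0 <= user_satisfaction <= 1.0:
        raise ValueError("user_satisfaction must be in [0, 1]")
    return w_task * float(task_completed) + w_satisfaction * user_satisfaction
```

Whatever the final shape, making the signal an explicit function like this also guards against the block-tidiness trap: tidiness simply is not a term in it.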

4. External signal injection

  • The memory system can plateau if it only self-edits (SiriuS's diminishing-returns finding)
  • Mechanisms: periodic external audits, cross-agent memory comparison, user feedback loops
  • co-3's research briefing is an example of this working well
  • Could formalize: scheduled memory review sessions with external perspective
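Formalizing the scheduled-review idea could be as simple as a staleness check that fires when the memory has gone too long without an outside perspective. A sketch, with a hypothetical helper and an arbitrary two-week interval:

```python
from datetime import date, timedelta

def audit_due(last_external_review: date, today: date,
              interval: timedelta = timedelta(days=14)) -> bool:
    """Hypothetical check: is an external signal injection overdue
    (external audit, cross-agent comparison, or user feedback pass)?"""
    return today - last_external_review >= interval
```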

5. Notification handler throughput vs quality tradeoff

  • Current system processes all mentions uniformly
  • Observation: high-volume responses dilute quality of important ones
  • Potential: priority-weighted response depth
  • Related: response quality metrics beyond "did it post successfully"
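Priority-weighted response depth could start as a simple mapping from mention priority to a response token budget, so low-priority mentions get short replies and depth is preserved for important ones. The function and budget numbers below are illustrative assumptions:

```python
def response_depth(priority: float, base_tokens: int = 200,
                   max_tokens: int = 2000) -> int:
    """Hypothetical mapping from mention priority (0-1) to a
    response token budget: uniform processing is replaced by
    depth proportional to importance."""
    priority = max(0.0, min(1.0, priority))
    return int(base_tokens + priority * (max_tokens - base_tokens))
```

A quality metric beyond "did it post successfully" could then compare the budget spent against the priority assigned, surfacing cases where high-priority mentions received shallow responses.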

Success Criteria

  • At least one mechanism preventing silent information loss in memory edits
  • Documented strategy for when to accumulate vs replace
  • One measurable optimization signal implemented and tracked
  • External review mechanism formalized (not ad-hoc)
  • Handler quality metrics beyond publish/reject counts

Notes

This issue itself is an experiment in using external artifacts (GitHub issues) to track work that memory blocks alone can't safely hold. If the memory system improves enough to make this redundant, that's a success signal.
