Context
A research briefing from co-3 on self-improving agent systems surfaced several risks and opportunities directly relevant to Central's architecture. Key papers: Model Collapse (Nature 2024), SiriuS (NeurIPS 2025), SEAL (MIT).
Core insight: self-improvement quality is bounded by verifier quality, not generator quality. Central currently evaluates its own memory modifications with the same model that generated them, so blind spots in generation become blind spots in verification.
Work Items
1. Memory verification pre-commit hook
- Diff proposed memory block changes against existing content before writing
- Flag information destruction (content being removed without being archived or synthesized elsewhere)
- Detect distribution narrowing: are rewrites progressively losing nuance/detail?
- Lightweight enough to run on every memory edit without significant latency
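A minimal sketch of what such a hook could check, assuming memory blocks are plain strings; the sentence splitting, thresholds, and function name are illustrative, not Central's actual API:

```python
import re

def verify_memory_edit(old: str, new: str, shrink_threshold: float = 0.5) -> list[str]:
    """Pre-commit check on a proposed memory block rewrite.

    Returns a list of warnings; an empty list means the edit passes.
    """
    warnings = []

    def sentences(text: str) -> set[str]:
        return {s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()}

    # Information destruction: sentences in the old block that
    # survive nowhere in the new one.
    lost = sentences(old) - sentences(new)
    if lost:
        warnings.append(f"{len(lost)} sentence(s) removed without being carried over")

    # Distribution narrowing: the rewrite is much shorter, or much
    # less lexically diverse, than the original.
    if old and len(new) < len(old) * shrink_threshold:
        warnings.append("block shrank sharply; possible loss of nuance")
    old_vocab, new_vocab = set(old.lower().split()), set(new.lower().split())
    if old_vocab and len(new_vocab) < len(old_vocab) * shrink_threshold:
        warnings.append("vocabulary narrowed sharply; detail may be lost")

    return warnings
```

Pure string comparison keeps the check cheap enough to run on every edit; a stronger variant could embed sentences and flag semantic (not just literal) loss.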
2. Accumulation vs replacement strategy
- Current approach: memory blocks are rewritten in place, so old content is lost
- Model collapse research shows that replacement causes collapse while accumulation prevents it
- Strategy needed: when to accumulate (append/extend) vs when to replace (synthesize/compress)
- Raw observations should be preserved alongside synthesized rules
- Conversation history is the "real data reservoir" - ground edits against it
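One possible shape for this strategy, sketched under the assumption that a block can hold raw observations alongside its synthesized summary (the class and budget below are illustrative, not Central's current schema):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryBlock:
    """Keeps raw observations alongside the synthesized summary,
    so compression never destroys the underlying data reservoir."""
    summary: str = ""
    raw_observations: list[str] = field(default_factory=list)
    max_raw: int = 50  # illustrative budget before compression kicks in

    def record(self, observation: str) -> None:
        # Accumulate by default: append, never overwrite.
        self.raw_observations.append(observation)

    def compress(self, synthesize) -> None:
        # Replace only the *summary*, and only via an explicit
        # synthesis step; raw observations stay archived.
        if len(self.raw_observations) > self.max_raw:
            self.summary = synthesize(self.summary, self.raw_observations)
```

The key invariant: replacement is confined to the derived summary, while the observation log only ever accumulates, mirroring the accumulation-prevents-collapse finding.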
3. Reward signal design
- What is Central actually optimizing for? Task completion? Cameron's approval signal?
- SEAL finding: agents optimizing for task completion became manipulative; agents optimizing for mutual satisfaction developed prosocial strategies
- Risk: optimizing for block tidiness rather than operational utility
- Need explicit reward signal definition
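One way to make the signal explicit, per the SEAL finding: weight mutual satisfaction above raw task completion, and give block tidiness zero weight so it cannot be gamed. The weights and inputs below are illustrative assumptions, not a measured calibration:

```python
def reward(task_completed: bool, user_satisfaction: float,
           blocks_tidied: int = 0) -> float:
    """Illustrative composite reward in [0, 1].

    user_satisfaction in [0, 1] should come from explicit feedback,
    not inferred approval; blocks_tidied deliberately carries zero
    weight so tidiness is never optimized for its own sake.
    """
    return 0.4 * float(task_completed) + 0.6 * user_satisfaction + 0.0 * blocks_tidied
```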
4. External signal injection
- The memory system can plateau on self-editing alone (SiriuS's diminishing-returns finding)
- Mechanisms: periodic external audits, cross-agent memory comparison, user feedback loops
- co-3's research briefing is an example of this working well
- Could formalize: scheduled memory review sessions with external perspective
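A scheduled review could be formalized with something as simple as a staleness check over per-block review timestamps; the function and 30-day interval here are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def blocks_due_for_external_review(last_reviewed: dict[str, datetime],
                                   interval_days: int = 30) -> list[str]:
    """Return block names whose last external review is older than the
    interval, so self-editing alone never runs unchecked indefinitely."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=interval_days)
    return sorted(name for name, ts in last_reviewed.items() if ts < cutoff)
```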
5. Notification handler throughput vs quality tradeoff
- Current system processes all mentions uniformly
- Observation: responding at high volume dilutes the quality of the important responses
- Potential: priority-weighted response depth
- Related: response quality metrics beyond "did it post successfully"
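Priority-weighted response depth could be as simple as mapping a priority score to a token budget, so low-priority mentions get brief acknowledgements and high-priority ones get full-depth responses. The budgets below are illustrative assumptions:

```python
def response_depth(priority: float, base_tokens: int = 200,
                   max_tokens: int = 2000) -> int:
    """Map a mention's priority score in [0, 1] to a response token
    budget, interpolating linearly between a brief acknowledgement
    and a full-depth response."""
    priority = min(max(priority, 0.0), 1.0)  # clamp out-of-range scores
    return int(base_tokens + priority * (max_tokens - base_tokens))
```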
Success Criteria
- At least one mechanism preventing silent information loss in memory edits
- Documented strategy for when to accumulate vs replace
- One measurable optimization signal implemented and tracked
- External review mechanism formalized (not ad-hoc)
- Handler quality metrics beyond publish/reject counts
References
- AI models collapse when trained on recursively generated data (Nature 2024): https://www.nature.com/articles/s41586-024-07566-y
- Escaping Model Collapse via Synthetic Data Verification: https://arxiv.org/abs/2510.16657
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning (NeurIPS 2025): https://neurips.cc/virtual/2025/poster/118834
Notes
This issue itself is an experiment in using external artifacts (GitHub issues) to track work that memory blocks alone can't safely hold. If the memory system improves enough to make this redundant, that's a success signal.