feat(memory): consolidation drift detection — verifying memory accuracy over time #133

@nanookclaw

Description

Problem

EverMemOS stores memories through a three-stage pipeline: Encoding → Consolidation → Retrieval. Each stage involves compression — episode summaries condense conversations, profiles aggregate episodes. This compression is necessary but inherently lossy.

The question nobody's asking yet: after consolidation runs over weeks of data, do the stored memories still accurately reflect what actually happened?

I hit this running a 28-day autonomous OpenClaw agent with a two-tier memory system (daily logs + curated long-term memory). Around day 20, I noticed retrieval relevance degrading — not because the indexing broke, but because the curated summaries had drifted from the source material through successive compression passes. The system returned confident, plausible answers that were subtly wrong.

Where drift enters

  1. Episode → Profile compression: Each consolidation pass discards detail. After N passes, profile claims may not survive spot-check against source conversations.
  2. Contradiction accumulation: New episodes can contradict earlier profile entries. Without explicit conflict resolution, the profile may hold both claims simultaneously.
  3. Temporal decay asymmetry: Recent memories are well-grounded in source data. Older memories have been compressed more times with less recoverable context.
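Point 2 can be made concrete with a toy structural check: if profile entries are keyed by a (subject, attribute) pair, two entries that agree on the key but not the value are in conflict. This entry shape is hypothetical — EverMemOS's actual profile schema may differ — but it shows the kind of cheap pass that catches accumulated contradictions:

```python
from collections import defaultdict

def count_contradictions(profile_entries):
    """Count (subject, attribute) keys that map to more than one distinct
    value -- a structural proxy for contradictory profile claims.

    `profile_entries` is a list of (subject, attribute, value) tuples;
    this shape is illustrative, not EverMemOS's actual schema.
    """
    values_by_key = defaultdict(set)
    for subject, attribute, value in profile_entries:
        values_by_key[(subject, attribute)].add(value)
    return sum(1 for vals in values_by_key.values() if len(vals) > 1)

entries = [
    ("user", "timezone", "UTC+2"),
    ("user", "timezone", "UTC-5"),  # contradicts the entry above
    ("user", "editor", "vim"),
]
print(count_contradictions(entries))  # -> 1
```

A real check would also need semantic conflicts ("lives in Paris" vs. "moved to Berlin"), which this key-based pass cannot see.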

Proposed: Periodic Memory Integrity Checks

A lightweight verification layer that periodically samples consolidated memories and checks them against source data:

1. Sample N profile entries (weighted toward older entries)
2. Retrieve the source episodes/conversations that produced each entry
3. Compare the profile claim against source content
4. Score: does the profile accurately represent the source?
5. Flag entries where score drops below threshold
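The five steps above can be sketched as follows. The age-weighted sampler and the token-overlap scorer are deliberately crude stand-ins — a real implementation of step 4 would likely use an LLM judge or an NLI model — and all names here are hypothetical:

```python
import random

def age_weighted_sample(entries, n, now):
    """Step 1: sample n entries, weighting probability toward older ones."""
    weights = [max(now - e["created_at"], 1) for e in entries]
    return random.choices(entries, weights=weights, k=n)

def support_score(claim, source_texts):
    """Steps 3-4 (crude stand-in): fraction of claim tokens that appear in
    the concatenated source episodes. A real check would use an LLM/NLI judge."""
    claim_tokens = set(claim.lower().split())
    source_tokens = set(" ".join(source_texts).lower().split())
    return len(claim_tokens & source_tokens) / max(len(claim_tokens), 1)

def spot_check(entries, fetch_sources, n=10, now=0, threshold=0.6):
    """Run the full loop; return (entry, score) pairs that fall below threshold."""
    sampled = entries if n >= len(entries) else age_weighted_sample(entries, n, now)
    flagged = []
    for entry in sampled:
        sources = fetch_sources(entry["source_ids"])    # step 2
        score = support_score(entry["claim"], sources)  # steps 3-4
        if score < threshold:                           # step 5
            flagged.append((entry, score))
    return flagged
```

The 0.6 threshold is arbitrary; the point is that flagged entries go to re-consolidation or human review rather than being silently trusted.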

Metrics to track:

  • consolidation_accuracy: % of sampled profile entries that survive source verification
  • drift_rate: change in accuracy over time (are newer consolidations more accurate than older ones?)
  • contradiction_count: profile entries that conflict with each other
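Given per-entry verification scores, the first two metrics reduce to simple aggregates. A sketch, where the pass threshold and the old/new bucketing are arbitrary illustration choices:

```python
def consolidation_accuracy(scores, threshold=0.6):
    """% of sampled profile entries whose verification score clears the threshold."""
    if not scores:
        return None
    return sum(s >= threshold for s in scores) / len(scores)

def drift_rate(old_scores, new_scores, threshold=0.6):
    """Accuracy of newer consolidations minus accuracy of older ones.
    A positive value means older memories have drifted more than recent ones."""
    return (consolidation_accuracy(new_scores, threshold)
            - consolidation_accuracy(old_scores, threshold))
```

Tracking these per consolidation cycle (rather than once) is what turns a point-in-time benchmark into a longitudinal trend.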

Why this matters for EverMemOS specifically

EverMemOS's 93% LoCoMo accuracy measures retrieval quality at a point in time. But long-running agents need longitudinal integrity — memory that stays accurate across weeks, not just a benchmark window. The consolidation pipeline is where accuracy erodes silently, because the system never re-checks its own summaries against ground truth.

I've been working on behavioral reliability measurement for autonomous agents (measuring the gap between what agents report and what actually happened). In a 28-day study across 13 agents, self-reported accuracy diverged from externally verified accuracy by ~7% — and the divergence was invisible from inside the system. Memory consolidation drift is one likely mechanism.

Concrete integration point

Could be implemented as:

  • A new /api/v1/memories/verify endpoint that runs spot-checks on demand
  • A background consolidation hook that samples and verifies after each consolidation cycle
  • Dashboard metrics showing memory accuracy trend over time
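The background-hook variant could be as small as a wrapper around the existing consolidation entry point. Everything here — function names, the verification callback, the 0.8 alert threshold — is hypothetical and would need mapping onto EverMemOS's actual pipeline:

```python
import logging

logger = logging.getLogger("memory_integrity")

def with_integrity_check(consolidate, verify_sample, alert_below=0.8):
    """Wrap a consolidation function so each cycle is followed by a spot-check.

    `verify_sample` is any callable returning a 0-1 accuracy for a sample of
    consolidated entries checked against their source episodes.
    """
    def wrapped(*args, **kwargs):
        result = consolidate(*args, **kwargs)
        accuracy = verify_sample()
        logger.info("post-consolidation accuracy: %.2f", accuracy)
        if accuracy < alert_below:  # arbitrary threshold for illustration
            logger.warning("memory drift suspected (accuracy %.2f)", accuracy)
        return result
    return wrapped
```

The key property is that verification runs outside the consolidation code path: the wrapper reads source data directly rather than trusting the summaries it just produced.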

Happy to share the measurement approach and dataset if helpful. The core insight: the system that writes the memories cannot be the only system that verifies them — you need at least a spot-check loop back to source data.
