- [01/08/2026] We've set up a Discord server and a WeChat group to make it easier to collaborate and exchange ideas on this project. You're welcome to share your thoughts, ask questions, or contribute your ideas. Join our Discord and WeChat Group now!
- [01/05/2026] The SimpleMem paper was released on arXiv!
- Overview
- Key Contributions
- Performance Highlights
- Installation
- Quick Start
- Evaluation
- File Structure
- Citation
- License
- Acknowledgments
SimpleMem achieves a superior F1 score (43.24%) at minimal token cost (~550 tokens), occupying the ideal top-left position.
SimpleMem addresses the fundamental challenge of efficient long-term memory for LLM agents through a three-stage pipeline grounded in Semantic Lossless Compression. Unlike existing systems that either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes information density and token utilization through:
1. **Semantic Structured Compression**: entropy-based filtering and de-linearization of dialogue into self-contained atomic facts
2. **Structured Indexing**: asynchronous evolution from fragmented atoms into higher-order molecular insights
3. **Adaptive Retrieval**: complexity-aware pruning across semantic, lexical, and symbolic layers
The SimpleMem Architecture: A three-stage pipeline for efficient lifelong memory through semantic lossless compression
Speed Comparison Demo
SimpleMem vs. Baseline: Real-time speed comparison demonstration
LoCoMo-10 Benchmark Results (GPT-4.1-mini)
| Model | Construction Time | Retrieval Time | Total Time | Average F1 |
|---|---|---|---|---|
| A-Mem | 5140.5s | 796.7s | 5937.2s | 32.58% |
| LightMem | 97.8s | 577.1s | 675.9s | 24.63% |
| Mem0 | 1350.9s | 583.4s | 1934.3s | 34.20% |
| **SimpleMem** | 92.6s | 388.3s | 480.9s | 43.24% |
Key advantages:
- Highest F1 score: 43.24% (+26.4% vs. Mem0, +75.6% vs. LightMem)
- Fastest retrieval: 388.3s (32.7% faster than LightMem, 33.4% faster than Mem0)
- Fastest end-to-end: 480.9s total processing time (12.3× faster than A-Mem)
SimpleMem transforms raw, ambiguous dialogue streams into atomic entries: self-contained facts with resolved coreferences and absolute timestamps. This write-time disambiguation eliminates downstream reasoning overhead.
Example transformation:

```diff
- Input:  "He'll meet Bob tomorrow at 2pm"                           (relative, ambiguous)
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00"  (absolute, atomic)
```
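The disambiguation itself is performed by the LLM at write time. Below is a minimal sketch of the input/output contract only; the class and function names are invented for illustration and the string rules stand in for what the model actually does:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class AtomicEntry:
    fact: str            # self-contained statement, no unresolved pronouns
    timestamp: datetime  # absolute time, resolved at write time

def to_atomic(speaker: str, utterance: str, spoken_at: datetime) -> AtomicEntry:
    # Toy resolution: substitute the speaker for the pronoun and anchor the
    # relative expression "tomorrow at 2pm" to the message timestamp.
    # SimpleMem does this with an LLM; these rules only illustrate the idea.
    resolved = utterance.replace("He'll", f"{speaker} will")
    when = spoken_at
    if "tomorrow at 2pm" in utterance:
        when = (spoken_at + timedelta(days=1)).replace(hour=14, minute=0,
                                                       second=0, microsecond=0)
        resolved = resolved.replace("tomorrow at 2pm", when.isoformat())
    return AtomicEntry(fact=resolved, timestamp=when)

entry = to_atomic("Alice", "He'll meet Bob tomorrow at 2pm",
                  datetime(2025, 11, 15, 14, 30))
print(entry.fact)  # Alice will meet Bob 2025-11-16T14:00:00
```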
Memory is indexed across three structured dimensions for robust, multi-granular retrieval:
| Layer | Type | Purpose | Implementation |
|---|---|---|---|
| Semantic | Dense | Conceptual similarity | Vector embeddings (1024-d) |
| Lexical | Sparse | Exact term matching | BM25-style keyword index |
| Symbolic | Metadata | Structured filtering | Timestamps, entities, persons |
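As a rough, dependency-free sketch of how the three views can combine (all names below are invented; the actual system uses 1024-d dense embeddings and LanceDB rather than the toy scoring shown here):

```python
import math
import re
from collections import Counter

class ThreeLayerIndex:
    """Toy stand-in for a multi-view index: a bag-of-words vector plays the
    'semantic' role so the example needs no external dependencies."""

    def __init__(self):
        self.entries = []  # list of (text, metadata) pairs

    @staticmethod
    def _tokens(text):
        return re.findall(r"\w+", text.lower())

    def add(self, text, metadata):
        self.entries.append((text, metadata))

    def _lexical_score(self, query, text):
        # Sparse layer: raw query-term frequency, a crude BM25 stand-in.
        counts = Counter(self._tokens(text))
        return sum(counts[w] for w in set(self._tokens(query)))

    def _semantic_score(self, query, text):
        # Dense-layer stand-in: cosine similarity of token-count vectors.
        q, t = Counter(self._tokens(query)), Counter(self._tokens(text))
        dot = sum(q[w] * t[w] for w in q)
        norm = (math.sqrt(sum(v * v for v in q.values()))
                * math.sqrt(sum(v * v for v in t.values())))
        return dot / norm if norm else 0.0

    def search(self, query, person=None, top_k=3):
        scored = []
        for text, meta in self.entries:
            # Symbolic layer: hard metadata filter applied before any scoring.
            if person is not None and meta.get("person") != person:
                continue
            score = self._semantic_score(query, text) + self._lexical_score(query, text)
            scored.append((score, text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]

index = ThreeLayerIndex()
index.add("Alice will meet Bob at Starbucks on 2025-11-16T14:00:00", {"person": "Alice"})
index.add("Bob will bring the market analysis report", {"person": "Bob"})
print(index.search("Where will Alice meet Bob?", person="Alice"))
```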
Instead of fixed-depth retrieval, SimpleMem dynamically estimates the complexity of each query and scales the search to match (a sketch follows this list):

- **Low-complexity queries**: answered with a shallow, pruned search
- **High-complexity queries**: answered with deeper retrieval across the semantic, lexical, and symbolic layers
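A minimal sketch of such complexity-aware pruning, assuming an invented heuristic (SimpleMem's actual estimator is not reproduced here):

```python
def estimate_complexity(query: str) -> int:
    # Invented heuristic: multi-hop and temporal cues suggest a harder query.
    cues = ("before", "after", "why", "compare", " and ")
    return 1 + sum(cue in query.lower() for cue in cues)

def retrieval_plan(complexity: int) -> dict:
    if complexity <= 1:
        # Low complexity: shallow, pruned search over the dense layer alone.
        return {"layers": ["semantic"], "top_k": 3}
    # High complexity: widen to all three layers with a deeper candidate pool.
    return {"layers": ["semantic", "lexical", "symbolic"], "top_k": 10}

print(retrieval_plan(estimate_complexity("Where does Alice work?")))
# {'layers': ['semantic'], 'top_k': 3}
print(retrieval_plan(estimate_complexity("What did Bob promise before the meeting, and why?")))
# {'layers': ['semantic', 'lexical', 'symbolic'], 'top_k': 10}
```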
Result: 43.24% F1 with 30× fewer tokens than full-context methods.
High-Capability Models (GPT-4.1-mini)
| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|---|---|---|---|
| MultiHop | 43.46% | 30.14% | +43.8% |
| Temporal | 58.62% | 48.91% | +19.9% |
| SingleHop | 51.12% | 41.30% | +23.8% |
Efficient Models (Qwen2.5-1.5B)
| Metric | SimpleMem | Mem0 | Notes |
|---|---|---|---|
| Average F1 | 25.23% | 23.77% | Competitive despite a 99× smaller model |
- Python 3.10
- An OpenAI-compatible API (OpenAI, Qwen, Azure OpenAI, etc.)
```bash
# Clone the repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem

# Install dependencies
pip install -r requirements.txt

# Configure API settings
cp config.py.example config.py
# Edit config.py with your API key and preferences
```

```python
# config.py
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None  # or a custom endpoint for Qwen/Azure
LLM_MODEL = "gpt-4.1-mini"
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"  # state-of-the-art retrieval
```
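If you route through another OpenAI-compatible provider, point `OPENAI_BASE_URL` at its endpoint. A sketch, assuming Qwen served via DashScope's compatible mode (the URL and model name are illustrative; substitute your provider's values):

```python
# config.py -- example for a non-OpenAI provider (values illustrative)
OPENAI_API_KEY = "your-provider-key"
OPENAI_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
LLM_MODEL = "qwen-plus"
```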
```python
from main import SimpleMemSystem

# Initialize the system
system = SimpleMemSystem(clear_db=True)

# Add dialogues (Stage 1: Semantic Structured Compression)
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report", "2025-11-15T14:31:00")

# Finalize atomic encoding
system.finalize()

# Query with adaptive retrieval (Stage 3: Adaptive Query-Aware Retrieval)
answer = system.ask("When and where will Alice and Bob meet?")
print(answer)
# Output: "16 November 2025 at 2:00 PM at Starbucks"
```

For large-scale dialogue processing, enable parallel mode:
```python
system = SimpleMemSystem(
    clear_db=True,
    enable_parallel_processing=True,  # parallel memory building
    max_parallel_workers=8,
    enable_parallel_retrieval=True,   # parallel query execution
    max_retrieval_workers=4,
)
```

Pro tip: parallel processing significantly reduces latency for batch operations!
```bash
# Full LoCoMo benchmark
python test_locomo10.py

# Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5

# Custom output file
python test_locomo10.py --result-file my_results.json
```

Use the exact configurations in config.py:
- High-capability: GPT-4.1-mini, Qwen3-Plus
- Efficient: Qwen2.5-1.5B, Qwen2.5-3B
- Embedding: Qwen3-Embedding-0.6B (1024-d)
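As a concrete example, the two evaluation regimes map onto config.py like this (the small-model identifier string is an assumption; use the exact name your provider exposes):

```python
# config.py -- evaluation regimes (identifier strings are illustrative)

# High-capability regime
LLM_MODEL = "gpt-4.1-mini"

# Efficient regime: comment out the line above and use a small Qwen model
# LLM_MODEL = "Qwen2.5-1.5B"

EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"  # 1024-d, shared by both regimes
```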
If you use SimpleMem in your research, please cite:
```bibtex
@article{simplemem2025,
  title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
  author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2601.02553},
  year={2025},
  url={https://github.com/aiming-lab/SimpleMem}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
We would like to thank the following projects and teams:
- Embedding Model: Qwen3-Embedding - State-of-the-art retrieval performance
- Vector Database: LanceDB - High-performance columnar storage
- Benchmark: LoCoMo - Long-context memory evaluation framework
