A persistent conversational AI agent with long-term memory, dynamic identity, proximity awareness, and real-time streaming responses — powered by Google Gemini API.
- 🔄 Real-Time Streaming — Typewriter-style character-by-character response display via Gemini streaming API
- 🧠 Bifurcated Memory System — Dual-layer memory with episodic (SQLite FTS5) and semantic (FAISS) recall
- 📝 Auto-Summarization — Every 5 turns, the conversation buffer is compressed into a single memory sentence and indexed into long-term storage
- 🎭 Dynamic Lore Retrieval — Personality and knowledge chunks retrieved via semantic search, not static injection
- 📍 Proximity Detection — Nomic embedding-based detection of physical/remote/transitional presence states
- 🕐 Temporal Awareness — Tracks time between conversations and adjusts context accordingly
- 🛡️ Traffic Control — Hold-Wait-Commit pattern ensures only valid responses are logged
- 💾 Response Caching — Deduplicates API calls with local hash-based cache
- 🔧 Memory Management CLI — List, delete, clear, and rebuild memory from the command line
project/
├── main.py # Entry point — CLI loop, streaming, 5-turn cycles
├── model_config.py # LLM configuration (model, API, generation params)
├── setup.py # Initialize project structure and default files
├── manage_memory.py # Memory management CLI tool
├── check_models.py # List available models for your API key
├── requirements.txt
│
├── agent/ # State & Retrieval Layer
│ ├── temporal.py # Time delta calculation
│ ├── memory.py # SQLite FTS5 episodic storage
│ ├── semantic_search.py # FAISS vector search (nomic-embed-text)
│ ├── conversation.py # Session logging + buffer management
│ ├── dynamic_lore.py # Semantic lore retrieval
│ ├── lore/ # Static personality files
│ │ ├── self.md # AI identity
│ │ ├── user.md # User profile
│ │ └── relationship.md # Connection definition
│ └── episodes/ # Raw memory source files
│
├── pipeline/ # Prompt Construction & Rendering
│ ├── packet_builder.py # XML-tagged prompt assembly
│ ├── renderer.py # Gemini API (non-streaming, cached)
│ └── summarizer_builder.py # 5-turn summarization pipeline
│
├── streaming/ # Real-Time Response
│ └── renderer_streaming.py # Streaming with typewriter effect
│
├── proximity/ # Presence Detection
│ └── proximity_manager.py # Nomic-based proximity state engine
│
├── memory/ # Memory Intent & Retrieval
│ └── memory_loader.py # Intent detection + multi-source fetching
│
├── tools/ # Utilities
│ └── index_lore.py # Rebuild lore index
│
└── data/
├── nomic-embed-text-v1.5.Q8_0.gguf # Local embedding model
└── logs_raw/ # Session conversation logs
User Input → HOLD (temporary, not logged)
↓
Build Packet (XML-tagged prompt)
├── Dynamic Lore → Semantic search over personality chunks
├── Proximity State → Inject if changed (embedding similarity)
├── Memory Bank → Fetch if memory intent detected
└── Chat History → Last 6 turns
↓
Stream to Gemini API (gemma-3-12b-it)
↓
Typewriter Display → Clean [AI]: prefix → Validate
↓
Valid? → COMMIT both messages to log + buffer
Invalid? → DISCARD (clean retry, no history pollution)
↓
Turn == 5? → Summarize → Index to brain.db + FAISS → Reset buffer
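The Hold-Wait-Commit step above can be sketched in a few lines. This is an illustrative minimal version, not the project's actual code; class and method names here are assumptions.

```python
# Minimal sketch of the Hold-Wait-Commit pattern: user input is held,
# and only a validated AI response commits both messages to history.

class TurnBuffer:
    """Holds a pending user turn until the AI response validates."""

    def __init__(self):
        self.history = []      # committed turns only
        self._pending = None   # held user input, not yet logged

    def hold(self, user_input):
        self._pending = user_input

    def commit(self, ai_response):
        # Valid response: both messages reach the permanent history.
        self.history.append(("user", self._pending))
        self.history.append(("ai", ai_response))
        self._pending = None

    def discard(self):
        # Invalid response: drop the held input, history stays clean.
        self._pending = None

buf = TurnBuffer()
buf.hold("hello")
buf.commit("hi there")
buf.hold("???")
buf.discard()              # failed validation, no history pollution
print(len(buf.history))    # → 2
```

The point of the pattern is that a malformed or empty model response never poisons the chat history used to build the next prompt.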
- Python 3.10+
- Google AI Studio API Key (free tier supports Gemma models)
- ~200MB disk space (for the Nomic embedding model)
# Clone the repository
git clone https://github.com/optimist1101jan/Persistent-AI-Systems-.git
# Navigate to the project directory
cd Persistent-AI-Systems-
# Install dependencies
pip install -r requirements.txt
# Add your API key (Creates API_KEY.txt)
echo "API_KEY=your_gemini_api_key_here" > API_KEY.txt
# Initialize database and project structure
python setup.py
# Start the agent
python main.py

Download the nomic-embed-text-v1.5 GGUF model and place it in data/:
# Place the file at:
data/nomic-embed-text-v1.5.Q8_0.gguf

Note: The agent works without the embedding model (using a fallback), but semantic search and proximity detection require it.
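The fallback behavior might look like the following sketch, which degrades gracefully when llama-cpp-python or the GGUF file is missing. The function name and structure are assumptions for illustration, not the project's actual loader.

```python
# Hedged sketch: load the local embedding model if available, else
# return None so the agent can run in fallback mode.
from pathlib import Path

MODEL_PATH = Path("data/nomic-embed-text-v1.5.Q8_0.gguf")

def load_embedder(path=MODEL_PATH):
    """Return an embedding callable, or None when the model is unavailable."""
    try:
        from llama_cpp import Llama  # llama-cpp-python runtime
    except ImportError:
        return None
    if not path.exists():
        return None
    llm = Llama(model_path=str(path), embedding=True, verbose=False)
    return lambda text: llm.embed(text)  # 768-dim nomic embedding

embed = load_embedder()
if embed is None:
    print("fallback mode: semantic search and proximity detection disabled")
```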
All model settings are in model_config.py:
| Parameter | Default | Description |
|---|---|---|
| `MODEL` | `gemma-3-12b-it` | Gemini/Gemma model to use |
| `TEMPERATURE` | `0.7` | Response creativity |
| `MAX_OUTPUT_TOKENS` | `1000` | Max response length |
| `TIMEOUT` | `60s` | API request timeout |
| `MAX_RETRIES` | `3` | Retry attempts on failure |
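A model_config.py matching the defaults above might look like this (an illustrative fragment, not the file's exact contents):

```python
# Illustrative model_config.py with the documented defaults.
MODEL = "gemma-3-12b-it"   # Gemini/Gemma model to use
TEMPERATURE = 0.7          # response creativity
MAX_OUTPUT_TOKENS = 1000   # max response length
TIMEOUT = 60               # seconds per API request
MAX_RETRIES = 3            # retry attempts on failure
```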
Free Tier (Gemma):
- gemma-3-1b-it (fastest)
- gemma-3-4b-it (balanced)
- gemma-3-12b-it (recommended)
- gemma-3-27b-it (best quality)

Paid Tier (Gemini):
- gemini-2.0-flash
- gemini-2.0-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro
Run `python check_models.py` to see all models available for your API key.
The agent uses a 3-stage memory pipeline:

1. Buffer: Raw conversation turns are held in memory for the current 5-turn cycle.
2. Summarize: After 5 turns, the buffer is sent to Gemini for compression into a single factual sentence.
3. Index: The compressed memory is simultaneously indexed into:
- Episodic Store (SQLite FTS5) — keyword searchable
- Semantic Index (FAISS) — embedding-based similarity search
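The episodic half of the indexing step can be sketched with Python's built-in sqlite3 module. The table schema and function names are assumptions for illustration; the FAISS half would embed the same sentence and add the vector to the index.

```python
# Sketch: write one compressed memory sentence into an SQLite FTS5
# table, then recall it with a keyword search.
import sqlite3

def index_memory(conn, sentence):
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(text)")
    conn.execute("INSERT INTO memories(text) VALUES (?)", (sentence,))
    conn.commit()

def search_episodic(conn, query):
    cur = conn.execute("SELECT text FROM memories WHERE memories MATCH ?", (query,))
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
index_memory(conn, "User mentioned they adopted a cat named Miso in March.")
print(search_episodic(conn, "cat"))
```

FTS5 gives cheap keyword recall, while the parallel FAISS index catches paraphrases ("my pet" vs "my cat") that keyword search would miss.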
When the user asks a memory-related question (e.g., "do you remember..."), the system:
- Detects memory intent via keyword patterns
- Searches both episodic and semantic stores
- Injects relevant memories into the prompt
python manage_memory.py list # View all stored memories
python manage_memory.py delete <id> # Delete a specific memory
python manage_memory.py stats # Show memory statistics
python manage_memory.py rebuild # Rebuild FAISS index
python manage_memory.py clear       # Clear all memories

The agent detects physical presence context using embedding similarity:
| State | Description | Example Input |
|---|---|---|
| `PHYSICAL` | User is present, face-to-face | "sits next to you" |
| `REMOTE` | Chatting remotely | "texting from work" |
| `TRANSITION_TOWARD` | User arriving | "walks over to you" |
| `TRANSITION_AWAY` | User leaving | "I need to go now" |
Proximity context is only injected when the state changes or on the first turn, saving tokens.
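The classification mechanics are nearest-prototype matching by cosine similarity. In this toy sketch, `embed` is a stand-in bag-of-words vectorizer so the example runs without the nomic model; the real system compares 768-dim nomic embeddings, and the prototype phrases here are illustrative guesses.

```python
# Toy sketch of nearest-prototype proximity classification.
import math
from collections import Counter

PROTOTYPES = {
    "PHYSICAL": "sits next to you face to face",
    "REMOTE": "texting you from work far away",
    "TRANSITION_TOWARD": "walks over toward you arriving",
    "TRANSITION_AWAY": "I need to go now leaving",
}

def embed(text):  # stand-in for the real 768-dim nomic embedding
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(user_input):
    vec = embed(user_input)
    return max(PROTOTYPES, key=lambda s: cosine(vec, embed(PROTOTYPES[s])))

print(classify("sits next to you on the couch"))  # → PHYSICAL
print(classify("texting from work"))              # → REMOTE
```

Because the state is tracked turn to turn, the prompt only carries proximity context when `classify` returns something new, which is what saves tokens.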
| Component | Technology |
|---|---|
| LLM | Google Gemini API (Gemma 3 12B) |
| Embeddings | nomic-embed-text-v1.5 (GGUF, 768-dim) |
| Vector Search | FAISS (IndexFlatIP, cosine similarity) |
| Episodic Memory | SQLite FTS5 |
| Embedding Runtime | llama-cpp-python |
| Streaming | Gemini streamGenerateContent API |
| Language | Python 3.10+ |
- Never commit `API_KEY.txt`; add it to `.gitignore`
- The API key is loaded at runtime from a local file
- No external data is stored beyond the local cache and logs
This project is for educational and personal use. See individual dependency licenses for third-party components.