Centralized logging and search system for AI coding assistant conversations (Claude Code, Codex, Gemini CLI, etc.) across multiple machines.
Three components:
- Collector (Python, runs on each dev machine): watches `~/.claude/projects/`, `~/.codex/sessions/`, etc. for new conversation files, copies them incrementally to an outbox, and syncs them to the server's inbox with rsync.
- Processor (Python, runs on server in Docker): reads files from the inbox, parses them into canonical messages, indexes them into Typesense, and archives processed files.
- UI (Next.js, runs on server in Docker): web interface for browsing and searching conversations.
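
One way to picture the collector's incremental copy is offset tracking: remember how many bytes of each source file have already been copied and append only the remainder. The following is a minimal sketch under that assumption. `copy_incremental` and the in-memory `offsets` dict are hypothetical names for illustration, not the collector's actual API, which may persist its state differently.

```python
from pathlib import Path


def copy_incremental(src: Path, dst: Path, offsets: dict[str, int]) -> int:
    """Append the not-yet-copied bytes of src to dst; return bytes copied.

    Hypothetical helper: `offsets` maps a source path to the byte offset
    already copied. The real collector may track this elsewhere.
    """
    start = offsets.get(str(src), 0)
    size = src.stat().st_size
    if size < start:
        # Source was truncated or replaced: start over from the beginning.
        start = 0
        dst.unlink(missing_ok=True)
    if size == start:
        return 0  # Nothing new since the last call.
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as fin, dst.open("ab") as fout:
        fin.seek(start)
        chunk = fin.read(size - start)
        fout.write(chunk)
    offsets[str(src)] = start + len(chunk)
    return len(chunk)
```

Calling the function repeatedly on a growing JSONL file copies only the newly appended bytes each time, which keeps the later rsync step cheap.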
- Source code: `src/session_siphon/` (collector, processor, models, config)
- Parsers: `src/session_siphon/processor/parsers/`, one per source (claude_code, codex, gemini, opencode, etc.)
- UI: `ui/src/app/`, a Next.js app with server components for conversation detail and client components for list/search
- Typesense lib: `ui/src/lib/typesense.ts` (server-side), `ui/src/lib/api.ts` (client-side)
- Tests: `tests/` (pytest; run with `python3 -m pytest`)
- Scripts: `scripts/` (deployment, backfill, and schema migration utilities)
Dev machine: ~/.claude/projects/**/*.jsonl
→ Collector copies to outbox
→ rsync to server inbox (/data/session-siphon/inbox/<machine_id>/claude_code/...)
→ Processor parses, indexes to Typesense, archives to /data/session-siphon/archive/
Sessions can have relationships:
- fork: created via `claude --fork`; has `forkedFrom.sessionId` in the JSONL
- continuation: session that ran out of context and was continued; detected by `compact_boundary` entries with a different sessionId
- subagent: task delegated by a parent session; stored at `<parent-uuid>/subagents/<agent-id>.jsonl`
Relationships are stored as parent_conversation_id and relationship_type on the conversation document in Typesense.
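
The three detection rules can be sketched as a single function. This is an illustration, not the processor's actual code: `detect_relationship` is a hypothetical name, and the sketch assumes each JSONL entry carries a top-level `sessionId` and that a compact boundary is marked by `subtype == "compact_boundary"`. Neither detail is confirmed here.

```python
import json
from pathlib import Path
from typing import Optional


def detect_relationship(
    path: Path, lines: list[str]
) -> tuple[Optional[str], Optional[str]]:
    """Return (relationship_type, parent_conversation_id), or (None, None).

    Assumptions: entries have a top-level "sessionId", and compact
    boundaries are marked with subtype == "compact_boundary".
    """
    # Subagent: the file lives at <parent-uuid>/subagents/<agent-id>.jsonl
    if path.parent.name == "subagents":
        return "subagent", path.parent.parent.name

    own_id = path.stem  # conversation_id comes from the filename stem
    for raw in lines:
        if not raw.strip():
            continue
        entry = json.loads(raw)
        # Fork: the entry records the session it was forked from.
        forked = entry.get("forkedFrom")
        if isinstance(forked, dict) and forked.get("sessionId"):
            return "fork", forked["sessionId"]
        # Continuation: a compact boundary that names a different session.
        if entry.get("subtype") == "compact_boundary":
            other = entry.get("sessionId")
            if other and other != own_id:
                return "continuation", other
    return None, None
```

The returned pair corresponds to the `relationship_type` and `parent_conversation_id` fields on the conversation document.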
    bash scripts/deploy.sh all      # Deploy to server + all clients
    bash scripts/deploy.sh server   # Server only (rebuilds Docker containers)
    bash scripts/deploy.sh clients  # All client machines only
    bash scripts/deploy.sh status   # Check status everywhere

The deploy script auto-detects if a client machine is the local host and deploys locally (pip install + systemctl restart) instead of via SSH.
Server (ubuntu@nathan-server): rsync source → docker compose build && docker compose up -d
Clients (nathan@office-desktop, nathan@p16): rsync source → pip install → restart siphon-collector.service
If parser/indexer changes affect how data is stored, you may need to:
- Backfill relationships (updates conversation metadata without reprocessing):

      cat scripts/backfill_relationships.py | ssh ubuntu@nathan-server \
        "docker exec -i -e TYPESENSE_HOST=typesense -e TYPESENSE_PORT=8108 \
         -e DATA_PATH=/data/session-siphon session-siphon-processor-1 python3 -"

- Full reindex (nuclear option: reprocesses everything):

      # Delete processor state DB so it re-reads all files from offset 0
      ssh ubuntu@nathan-server "rm ~/docker/session-siphon/data/session-siphon/state/processor.db"
      # Copy archived files back to inbox
      ssh ubuntu@nathan-server "cd ~/docker/session-siphon/data/session-siphon && cp -r archive/*/* inbox/"
      # Restart processor
      ssh ubuntu@nathan-server "cd ~/docker/session-siphon && docker compose restart processor"

- Schema changes (add new fields to Typesense collections):

      cat scripts/update_schema.py | ssh ubuntu@nathan-server \
        "docker exec -i -e TYPESENSE_HOST=typesense session-siphon-processor-1 python3 -"
    # Run tests
    python3 -m pytest
    # Run a specific test file
    python3 -m pytest tests/test_processor_parsers_claude_code.py -x
    # Dev UI (requires Typesense running)
    cd ui && npm install && npm run dev

To add a parser for a new source:
- Create `src/session_siphon/processor/parsers/<source>.py` implementing the `Parser` base class
- Register it in `src/session_siphon/processor/parsers/__init__.py`
- Add source discovery in `src/session_siphon/collector/sources.py`
- Add tests in `tests/test_processor_parsers_<source>.py`
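
As a rough picture of the first step, a parser might look like the sketch below. The real `Parser` interface is not shown in this file, so the `Parser` and `CanonicalMessage` classes here are local stand-ins, and the `parse` signature, the `example` source name, and the input keys (`ts`, `role`, `text`) are all assumptions for illustration. Check the existing parsers for the actual contract before copying this.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class CanonicalMessage:
    # Stand-in with the fields listed for models.py; the real class may differ.
    source: str
    machine_id: str
    project: str
    conversation_id: str
    ts: int
    role: str
    content: str


class Parser(ABC):
    # Stand-in for the real base class; this interface is an assumption.
    source: str

    @abstractmethod
    def parse(self, path: Path, machine_id: str) -> Iterator[CanonicalMessage]:
        ...


class ExampleParser(Parser):
    """Parses a hypothetical JSONL format with "ts", "role", and "text" keys."""

    source = "example"

    def parse(self, path: Path, machine_id: str) -> Iterator[CanonicalMessage]:
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue  # Skip blank lines.
                entry = json.loads(line)
                yield CanonicalMessage(
                    source=self.source,
                    machine_id=machine_id,
                    project=path.parent.name,   # Assumed: project = parent dir
                    conversation_id=path.stem,  # ID from the filename stem
                    ts=int(entry["ts"]),
                    role=entry["role"],
                    content=entry["text"],
                )
```

A matching test would write a small fixture file, run `parse` over it, and assert on the resulting messages.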
- `CanonicalMessage` (`models.py`): normalized message with source, machine_id, project, conversation_id, ts, role, content
- `Conversation` (`models.py`): aggregated metadata (title from first user message, preview from last message, timestamps, message count, relationships)
- Message/Conversation IDs in Typesense: `source:machine_id:conversation_id[:timestamp:content_hash]`
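
The ID format can be illustrated with two small helpers. This sketch reads the bracketed suffix as applying to message IDs only, and it assumes a truncated SHA-256 for `content_hash`. The function names and the hash choice are hypothetical and may not match the indexer.

```python
import hashlib


def conversation_doc_id(source: str, machine_id: str, conversation_id: str) -> str:
    """Build the conversation-level ID: source:machine_id:conversation_id."""
    return f"{source}:{machine_id}:{conversation_id}"


def message_doc_id(
    source: str, machine_id: str, conversation_id: str, ts: int, content: str
) -> str:
    """Build a message-level ID by appending :timestamp:content_hash."""
    # Assumption: SHA-256 truncated to 16 hex chars; the real hash may differ.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    base = conversation_doc_id(source, machine_id, conversation_id)
    return f"{base}:{ts}:{digest}"
```

Because the ID is derived only from the message's own fields, re-indexing the same file produces the same IDs, so repeated runs overwrite documents instead of duplicating them (assuming the indexer upserts).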
- Claude Code subagent files use the parent's sessionId internally but have a different filename (e.g., `agent-a77d39e91537bc64f`). The `conversation_id` comes from the filename stem, not the sessionId field.
- The collector glob `**/*.jsonl` matches subagent files too; this is intentional so they get indexed as separate conversations.
- `scripts/deploy.sh` is gitignored (contains machine-specific config). If it's missing, copy from another machine or recreate from the template in this file.
- The processor container doesn't have `pip` packages pre-installed for scripts. Pipe scripts via `docker exec -i ... python3 -` or install typesense first with `docker exec ... pip install typesense`.
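
The first two points (IDs from the filename stem, and the glob matching subagent files) can be checked with a short sketch. `discover_conversations` is a hypothetical helper, not the collector's discovery code.

```python
from pathlib import Path


def discover_conversations(root: Path) -> dict[str, Path]:
    """Map conversation_id (filename stem) to path for every JSONL under root.

    The recursive pattern also matches files nested under subagents/, so
    subagent sessions show up as separate conversations.
    """
    return {p.stem: p for p in sorted(root.glob("**/*.jsonl"))}
```

A parent session file and its subagent file appear under different keys here, even though both carry the same sessionId internally.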