Session Siphon

Centralized logging and search system for AI coding assistant conversations (Claude Code, Codex, Gemini CLI, etc.) across multiple machines.

Architecture

Three components:

Collector (Python, runs on each dev machine) — watches ~/.claude/projects/, ~/.codex/sessions/, etc. for new conversation files, copies them incrementally to an outbox, and rsync syncs them to the server's inbox.
Processor (Python, runs on server in Docker) — reads files from the inbox, parses them into canonical messages, indexes into Typesense, and archives processed files.
UI (Next.js, runs on server in Docker) — web interface for browsing and searching conversations.

Key paths

Source code: src/session_siphon/ (collector, processor, models, config)
Parsers: src/session_siphon/processor/parsers/ — one per source (claude_code, codex, gemini, opencode, etc.)
UI: ui/src/app/ — Next.js app with server components for conversation detail and client components for list/search
Typesense lib: ui/src/lib/typesense.ts (server-side), ui/src/lib/api.ts (client-side)
Tests: tests/ — pytest, run with python3 -m pytest
Scripts: scripts/ — deployment, backfill, and schema migration utilities

Data flow

Dev machine: ~/.claude/projects/**/*.jsonl
  → Collector copies to outbox
  → rsync to server inbox (/data/session-siphon/inbox/<machine_id>/claude_code/...)
  → Processor parses, indexes to Typesense, archives to /data/session-siphon/archive/

Conversation relationships

Sessions can have relationships:

fork: created via claude --fork; has forkedFrom.sessionId in the JSONL
continuation: session that ran out of context and was continued; detected by compact_boundary entries with a different sessionId
subagent: task delegated by a parent session; stored at <parent-uuid>/subagents/<agent-id>.jsonl

Relationships are stored as parent_conversation_id and relationship_type on the conversation document in Typesense.

Deployment

Deploy script

bash scripts/deploy.sh all      # Deploy to server + all clients
bash scripts/deploy.sh server   # Server only (rebuilds Docker containers)
bash scripts/deploy.sh clients  # All client machines only
bash scripts/deploy.sh status   # Check status everywhere

The deploy script auto-detects if a client machine is the local host and deploys locally (pip install + systemctl restart) instead of via SSH.

Server (ubuntu@nathan-server): rsync source → docker compose build && docker compose up -d Clients (nathan@office-desktop, nathan@p16): rsync source → pip install → restart siphon-collector.service

After deploying

If parser/indexer changes affect how data is stored, you may need to:

Backfill relationships (updates conversation metadata without reprocessing):

cat scripts/backfill_relationships.py | ssh ubuntu@nathan-server \
  "docker exec -i -e TYPESENSE_HOST=typesense -e TYPESENSE_PORT=8108 \
   -e DATA_PATH=/data/session-siphon session-siphon-processor-1 python3 -"

Full reindex (nuclear option — reprocesses everything):

# Delete processor state DB so it re-reads all files from offset 0
ssh ubuntu@nathan-server "rm ~/docker/session-siphon/data/session-siphon/state/processor.db"
# Move archived files back to inbox
ssh ubuntu@nathan-server "cd ~/docker/session-siphon/data/session-siphon && cp -r archive/*/* inbox/"
# Restart processor
ssh ubuntu@nathan-server "cd ~/docker/session-siphon && docker compose restart processor"

Schema changes (add new fields to Typesense collections):

cat scripts/update_schema.py | ssh ubuntu@nathan-server \
  "docker exec -i -e TYPESENSE_HOST=typesense session-siphon-processor-1 python3 -"

Development

# Run tests
python3 -m pytest

# Run a specific test file
python3 -m pytest tests/test_processor_parsers_claude_code.py -x

# Dev UI (requires Typesense running)
cd ui && npm install && npm run dev

Adding a new parser

Create src/session_siphon/processor/parsers/<source>.py implementing the Parser base class
Register it in src/session_siphon/processor/parsers/__init__.py
Add source discovery in src/session_siphon/collector/sources.py
Add tests in tests/test_processor_parsers_<source>.py

Key types

CanonicalMessage (models.py): normalized message with source, machine_id, project, conversation_id, ts, role, content
Conversation (models.py): aggregated metadata (title from first user message, preview from last message, timestamps, message count, relationships)
Message/Conversation IDs in Typesense: source:machine_id:conversation_id[:timestamp:content_hash]

Gotchas

Claude Code subagent files use the parent's sessionId internally but have a different filename (e.g., agent-a77d39e91537bc64f). The conversation_id comes from the filename stem, not the sessionId field.
The collector glob **/*.jsonl matches subagent files too — this is intentional so they get indexed as separate conversations.
scripts/deploy.sh is gitignored (contains machine-specific config). If it's missing, copy from another machine or recreate from the template in this file.
The processor container doesn't have pip packages pre-installed for scripts. Pipe scripts via docker exec -i ... python3 - or install typesense first with docker exec ... pip install typesense.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Session Siphon

Architecture

Key paths

Data flow

Conversation relationships

Deployment

Deploy script

After deploying

Development

Adding a new parser

Key types

Gotchas

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

Session Siphon

Architecture

Key paths

Data flow

Conversation relationships

Deployment

Deploy script

After deploying

Development

Adding a new parser

Key types

Gotchas