Rayan — System Architecture

Component Overview

┌──────────────────────────────────────────────────────────────────────┐
│                          User's Browser                              │
│                   React 18 + Three.js + @react-three/fiber           │
│                        (Firebase Hosting)                            │
│                                                                      │
│  ┌─────────────┐   ┌──────────────────┐   ┌──────────────────────┐  │
│  │  3D Memory  │   │  Capture Panel   │   │   Voice Panel        │  │
│  │  Palace     │   │  (mic + screen   │   │   (Recall mode)      │  │
│  │  Three.js   │   │   stream)        │   │   audio playback     │  │
│  └──────┬──────┘   └────────┬─────────┘   └──────────┬───────────┘  │
│         │                  │                          │              │
│         └──────────────────┴──────────────────────────┘              │
│                            │  WebSocket /ws/{userId}                 │
└────────────────────────────┼─────────────────────────────────────────┘
                             │
┌────────────────────────────▼─────────────────────────────────────────┐
│                     Cloud Run — FastAPI + uvicorn                    │
│                        (session affinity)                            │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │  WebSocket Handler  /ws/{userId}                             │    │
│  │  Routes: audio_chunk, video_frame, capture_start/stop,       │    │
│  │          recall_start/stop, screenshot_response              │    │
│  └────────┬───────────────────────────────┬─────────────────────┘    │
│           │                               │                          │
│  ┌────────▼─────────────────┐   ┌─────────▼──────────────────────┐  │
│  │  CaptureAgent            │   │  RecallAgent                   │  │
│  │                          │   │                                │  │
│  │  Model:                  │   │  Model:                        │  │
│  │  gemini-live-2.5-flash-  │   │  gemini-live-2.5-flash-        │  │
│  │  native-audio            │   │  native-audio                  │  │
│  │                          │   │                                │  │
│  │  enable_affective_       │   │  enable_affective_             │  │
│  │  dialog=True             │   │  dialog=True                   │  │
│  │                          │   │                                │  │
│  │  Tools:                  │   │  Tools:                        │  │
│  │  • capture_concept       │   │  • navigate_to_room            │  │
│  │  • create_artifact       │   │  • navigate_horizontal         │  │
│  │  • create_room           │   │  • navigate_to_map_view        │  │
│  │  • take_screenshot       │   │  • highlight_artifact          │  │
│  │  • edit_artifact         │   │  • create_artifact             │  │
│  │  • delete_artifact       │   │  • edit_artifact               │  │
│  │  • web_search            │   │  • delete_artifact             │  │
│  │  • navigate_to_room      │   │  • delete_room                 │  │
│  │  • end_session           │   │  • synthesize_room             │  │
│  │                          │   │  • web_search                  │  │
│  │  Dedup: cosine ≥0.90     │   │  • end_session                 │  │
│  │  (gemini-embedding-2-    │   │                                │  │
│  │   preview, per-session)  │   │                                │  │
│  └────────┬─────────────────┘   └─────────┬──────────────────────┘  │
│           │                               │                          │
│  ┌────────▼───────────────────────────────▼──────────────────────┐   │
│  │  Memory Architect  (gemini-2.5-flash)                         │   │
│  │  • Categorizes concept into existing room or suggests new one  │   │
│  │  • Assigns artifact type and visual                            │   │
│  │  • Generates embedding via Vertex AI text-embedding-005        │   │
│  │  • Writes artifact + embedding to Firestore                    │   │
│  └────────────────────────────────┬───────────────────────────────┘  │
│                                   │                                  │
│  ┌────────────────────────────────▼───────────────────────────────┐  │
│  │  Semantic Search  (recall grounding)                           │  │
│  │  • Embeds user query via Vertex AI text-embedding-005          │  │
│  │  • Cosine similarity scan across all stored embeddings         │  │
│  │  • Top-8 results injected into RecallAgent system prompt       │  │
│  │  • Re-runs on every room navigation and artifact highlight      │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                             │
┌────────────────────────────▼─────────────────────────────────────────┐
│                         Google Cloud                                 │
│                                                                      │
│  ┌──────────────────────┐   ┌────────────────────────────────────┐   │
│  │  Cloud Firestore     │   │  Vertex AI                         │   │
│  │                      │   │                                    │   │
│  │  users/              │   │  text-embedding-005                │   │
│  │   {userId}/          │   │  768-dimensional embeddings        │   │
│  │    rooms/            │   │  Used for:                         │   │
│  │     {roomId}/        │   │  • Artifact storage (capture)      │   │
│  │      artifacts/      │   │  • Semantic search (recall)        │   │
│  │       {id}           │   │  • Dedup detection (capture)       │   │
│  │       .embedding[]   │   │                                    │   │
│  │       .summary       │   │  Vector Search index               │   │
│  │       .fullContent   │   │  768-dim, cosine, Tree-AH          │   │
│  │       .type          │   │  (Terraform-provisioned)           │   │
│  └──────────────────────┘   └────────────────────────────────────┘   │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │  Cloud Storage  (rayan-media-{project})                      │    │
│  │  • JPEG screenshots captured during Capture sessions         │    │
│  │  • AI-generated mind map images (synthesize_room)            │    │
│  │  • Public URLs stored as sourceMediaUrl on artifacts         │    │
│  └──────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │  Firebase Hosting  (frontend)                                │    │
│  │  Firebase Auth     (user identity)                           │    │
│  └──────────────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────────┘
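
The Firestore layout in the diagram can be sketched as a small data model. This is a minimal illustration, not the project's actual code: the `Artifact` class, the `artifact_path` helper, and the example type value `"note"` are hypothetical. Only the collection path and the field names (`embedding`, `summary`, `fullContent`, `type`, `sourceMediaUrl`) come from the diagram.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Artifact:
    """Hypothetical in-memory mirror of one artifact document."""
    summary: str
    fullContent: str
    type: str
    # Expected to hold 768 floats when produced by text-embedding-005.
    embedding: List[float] = field(default_factory=list)
    # Set only when a screenshot or mind map image exists in Cloud Storage.
    sourceMediaUrl: Optional[str] = None


def artifact_path(user_id: str, room_id: str, artifact_id: str) -> str:
    """Build the document path users/{userId}/rooms/{roomId}/artifacts/{id}."""
    return f"users/{user_id}/rooms/{room_id}/artifacts/{artifact_id}"


doc = asdict(Artifact(summary="short summary", fullContent="full text", type="note"))
print(artifact_path("u1", "r1", "a1"), sorted(doc))
```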

Data Flow — Capture Mode

Mic + Screen
     │
     ▼
CaptureAgent (Gemini Live)
     │
     ├── _user_has_spoken gate: all tools blocked until
     │   user speaks at least once (prevents eager tool
     │   calls during the opening greeting)
     │
     │ autonomous concept detection
     │ OR user says "save this"
     ▼
capture_concept / create_artifact tool call
     │
     ├── rate limit (selective=60s / balanced=30s / thorough=12s)
     ├── confidence ≥ 0.7
     ├── within-session dedup: cosine ≥ 0.90 → merge instead
     │
     ▼
Memory Architect (gemini-2.5-flash)
  • Selects or creates a room
  • Assigns artifact type + visual
     │
     ▼
text-embedding-005 (Vertex AI)
  • Generates 768-dim embedding
     │
     ├──► Firestore  (artifact + embedding stored)
     │
     └──► WebSocket → Browser
              capture_ack (buffered until Rayan speaks —
                           badge appears after spoken message)
              palace_update → artifact appears in 3D palace live
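
The gates in the capture flow above can be sketched as one decision function. This is an illustration under stated assumptions rather than the real implementation: the function name `gate_capture`, its parameters, and the returned status strings are hypothetical, and the order of the checks simply follows the diagram. The thresholds (60/30/12 s, 0.7, 0.90) are the ones listed in the flow.

```python
import math
from typing import Optional, Sequence

RATE_LIMIT_SECONDS = {"selective": 60, "balanced": 30, "thorough": 12}
MIN_CONFIDENCE = 0.7
DEDUP_THRESHOLD = 0.90


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gate_capture(
    *,
    user_has_spoken: bool,
    mode: str,
    seconds_since_last: Optional[float],
    confidence: float,
    embedding: Sequence[float],
    session_embeddings: Sequence[Sequence[float]],
) -> str:
    """Return a hypothetical status: blocked, rate_limited, low_confidence, merge, or save."""
    if not user_has_spoken:
        return "blocked"  # tools stay blocked during the opening greeting
    if seconds_since_last is not None and seconds_since_last < RATE_LIMIT_SECONDS[mode]:
        return "rate_limited"
    if confidence < MIN_CONFIDENCE:
        return "low_confidence"
    if any(cosine(embedding, e) >= DEDUP_THRESHOLD for e in session_embeddings):
        return "merge"  # near-duplicate within this session
    return "save"
```

Whether an explicit "save this" request bypasses the rate limit is not stated in the flow, so the sketch applies every gate uniformly.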

Data Flow — Recall Mode

User speaks
     │
     ▼
RecallAgent (Gemini Live)
  enable_affective_dialog=True
     │
     ▼
On session start / room nav / artifact highlight:
  update_context() → semantic_search()
     │
     ▼
text-embedding-005 (Vertex AI)
  Embed current context / artifact summary
     │
     ▼
Cosine similarity vs all stored embeddings (Firestore)
  Top-8 most relevant memories selected
     │
     ▼
send_client_content → injected into live conversation
  RecallAgent answers ONLY from these grounded memories
     │
     ▼
Audio response → WebSocket → Browser
  + optional tool calls (navigate, highlight, synthesize)
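
The grounding step in the recall flow, a brute-force cosine scan followed by a top-8 cut, can be sketched as follows. This is a minimal, dependency-free illustration: the function signature and the dict of stored embeddings are assumptions, and the real `semantic_search()` presumably reads embeddings from Firestore and obtains the query vector from text-embedding-005.

```python
import math
from typing import Dict, List, Sequence, Tuple

TOP_K = 8  # number of memories injected into the conversation


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_embedding: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = TOP_K,
) -> List[Tuple[str, float]]:
    """Score every stored embedding against the query and keep the k best."""
    scored = [
        (artifact_id, cosine_similarity(query_embedding, emb))
        for artifact_id, emb in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A full scan is linear in the number of stored artifacts, which is simple and adequate for a single user's palace. The Vector Search index listed under Google Cloud would be the natural alternative if the scan became a bottleneck.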

Ambient Audio Behaviour

| State | Audio |
| --- | --- |
| Overview / lobby | `/audio/rooms/Palace.mp3` |
| Inside a room | `/audio/rooms/{Style}.mp3` |
| Any capture active (`status === 'capturing'`, all source types) | Muted; prevents bleed into mic or tab stream |
| Recall / voice session active | Ducked to 10% |
| Idle | Normal volume |
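
The table above can be read as a mapping from application state to a track and a volume. The sketch below is written in Python for consistency with the backend, although this logic would live in the frontend. The function name, its parameters, the precedence (capture mute wins over recall ducking), and the use of 1.0 for "normal volume" are all assumptions.

```python
from typing import Optional, Tuple


def ambient_audio(
    *,
    capturing: bool,
    voice_session_active: bool,
    room_style: Optional[str],
) -> Tuple[str, float]:
    """Return (track path, volume in 0.0 to 1.0) for the current state."""
    # Track: the room's style inside a room, otherwise the lobby track.
    if room_style:
        track = f"/audio/rooms/{room_style}.mp3"
    else:
        track = "/audio/rooms/Palace.mp3"

    # Volume: muted while capturing, ducked during recall, normal when idle.
    if capturing:
        volume = 0.0
    elif voice_session_active:
        volume = 0.10
    else:
        volume = 1.0
    return track, volume
```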

Session-End Summary

`concept_count` in `capture_complete` is derived from `artifact_ids`: it counts only extractions whose categorization is set, that is, those actually saved to Firestore. Failed extractions (for example, after an embedding API error) are excluded, so the count always matches what is visible in the palace.
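
The counting rule can be sketched as below. The `Extraction` class, the `build_capture_complete` helper, and the payload shape are hypothetical. Only the field names `concept_count` and `artifact_ids` and the "categorization is set" condition come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Extraction:
    """Hypothetical record of one capture attempt in a session."""
    artifact_id: str
    # Assumed to stay None when the save fails (e.g. embedding API error).
    categorization: Optional[Dict[str, str]] = None


def build_capture_complete(extractions: List[Extraction]) -> Dict[str, object]:
    """Count only extractions that were categorized, i.e. saved to Firestore."""
    artifact_ids = [e.artifact_id for e in extractions if e.categorization is not None]
    return {
        "type": "capture_complete",
        "artifact_ids": artifact_ids,
        "concept_count": len(artifact_ids),
    }
```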

Infrastructure (Terraform)

| Resource | Type | Config |
| --- | --- | --- |
| `rayan-backend` | Cloud Run v2 | 2 CPU / 2 GB / max 10 instances / session affinity |
| `(default)` | Firestore Native | `us-central1` |
| `rayan-media-{project}` | Cloud Storage | US multi-region / CORS enabled |
| `rayan-frontend-{project}` | Cloud Storage | Static website hosting |
| `artifact-embeddings` | Vertex AI Vector Search | 768-dim / cosine / Tree-AH |
| `rayan-backend` | Service Account | Firestore user + Storage admin + Vertex AI user |

All resources are provisioned with `terraform apply -var="project_id=<PROJECT_ID>"`.
