Rayan — System Architecture

Component Overview

┌──────────────────────────────────────────────────────────────────────┐
│                          User's Browser                              │
│                   React 18 + Three.js + @react-three/fiber           │
│                        (Firebase Hosting)                            │
│                                                                      │
│  ┌─────────────┐   ┌──────────────────┐   ┌──────────────────────┐  │
│  │  3D Memory  │   │  Capture Panel   │   │   Voice Panel        │  │
│  │  Palace     │   │  (mic + screen   │   │   (Recall mode)      │  │
│  │  Three.js   │   │   stream)        │   │   audio playback     │  │
│  └──────┬──────┘   └────────┬─────────┘   └──────────┬───────────┘  │
│         │                  │                          │              │
│         └──────────────────┴──────────────────────────┘              │
│                            │  WebSocket /ws/{userId}                 │
└────────────────────────────┼─────────────────────────────────────────┘
                             │
┌────────────────────────────▼─────────────────────────────────────────┐
│                     Cloud Run — FastAPI + uvicorn                    │
│                        (session affinity)                            │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │  WebSocket Handler  /ws/{userId}                             │    │
│  │  Routes: audio_chunk, video_frame, capture_start/stop,       │    │
│  │          recall_start/stop, screenshot_response              │    │
│  └────────┬───────────────────────────────┬─────────────────────┘    │
│           │                               │                          │
│  ┌────────▼─────────────────┐   ┌─────────▼──────────────────────┐  │
│  │  CaptureAgent            │   │  RecallAgent                   │  │
│  │                          │   │                                │  │
│  │  Model:                  │   │  Model:                        │  │
│  │  gemini-live-2.5-flash-  │   │  gemini-live-2.5-flash-        │  │
│  │  native-audio            │   │  native-audio                  │  │
│  │                          │   │                                │  │
│  │  enable_affective_       │   │  enable_affective_             │  │
│  │  dialog=True             │   │  dialog=True                   │  │
│  │                          │   │                                │  │
│  │  Tools:                  │   │  Tools:                        │  │
│  │  • capture_concept       │   │  • navigate_to_room            │  │
│  │  • create_artifact       │   │  • navigate_horizontal         │  │
│  │  • create_room           │   │  • navigate_to_map_view        │  │
│  │  • take_screenshot       │   │  • highlight_artifact          │  │
│  │  • edit_artifact         │   │  • create_artifact             │  │
│  │  • delete_artifact       │   │  • edit_artifact               │  │
│  │  • web_search            │   │  • delete_artifact             │  │
│  │  • navigate_to_room      │   │  • delete_room                 │  │
│  │  • end_session           │   │  • synthesize_room             │  │
│  │                          │   │  • web_search                  │  │
│  │  Dedup: cosine ≥0.90     │   │  • end_session                 │  │
│  │  (gemini-embedding-2-    │   │                                │  │
│  │   preview, per-session)  │   │                                │  │
│  └────────┬─────────────────┘   └─────────┬──────────────────────┘  │
│           │                               │                          │
│  ┌────────▼───────────────────────────────▼──────────────────────┐   │
│  │  Memory Architect  (gemini-2.5-flash)                         │   │
│  │  • Categorizes concept into existing room or suggests new one  │   │
│  │  • Assigns artifact type and visual                            │   │
│  │  • Generates embedding via Vertex AI text-embedding-005        │   │
│  │  • Writes artifact + embedding to Firestore                    │   │
│  └────────────────────────────────┬───────────────────────────────┘  │
│                                   │                                  │
│  ┌────────────────────────────────▼───────────────────────────────┐  │
│  │  Semantic Search  (recall grounding)                           │  │
│  │  • Embeds user query via Vertex AI text-embedding-005          │  │
│  │  • Cosine similarity scan across all stored embeddings         │  │
│  │  • Top-8 results injected into RecallAgent system prompt       │  │
│  │  • Re-runs on every room navigation and artifact highlight      │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                             │
┌────────────────────────────▼─────────────────────────────────────────┐
│                         Google Cloud                                 │
│                                                                      │
│  ┌──────────────────────┐   ┌────────────────────────────────────┐   │
│  │  Cloud Firestore     │   │  Vertex AI                         │   │
│  │                      │   │                                    │   │
│  │  users/              │   │  text-embedding-005                │   │
│  │   {userId}/          │   │  768-dimensional embeddings        │   │
│  │    rooms/            │   │  Used for:                         │   │
│  │     {roomId}/        │   │  • Artifact storage (capture)      │   │
│  │      artifacts/      │   │  • Semantic search (recall)        │   │
│  │       {id}           │   │  • Dedup detection (capture)       │   │
│  │       .embedding[]   │   │                                    │   │
│  │       .summary       │   │  Vector Search index               │   │
│  │       .fullContent   │   │  768-dim, cosine, Tree-AH          │   │
│  │       .type          │   │  (Terraform-provisioned)           │   │
│  └──────────────────────┘   └────────────────────────────────────┘   │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │  Cloud Storage  (rayan-media-{project})                      │    │
│  │  • JPEG screenshots captured during Capture sessions         │    │
│  │  • AI-generated mind map images (synthesize_room)            │    │
│  │  • Public URLs stored as sourceMediaUrl on artifacts         │    │
│  └──────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │  Firebase Hosting  (frontend)                                │    │
│  │  Firebase Auth     (user identity)                           │    │
│  └──────────────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────────┘
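
The WebSocket Handler box above lists the message types it routes. A minimal sketch of that dispatch step follows, using only the standard library. The `type` envelope field, the handler signature, and the `make_router` helper are assumptions made for illustration, and `capture_stop` / `recall_stop` are inferred from the `start/stop` shorthand in the diagram. The real handler runs inside FastAPI and may be structured differently.

```python
import json
from typing import Callable, Dict

# Message types named in the WebSocket Handler box.
MESSAGE_TYPES = (
    "audio_chunk", "video_frame",
    "capture_start", "capture_stop",
    "recall_start", "recall_stop",
    "screenshot_response",
)

# Hypothetical handler signature: (user_id, decoded message) -> status string.
Handler = Callable[[str, dict], str]


def make_router(handlers: Dict[str, Handler]) -> Callable[[str, str], str]:
    """Build a dispatcher that routes one raw JSON message to its handler."""
    unknown = set(handlers) - set(MESSAGE_TYPES)
    if unknown:
        raise ValueError(f"unknown message types: {sorted(unknown)}")

    def route(user_id: str, raw: str) -> str:
        message = json.loads(raw)
        msg_type = message.get("type")  # assumed envelope field name
        handler = handlers.get(msg_type)
        if handler is None:
            # Types without a registered handler are skipped, not fatal.
            return f"ignored:{msg_type}"
        return handler(user_id, message)

    return route


# Usage: register only the handlers this sketch cares about.
route = make_router({
    "capture_start": lambda uid, msg: f"capture:{uid}",
    "recall_start": lambda uid, msg: f"recall:{uid}",
})
print(route("u1", '{"type": "capture_start"}'))
```

A table-driven router keeps the list of accepted message types in one place, so a typo in a handler name fails at startup rather than silently dropping traffic.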

Data Flow — Capture Mode

Mic + Screen
     │
     ▼
CaptureAgent (Gemini Live)
     │
     ├── _user_has_spoken gate: all tools blocked until
     │   user speaks at least once (prevents eager tool
     │   calls during the opening greeting)
     │
     │ autonomous concept detection
     │ OR user says "save this"
     ▼
capture_concept / create_artifact tool call
     │
     ├── rate limit (selective=60s / balanced=30s / thorough=12s)
     ├── confidence ≥ 0.7
     ├── within-session dedup: cosine ≥ 0.90 → merge instead
     │
     ▼
Memory Architect (gemini-2.5-flash)
  • Selects or creates a room
  • Assigns artifact type + visual
     │
     ▼
text-embedding-005 (Vertex AI)
  • Generates 768-dim embedding
     │
     ├──► Firestore  (artifact + embedding stored)
     │
     └──► WebSocket → Browser
              capture_ack (buffered until Rayan speaks —
                           badge appears after spoken message)
              palace_update → artifact appears in 3D palace live
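
The checks applied before a concept reaches the Memory Architect can be sketched as a single decision function. The thresholds (60s / 30s / 12s, confidence 0.7, cosine 0.90) come from the flow above. The `CaptureGate` class, its return strings, the order of the checks, and the choice to let a merge reset the rate-limit timer are assumptions for illustration, not the project's actual code.

```python
import math
import time
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

RATE_LIMIT_SECONDS = {"selective": 60.0, "balanced": 30.0, "thorough": 12.0}
MIN_CONFIDENCE = 0.7
DEDUP_THRESHOLD = 0.90


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class CaptureGate:
    mode: str = "balanced"
    user_has_spoken: bool = False
    last_capture_at: Optional[float] = None
    session_embeddings: List[Sequence[float]] = field(default_factory=list)

    def decide(self, confidence: float, embedding: Sequence[float],
               now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        # 1. All tools stay blocked until the user has spoken once.
        if not self.user_has_spoken:
            return "blocked"
        # 2. Per-mode rate limit between accepted captures.
        if (self.last_capture_at is not None
                and now - self.last_capture_at < RATE_LIMIT_SECONDS[self.mode]):
            return "rate_limited"
        # 3. Confidence floor.
        if confidence < MIN_CONFIDENCE:
            return "low_confidence"
        # Assumption: both merges and saves restart the rate-limit timer.
        self.last_capture_at = now
        # 4. Within-session dedup: near-duplicates merge instead of saving.
        if any(cosine(embedding, seen) >= DEDUP_THRESHOLD
               for seen in self.session_embeddings):
            return "merge"
        self.session_embeddings.append(embedding)
        return "save"
```

Passing `now` explicitly makes the gate deterministic in tests; in production the default `time.monotonic()` avoids surprises from wall-clock adjustments.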

Data Flow — Recall Mode

User speaks
     │
     ▼
RecallAgent (Gemini Live)
  enable_affective_dialog=True
     │
     ▼
On session start / room nav / artifact highlight:
  update_context() → semantic_search()
     │
     ▼
text-embedding-005 (Vertex AI)
  Embed current context / artifact summary
     │
     ▼
Cosine similarity vs all stored embeddings (Firestore)
  Top-8 most relevant memories selected
     │
     ▼
send_client_content → injected into live conversation
  RecallAgent answers ONLY from these grounded memories
     │
     ▼
Audio response → WebSocket → Browser
  + optional tool calls (navigate, highlight, synthesize)
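
The grounding step above (cosine similarity against all stored embeddings, keep the top 8) can be sketched as a brute-force scan. The function names and the in-memory `stored` dictionary are assumptions for illustration; the real system reads embeddings from Firestore and obtains the query vector from text-embedding-005.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

TOP_K = 8  # number of memories injected into the conversation


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_memories(query: Sequence[float],
                   stored: Dict[str, Sequence[float]],
                   k: int = TOP_K) -> List[Tuple[str, float]]:
    """Return up to k (artifact_id, score) pairs, best match first."""
    scored = ((cosine_similarity(query, emb), artifact_id)
              for artifact_id, emb in stored.items())
    # nlargest keeps only k candidates in memory while scanning.
    best = heapq.nlargest(k, scored)
    return [(artifact_id, score) for score, artifact_id in best]


# Usage with toy 2-dimensional vectors (real embeddings are 768-dimensional).
stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for artifact_id, score in top_k_memories([1.0, 0.0], stored, k=2):
    print(artifact_id, round(score, 3))
```

A linear scan costs one similarity computation per stored artifact on every context update, which is simple and adequate for small palaces; an approximate index trades exactness for speed as the collection grows.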

Ambient Audio Behaviour

| State | Audio |
|---|---|
| Overview / lobby | /audio/rooms/Palace.mp3 |
| Inside a room | /audio/rooms/{Style}.mp3 |
| Any capture active (status === 'capturing', all source types) | Muted (prevents bleed into mic or tab stream) |
| Recall / voice session active | Ducked to 10% |
| Idle | Normal volume |
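
The states above reduce to two independent choices: which track plays and at what volume. The sketch below expresses that logic in Python for consistency with the backend examples, although the actual implementation lives in the React frontend. The `ambient_audio` function, the 0.0 to 1.0 volume scale, and the rule that an active capture takes precedence over a recall session are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AmbientState:
    track: str     # URL path of the looping ambient track
    volume: float  # assumed scale: 0.0 = muted, 1.0 = normal


def ambient_audio(room_style: Optional[str],
                  capturing: bool,
                  recall_active: bool) -> AmbientState:
    """Pick the ambient track and volume for the current UI state."""
    # Track depends only on location: lobby/overview vs. inside a room.
    if room_style:
        track = f"/audio/rooms/{room_style}.mp3"
    else:
        track = "/audio/rooms/Palace.mp3"

    # Volume depends only on session activity.
    if capturing:
        volume = 0.0   # muted so music never bleeds into mic or tab audio
    elif recall_active:
        volume = 0.10  # ducked under the agent's voice
    else:
        volume = 1.0   # idle
    return AmbientState(track=track, volume=volume)
```

Keeping track selection and volume selection separate means a room change during a capture session can swap the track without unmuting it.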

Session-End Summary

concept_count in capture_complete is derived from artifact_ids, which contains only extractions whose categorization is set (i.e. those actually saved to Firestore). Failed extractions (e.g. an embedding API error) are excluded, so the count always matches what is visible in the palace.
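
A minimal sketch of that derivation follows. The `Extraction` class, the `build_capture_complete` helper, and the `type` key on the payload are hypothetical; only `concept_count`, `artifact_ids`, `capture_complete`, and the "categorization is set" rule come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extraction:
    artifact_id: str
    # Assumed to be populated only once the artifact was saved to Firestore.
    categorization: Optional[dict] = None


def build_capture_complete(extractions: List[Extraction]) -> dict:
    """Build the session-end payload, counting only saved artifacts."""
    artifact_ids = [e.artifact_id for e in extractions
                    if e.categorization is not None]
    return {
        "type": "capture_complete",  # assumed envelope field
        "artifact_ids": artifact_ids,
        "concept_count": len(artifact_ids),
    }


# Usage: the second extraction failed (no categorization), so it is excluded.
session = [
    Extraction("art-1", {"room": "physics"}),
    Extraction("art-2"),
    Extraction("art-3", {"room": "history"}),
]
print(build_capture_complete(session)["concept_count"])
```

Deriving the count from the same list that is sent to the client removes any chance of the two drifting apart.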

Infrastructure (Terraform)

| Resource | Type | Config |
|---|---|---|
| rayan-backend | Cloud Run v2 | 2 CPU / 2 GB / max 10 instances / session affinity |
| (default) | Firestore Native | us-central1 |
| rayan-media-{project} | Cloud Storage | US multi-region / CORS enabled |
| rayan-frontend-{project} | Cloud Storage | Static website hosting |
| artifact-embeddings | Vertex AI Vector Search | 768-dim / cosine / Tree-AH |
| rayan-backend | Service Account | Firestore user + Storage admin + Vertex AI user |

All resources are provisioned with: terraform apply -var="project_id=<PROJECT_ID>"