| Metadata | Details |
|---|---|
| Version | 2.3 (Revised with Risk Mitigation) |
| Date | 2025-11-21 |
| Status | Ready for Development |
| Reviewer | Yuna Tseng |
- Human Connection: This AI must feel like a trusted friend who listens, not a therapist who lectures.
- Business Viability: We treat data as a strategic asset. We optimize for Unit Economics using intelligent model routing and prove value through User Sentiment Delta tracking.
- Technical Robustness: Security and cost-efficiency are built into the architecture from day one.
- Product Name: MindFlow - AI Voice Coach
- Purpose: MindFlow is a web application that helps users maintain emotional wellbeing through voice journaling combined with AI-powered psychological support.
- Core Value Proposition:
  - Validation First: AI validates feelings before offering insights.
  - Pattern Recognition (RAG): Discovers blind spots users can't see themselves by recalling past entries.
  - Adaptive Personality: Switches between listening and coaching modes.
  - Measurable Impact: Tracks emotional trajectory to prove wellbeing improvements.
- Target Users: Individuals seeking mental wellness support, personal growth, or structured self-reflection.
- User lands on `/login`.
- Supabase Auth handles Email/Password or Google OAuth.
- Redirect to `/dashboard` upon success.
- Mode Selection: User selects Listening / Coaching / Smart Mode.
- Recording: User records voice entry.
- Transcription: Audio sent to `/api/transcribe` (Whisper).
- Text Review (CRITICAL FOR COST SAVING):
  - User sees the transcribed text.
  - User can EDIT the text manually. (This avoids re-recording/re-transcribing costs when the transcript has minor errors.)
- Submission: User clicks "Save & Analyze".
- Processing: System calculates sentiment, retrieves memories, and generates response.
Technical Concept: How RAG Works Here
- Embedding: Converting user text into mathematical coordinates (vectors).
- Retrieval: Finding past entries that are "mathematically close" to current feelings, even if keywords differ (e.g., "tired" matches "exhausted").
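
To make "mathematically close" concrete, the sketch below ranks stored vectors by cosine similarity, the same measure that the `match_entries` function later in this document derives from pgvector's `<=>` operator (1 minus cosine distance). The names `cosineSimilarity` and `rankBySimilarity` and the three-dimensional toy vectors are assumptions for illustration; real `text-embedding-3-small` vectors have 1536 dimensions, and in production the database performs this ranking.

```typescript
// Illustrative sketch: names and toy vectors are assumptions, not part of the PRD.
type StoredEntry = { id: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep entries above the threshold, most similar first, up to `limit` results.
function rankBySimilarity(
  query: number[],
  entries: StoredEntry[],
  threshold: number,
  limit: number
): { id: string; similarity: number }[] {
  return entries
    .map((e) => ({ id: e.id, similarity: cosineSimilarity(query, e.embedding) }))
    .filter((r) => r.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

Two entries can score highly even when they share no keywords, because similarity is measured between vectors rather than between words.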
Process Steps:
- Tone Detection (Fast/Cheap): System identifies primary emotion and energy level using `gpt-4o-mini`.
- Model Routing:
  - Simple Validation → Routes to `gpt-4o-mini`.
  - Complex Coaching → Routes to `gpt-4o`.
- RAG Retrieval: Fetches relevant past entries from `pgvector`.
  - New Cost Control: Implements embedding caching/deduplication to prevent redundant calculations.
- Response Generation: AI generates response based on Mode + Tone + History.
- Data Logging: Sentiment score, cost of session, and tokens used are logged.
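
The embedding deduplication step can be sketched as a cache keyed by normalized text, so that identical entries trigger only one embedding call. This is a minimal in-memory illustration: `EmbeddingCache` and `normalize` are hypothetical names, and the normalization rule (trim and collapse whitespace) is an assumption. A persistent version would more likely store a hash of the normalized text in the database.

```typescript
// Sketch only: `EmbeddingCache` and `normalize` are hypothetical names.
// Texts that are identical after trimming and collapsing whitespace share a key.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

class EmbeddingCache<T> {
  private store = new Map<string, T>();

  // `compute` stands in for the call to the embedding model.
  // It runs only when no value is cached for the normalized text.
  getOrCompute(text: string, compute: (text: string) => T): T {
    const key = normalize(text);
    const cached = this.store.get(key);
    if (cached !== undefined) return cached;
    const value = compute(text);
    this.store.set(key, value);
    return value;
  }

  get size(): number {
    return this.store.size;
  }
}
```

Because `T` is generic, the cache can hold `Promise<number[]>` values, which also deduplicates requests that are still in flight.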
- Framework: Next.js 14 (App Router), TypeScript, React 18.
- Database & Auth: Supabase (PostgreSQL + GoTrue).
- Vector Store: Supabase pgvector.
- AI Models: OpenAI (Whisper, GPT-4o, GPT-4o-mini, text-embedding-3-small).
- State Management: Zustand.
- UI: Shadcn/UI, Tailwind CSS, Lucide React.
⚠️ SECURITY WARNING:
- SUPABASE_SERVICE_ROLE_KEY: This key has full privileges and bypasses Row Level Security (RLS). It must NEVER appear in frontend code or in a browser-exposed variable (`NEXT_PUBLIC_...`). It is strictly for use in Next.js API Routes or Server Actions.
- OpenAI Credit Check: Ensure your OpenAI account has a sufficient Credit Balance (pre-paid credits) before development. Simply having a credit card linked may not be sufficient for API access, depending on account tier.
Create a .env.local file with these exact keys:
# Supabase (Retrieve from Project Settings -> API)
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key # BACKEND ONLY
# OpenAI
OPENAI_API_KEY=sk-proj-...
# App Config
NEXT_PUBLIC_APP_URL=http://localhost:3000
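
One way to catch configuration mistakes early is to validate these keys at server startup. The sketch below checks that every key listed above is present and that no `NEXT_PUBLIC_` variable contains the service role key. `loadServerEnv` is a hypothetical helper, not part of Next.js or Supabase, and the leak check is an assumed safeguard rather than a stated requirement.

```typescript
// Sketch: `loadServerEnv` is a hypothetical helper for server-side code only.
const REQUIRED_KEYS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_APP_URL",
] as const;

type EnvKey = (typeof REQUIRED_KEYS)[number];
type ServerEnv = Record<EnvKey, string>;

function loadServerEnv(env: Record<string, string | undefined>): ServerEnv {
  // Fail fast with a single message that lists every missing key.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // Guard against the service role key being copied into a browser-exposed variable.
  const secret = env.SUPABASE_SERVICE_ROLE_KEY as string;
  for (const [name, value] of Object.entries(env)) {
    if (name.startsWith("NEXT_PUBLIC_") && value === secret) {
      throw new Error(`${name} exposes the service role key to the browser`);
    }
  }

  const result = {} as ServerEnv;
  for (const key of REQUIRED_KEYS) {
    result[key] = env[key] as string;
  }
  return result;
}
```

In an API Route or Server Action this would be called as `loadServerEnv(process.env)`; it should not be imported from client components.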
⚠️ CRITICAL SETUP INSTRUCTION: The SQL commands below cannot be run reliably via migration scripts in all environments. You MUST manually copy-paste and run steps 4.1 and 4.2 in the Supabase SQL Editor dashboard before starting development to ensure the `vector` extension and indexes are correctly activated.
-- Enable Vector extension for RAG
-- Note: Must be enabled manually in Dashboard if this fails
CREATE EXTENSION IF NOT EXISTS vector;

Profiles
CREATE TABLE public.profiles (
id UUID REFERENCES auth.users(id) PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
full_name TEXT,
default_ai_mode TEXT DEFAULT 'smart',
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
ALTER TABLE public.profiles ENABLE ROW LEVEL SECURITY;
-- Add RLS policies here...

Entries (Updated for Data Strategy)
CREATE TABLE public.entries (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE NOT NULL,
-- Content
transcription TEXT NOT NULL,
audio_url TEXT,
-- AI Analysis & Metadata
ai_response TEXT,
ai_mode TEXT CHECK (ai_mode IN ('listening', 'coaching', 'smart')),
emotion_tags TEXT[],
detected_tone TEXT,
-- Data Strategy & BI Columns
sentiment_score FLOAT, -- 0.0 (Negative) to 1.0 (Positive)
energy_score FLOAT, -- 0.0 (Low) to 1.0 (High Energy)
tokens_used INT, -- Total tokens for this entry
cost_usd FLOAT, -- Estimated cost for this entry
-- RAG
referenced_entry_ids UUID[],
-- Search
transcription_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', transcription)) STORED,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Add RLS policies (uid = user_id)

Embeddings
CREATE TABLE public.embeddings (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
entry_id UUID REFERENCES public.entries(id) ON DELETE CASCADE NOT NULL,
user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE NOT NULL,
embedding vector(1536) NOT NULL, -- Matches text-embedding-3-small
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- CRITICAL PERFORMANCE INDEX
-- ivfflat builds its clusters from the rows present at creation time, so recall
-- is best when the index is created (or rebuilt) after some data exists.
-- Creating it early is acceptable during development.
CREATE INDEX embeddings_vector_idx ON public.embeddings USING ivfflat (embedding vector_cosine_ops);

Usage Logs (For Unit Economics)
CREATE TABLE public.usage_logs (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
user_id UUID REFERENCES auth.users(id),
feature TEXT, -- 'transcription', 'tone_detect', 'chat_response'
model_used TEXT, -- 'whisper', 'gpt-4o', 'gpt-4o-mini'
input_tokens INT,
output_tokens INT,
estimated_cost FLOAT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Function to find similar entries
CREATE OR REPLACE FUNCTION match_entries(
query_embedding vector(1536),
match_user_id UUID, -- SECURITY: API must pass auth.uid() here
match_threshold float DEFAULT 0.7,
match_count int DEFAULT 5,
exclude_entry_id UUID DEFAULT NULL
)
RETURNS TABLE (
entry_id UUID,
transcription TEXT,
emotion_tags TEXT[],
detected_tone TEXT,
created_at TIMESTAMP WITH TIME ZONE,
similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
e.id AS entry_id,
e.transcription,
e.emotion_tags,
e.detected_tone,
e.created_at,
1 - (emb.embedding <=> query_embedding) AS similarity
FROM embeddings emb
JOIN entries e ON emb.entry_id = e.id
WHERE emb.user_id = match_user_id -- DATA ISOLATION CHECK
AND (exclude_entry_id IS NULL OR e.id != exclude_entry_id)
AND 1 - (emb.embedding <=> query_embedding) > match_threshold
ORDER BY emb.embedding <=> query_embedding
LIMIT match_count;
END;
$$;

- Input: FormData with audio file.
- Model: Whisper-1.
- Logic: Return raw text. Allow client to edit text before saving.
- Security: Verify `supabase.auth.getUser()` first.
- Cost Control: Check daily usage limit for the user before processing.
- Logic:
  - Save transcription to DB.
  - Embedding Cache Check: (Optional) Check whether identical text already exists so its embedding can be reused (hash check).
  - Generate Embedding (`text-embedding-3-small`).
  - Tone Detection: Call `gpt-4o-mini` to get tone, sentiment_score.
  - Routing Decision:
    - IF `aiMode == 'listening'` → Use `gpt-4o-mini`.
    - IF `aiMode == 'coaching'` → Use `gpt-4o`.
    - IF `aiMode == 'smart'`:
      - IF `sentiment_score < 0.3` (High Distress) → `gpt-4o` (Safety/Depth).
      - ELSE → `gpt-4o-mini`.
  - RAG Retrieval: Call `match_entries` RPC.
  - Generate Response: Call selected model with history context.
  - Logging: Record costs to `usage_logs`.
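
The routing decision in the logic above reduces to a small pure function. The sketch below is one possible encoding: the mode names and the 0.3 threshold come from this document, while `selectModel` and the type names are assumptions.

```typescript
// Sketch of the routing rule; `selectModel` and the type names are hypothetical.
type AiMode = "listening" | "coaching" | "smart";
type ChatModel = "gpt-4o" | "gpt-4o-mini";

// Scores below this value are treated as high distress (per the routing rule).
const HIGH_DISTRESS_THRESHOLD = 0.3;

function selectModel(aiMode: AiMode, sentimentScore: number): ChatModel {
  if (aiMode === "listening") return "gpt-4o-mini";
  if (aiMode === "coaching") return "gpt-4o";
  // Smart mode: escalate to the larger model only for high-distress entries.
  return sentimentScore < HIGH_DISTRESS_THRESHOLD ? "gpt-4o" : "gpt-4o-mini";
}
```

Keeping the rule free of I/O makes it straightforward to unit test and to adjust the threshold later without touching the API route.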
- Model: `gpt-4o-mini`
- System Prompt:
Analyze the user's journal entry. Return JSON: { "tone": "positive" | "negative" | "neutral" | "seeking_help", "emotionTags": ["tag1", "tag2"], "sentiment_score": 0.0 to 1.0, "energy_score": 0.0 to 1.0 }
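
Model output is not guaranteed to match the requested JSON shape, so the response is worth validating before it drives routing. The sketch below parses and checks the fields named in the prompt above. `parseToneAnalysis` is a hypothetical helper, and the policy of clamping out-of-range scores into [0, 1] while rejecting non-numeric ones is an assumption.

```typescript
// Sketch: `parseToneAnalysis` is a hypothetical helper; the clamping policy is an assumption.
type Tone = "positive" | "negative" | "neutral" | "seeking_help";

interface ToneAnalysis {
  tone: Tone;
  emotionTags: string[];
  sentiment_score: number;
  energy_score: number;
}

const TONES: readonly string[] = ["positive", "negative", "neutral", "seeking_help"];

// Reject non-numeric scores; clamp numeric ones into the documented 0.0 to 1.0 range.
function readScore(value: unknown, field: string): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new Error(`${field} must be a finite number`);
  }
  return Math.min(1, Math.max(0, value));
}

function parseToneAnalysis(raw: string): ToneAnalysis {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Tone analysis must be a JSON object");
  }
  const obj = data as Record<string, unknown>;
  if (typeof obj.tone !== "string" || !TONES.includes(obj.tone)) {
    throw new Error("Unexpected tone value");
  }
  // Keep only string tags; a missing or malformed list becomes an empty array.
  const tags = Array.isArray(obj.emotionTags)
    ? obj.emotionTags.filter((t): t is string => typeof t === "string")
    : [];
  return {
    tone: obj.tone as Tone,
    emotionTags: tags,
    sentiment_score: readScore(obj.sentiment_score, "sentiment_score"),
    energy_score: readScore(obj.energy_score, "energy_score"),
  };
}
```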
- Context Injection: "User mentioned [Previous Entry Date]: [Summary]. Use this context naturally."
- IF Listening Mode (gpt-4o-mini): "Validate. Mirror emotions. Keep under 50 words. No advice."
- IF Coaching Mode (gpt-4o): "Validate FIRST. Reference patterns. Ask one gentle question."
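
The mode instructions and context injection above can be combined into a single system prompt. The sketch below shows one way to assemble it. `buildSystemPrompt` and `PastEntry` are hypothetical names, and it assumes Smart Mode has already been resolved to either listening or coaching before the prompt is built, which this document does not specify.

```typescript
// Sketch: `buildSystemPrompt` and `PastEntry` are hypothetical names.
// Assumes smart mode was already resolved to "listening" or "coaching".
type ResponseMode = "listening" | "coaching";

interface PastEntry {
  date: string; // e.g. an ISO date of the earlier entry
  summary: string; // short summary, including its own punctuation
}

// Instruction text taken from the mode rules in this document.
const MODE_INSTRUCTIONS: Record<ResponseMode, string> = {
  listening: "Validate. Mirror emotions. Keep under 50 words. No advice.",
  coaching: "Validate FIRST. Reference patterns. Ask one gentle question.",
};

function buildSystemPrompt(mode: ResponseMode, history: PastEntry[]): string {
  const lines = [MODE_INSTRUCTIONS[mode]];
  if (history.length > 0) {
    // One context line per retrieved entry, followed by the usage hint.
    for (const entry of history) {
      lines.push(`User mentioned ${entry.date}: ${entry.summary}`);
    }
    lines.push("Use this context naturally.");
  }
  return lines.join("\n");
}
```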
- Metric: Sentiment Delta (Current Entry Score vs. 7-Day Moving Average).
- Visualization: "Mood Horizon" chart in Dashboard.
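
A worked version of the Sentiment Delta metric may help pin down its definition. The sketch below subtracts the mean score of entries from the preceding 7 days from the current entry's score. `sentimentDelta` and `ScoredEntry` are hypothetical names, and two details are assumptions because the metric above leaves them open: the average is taken over individual entries rather than daily means, and the current entry is excluded from its own baseline.

```typescript
// Sketch: names are hypothetical; the averaging rules are assumptions (see lead-in).
interface ScoredEntry {
  createdAt: Date;
  sentimentScore: number; // 0.0 (negative) to 1.0 (positive)
}

const WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// Returns current score minus the mean score of entries in the 7 days before it,
// or null when there is no earlier entry in that window to compare against.
function sentimentDelta(current: ScoredEntry, history: ScoredEntry[]): number | null {
  const end = current.createdAt.getTime();
  const start = end - WINDOW_MS;
  const windowScores = history
    .filter((e) => {
      const t = e.createdAt.getTime();
      return t >= start && t < end;
    })
    .map((e) => e.sentimentScore);
  if (windowScores.length === 0) return null;
  const average = windowScores.reduce((sum, v) => sum + v, 0) / windowScores.length;
  return current.sentimentScore - average;
}
```

A positive result means the current entry scores above the user's recent baseline; returning `null` lets the chart skip points that have no baseline yet.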
- Goal: Keep Cost Per Session (CpS) < $0.03.
- Strategy:
  - Model Hierarchy: Use `gpt-4o-mini` for >60% of interactions (Tone detect + Listening mode). Use `gpt-4o` only when deep reasoning is required.
  - Embedding Optimization: Implement simple caching or deduplication for identical text entries.
  - Daily Limits: Enforce a soft limit (e.g., 5 AI-analyzed entries per day) per user to prevent cost runaways during beta.
  - Batch Processing: (Future Phase) Consider delaying non-critical Tone Detection for batch processing if real-time analysis isn't strictly required for the UI update.
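
Tracking the Cost Per Session goal requires summing token costs across every model call in a session. The sketch below does that for token-billed models. `estimateSessionCost`, `ModelRate`, and `UsageRecord` are hypothetical names, and rates are passed in as parameters rather than hard-coded because provider pricing changes. Whisper transcription is billed by audio duration rather than tokens, so it would need a separate term that this sketch omits.

```typescript
// Sketch: names are hypothetical; rates must be supplied from current provider pricing.
interface ModelRate {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

interface UsageRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Target from this document: keep Cost Per Session below $0.03.
const COST_PER_SESSION_TARGET_USD = 0.03;

function estimateSessionCost(usage: UsageRecord[], rates: Record<string, ModelRate>): number {
  let total = 0;
  for (const record of usage) {
    const rate = rates[record.model];
    if (!rate) {
      throw new Error(`No rate configured for model ${record.model}`);
    }
    total +=
      (record.inputTokens / 1e6) * rate.inputPerMillionUsd +
      (record.outputTokens / 1e6) * rate.outputPerMillionUsd;
  }
  return total;
}

function isWithinBudget(costUsd: number): boolean {
  return costUsd < COST_PER_SESSION_TARGET_USD;
}
```

The per-call values computed here map onto the `input_tokens`, `output_tokens`, and `estimated_cost` columns of `usage_logs`.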
- Business: Average Margin per User > 70%.
- User: Positive Sentiment Delta after 14 days.
- Tech: API Errors < 1%.
End of PRD.