Skip to content

farazmirzax/pearl-jam-requiem

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🪨 Pearl Jam Requiem

"This is... Requiem. What you are seeing is indeed the truth. But you will never arrive at the truth that is going to happen." — Giorno Giovanna

Pearl Jam Requiem is an AI-powered Stand that watches your mom's cooking videos and breaks them down into step-by-step recipes — automatically. You upload a video, and the AI pipeline hears every word (Whisper), sees every frame (LLaVA), and remembers every step (SQLite). The result is an interactive recipe player that loops each cooking step so you can follow along without rewinding.

Named after Tonio Trussardi's Stand Pearl Jam from JoJo's Bizarre Adventure: Diamond is Unbreakable — a Stand that channels culinary perfection. This is its Requiem evolution: it doesn't just cook, it understands cooking.


⚡ The Stand's Abilities

Ability What It Does Powered By
HEAR Transcribes spoken instructions from video (Hindi → English) Faster-Whisper (medium, int8)
SEE Analyzes extracted video frames to describe cooking actions LLaVA via Ollama
EXTRACT Pulls key frames at the start of each spoken segment FFmpeg
REMEMBER Stores every recipe and step persistently SQLite + SQLAlchemy
PLAY Interactive step-by-step player with auto-looping video React + TypeScript

🧬 Architecture

Pearl_Jam_Requiem/
├── backend/                    # FastAPI — The Stand's Brain
│   ├── app/
│   │   ├── main.py             # App entry, CORS, static mounts
│   │   ├── api/routes.py       # API endpoints (upload, list, detail)
│   │   ├── db/
│   │   │   ├── database.py     # SQLite connection (nusqa.db)
│   │   │   └── models.py       # Recipe & RecipeStep ORM models
│   │   ├── schemas/recipe.py   # Pydantic response schemas
│   │   └── services/
│   │       ├── audio.py        # Whisper transcription pipeline
│   │       ├── vision.py       # LLaVA frame analysis
│   │       └── video.py        # FFmpeg frame extraction
│   ├── media/
│   │   ├── uploads/            # Uploaded video files
│   │   └── temp/               # Extracted frame images
│   └── requirements.txt
│
└── frontend/                   # React + Vite — The Stand's Face
    └── src/
        ├── App.tsx             # Router (/ and /recipe/:id)
        ├── Home.tsx            # Recipe grid + upload
        ├── RecipePlayer.tsx    # Video player + step guide
        ├── index.css           # Tailwind + global styles
        └── main.tsx            # React entry point

🔥 The AI Pipeline — How It Works

When you upload a cooking video, this is what happens behind the scenes:

📹 Video Upload
 │
 ├─ 1. HEAR (Whisper)
 │     └─ Transcribes audio → segments with timestamps
 │        Model: faster-whisper (medium, int8 quantized for CPU)
 │        Language: Hindi → English translation
 │
 ├─ 2. For each segment:
 │     │
 │     ├── EXTRACT (FFmpeg)
 │     │    └─ Pulls a high-quality JPEG frame at segment start
 │     │
 │     ├── SEE (LLaVA via Ollama)
 │     │    └─ Describes the cooking action in the frame
 │     │       Uses Whisper transcript as context for accuracy
 │     │
 │     └── SAVE (SQLite)
 │          └─ Saves step incrementally (so you see progress live)
 │
 └─ ✅ Recipe fully processed — all steps ready to play

The entire pipeline runs as a background task — the upload returns immediately while the AI works. No waiting.


🗄️ Database Schema

┌─────────────────────────┐       ┌──────────────────────────────────┐
│        recipes           │       │         recipe_steps              │
├─────────────────────────┤       ├──────────────────────────────────┤
│ id          INT (PK)     │──┐    │ id                  INT (PK)     │
│ title       STRING       │  │    │ recipe_id           INT (FK) ◄───┘
│ video_filename STRING    │  │    │ step_number         INT          │
│ created_at  STRING       │  └──► │ start_time          FLOAT        │
└─────────────────────────┘       │ end_time            FLOAT        │
                                  │ instruction         TEXT          │
                                  │ visual_description  TEXT          │
                                  │ video_loop_url      STRING        │
                                  └──────────────────────────────────┘

🌐 API Endpoints

All routes are prefixed with /api.

Method Endpoint Params Description
GET / Health check. Returns {"stand_user": "Faraz"}
POST /api/upload title (query), file (form) Upload a video — triggers background AI pipeline
GET /api/recipes skip, limit (query) List all recipes with their steps
GET /api/recipes/{id} id (path) Get a single recipe with all processed steps

Response shape for a recipe:

{
  "id": 1,
  "title": "Mummy's Chicken Curry",
  "video_filename": "chicken_curry.mp4",
  "created_at": "2026-03-07T14:30:00",
  "steps": [
    {
      "id": 1,
      "step_number": 1,
      "start_time": 0.0,
      "end_time": 15.5,
      "instruction": "Add oil to the pan and heat it up",
      "visual_description": "A hand pouring oil into a heated wok",
      "video_loop_url": "/media/temp/frame_at_0.jpg"
    }
  ]
}

🖥️ Frontend Features

Home Page (/)

  • Recipe Grid — Cards showing title, date, and step count
  • Upload Button — Accepts video/*, sends file + auto-generated title
  • Empty State — Friendly message when no recipes exist yet
  • Error Banner — Shows if the backend is unreachable

Recipe Player (/recipe/:id)

  • Split Layout — Video on the left, step guide on the right (responsive)
  • Auto-Looping Video — Each step loops within its start_timeend_time range
  • Step Navigation — Click any step to seek the video instantly
  • Current Step Overlay — Shows the active instruction on the video
  • Play/Pause Control — Manual override button
  • AI Vision Notes — Each step shows what LLaVA "saw" in the frame
  • Processing State — "Still analyzing..." message if the AI pipeline hasn't finished

🚀 Setup

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • FFmpeg installed and on PATH
  • Ollama installed (ollama.com)

1. Clone

git clone https://github.com/farazmirzax/pearl-jam-requiem.git
cd pearl-jam-requiem

2. Backend

cd backend
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
uvicorn app.main:app --reload

Backend runs on http://127.0.0.1:8000

3. Frontend

cd frontend
npm install
npm run dev

Frontend runs on http://localhost:5173

4. Ollama (Vision AI)

ollama serve          # Start the Ollama server
ollama pull llava     # Download the LLaVA vision model (~4.7GB)

5. Use It

  1. Open http://localhost:5173
  2. Click Upload Video and select a cooking video
  3. Watch the terminal — the AI pipeline logs every step as it processes
  4. Once done, click the recipe card to open the interactive step-by-step player

⚙️ Configuration

Setting Location Default Description
Whisper model services/audio.py medium Model size (tiny, base, small, medium, large)
Compute type services/audio.py int8 Quantization (optimized for 16GB RAM / CPU)
Audio language services/audio.py hi (Hindi) Source language for transcription
Task services/audio.py translate translate (→ English) or transcribe (keep original)
Vision model services/vision.py llava Ollama model for frame analysis
Database db/database.py sqlite:///./nusqa.db SQLite database file
CORS origins app/main.py localhost:5173, 127.0.0.1:5173 Allowed frontend origins

🛠️ Tech Stack

Backend

Package Purpose
FastAPI Web framework + background tasks
Uvicorn ASGI server
SQLAlchemy ORM for SQLite
Pydantic v2 Request/response validation
faster-whisper Speech-to-text (CPU-optimized)
ollama (Python) Client for local LLaVA model
ffmpeg-python Video frame extraction
Pillow Image processing

Frontend

Package Purpose
React 19 UI library
TypeScript Type safety
Vite Build tool + dev server
Tailwind CSS 4 Utility-first styling
React Router 7 Client-side routing
Axios HTTP client
Lucide React Icon library

🧑‍🍳 Why "Pearl Jam Requiem"?

In JoJo's Bizarre Adventure: Diamond is Unbreakable, Tonio Trussardi is a chef whose Stand, Pearl Jam, infuses his cooking with healing power. Every dish he makes is perfect — tailored to the person eating it.

This project is Pearl Jam's Requiem evolution. It doesn't cook the food — it watches someone cook and breaks down the knowledge into something anyone can follow. It hears, it sees, it remembers.

Your mom's recipes, preserved by a Stand. Arrivederci to forgotten family dishes.


👤 Stand User

Faraz@farazmirzax

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors