"This is... Requiem. What you are seeing is indeed the truth. But you will never arrive at the truth that is going to happen." — Giorno Giovanna
Pearl Jam Requiem is an AI-powered Stand that watches your mom's cooking videos and breaks them down into step-by-step recipes — automatically. You upload a video, and the AI pipeline hears every word (Whisper), sees every frame (LLaVA), and remembers every step (SQLite). The result is an interactive recipe player that loops each cooking step so you can follow along without rewinding.
Named after Tonio Trussardi's Stand Pearl Jam from JoJo's Bizarre Adventure: Diamond is Unbreakable — a Stand that channels culinary perfection. This is its Requiem evolution: it doesn't just cook, it understands cooking.
| Ability | What It Does | Powered By |
|---|---|---|
| HEAR | Transcribes spoken instructions from video (Hindi → English) | Faster-Whisper (medium, int8) |
| SEE | Analyzes extracted video frames to describe cooking actions | LLaVA via Ollama |
| EXTRACT | Pulls key frames at the start of each spoken segment | FFmpeg |
| REMEMBER | Stores every recipe and step persistently | SQLite + SQLAlchemy |
| PLAY | Interactive step-by-step player with auto-looping video | React + TypeScript |
Pearl_Jam_Requiem/
├── backend/ # FastAPI — The Stand's Brain
│ ├── app/
│ │ ├── main.py # App entry, CORS, static mounts
│ │ ├── api/routes.py # API endpoints (upload, list, detail)
│ │ ├── db/
│ │ │ ├── database.py # SQLite connection (nusqa.db)
│ │ │ └── models.py # Recipe & RecipeStep ORM models
│ │ ├── schemas/recipe.py # Pydantic response schemas
│ │ └── services/
│ │ ├── audio.py # Whisper transcription pipeline
│ │ ├── vision.py # LLaVA frame analysis
│ │ └── video.py # FFmpeg frame extraction
│ ├── media/
│ │ ├── uploads/ # Uploaded video files
│ │ └── temp/ # Extracted frame images
│ └── requirements.txt
│
└── frontend/ # React + Vite — The Stand's Face
└── src/
├── App.tsx # Router (/ and /recipe/:id)
├── Home.tsx # Recipe grid + upload
├── RecipePlayer.tsx # Video player + step guide
├── index.css # Tailwind + global styles
└── main.tsx # React entry point
When you upload a cooking video, this is what happens behind the scenes:
📹 Video Upload
│
├─ 1. HEAR (Whisper)
│ └─ Transcribes audio → segments with timestamps
│ Model: faster-whisper (medium, int8 quantized for CPU)
│ Language: Hindi → English translation
│
├─ 2. For each segment:
│ │
│ ├── EXTRACT (FFmpeg)
│ │ └─ Pulls a high-quality JPEG frame at segment start
│ │
│ ├── SEE (LLaVA via Ollama)
│ │ └─ Describes the cooking action in the frame
│ │ Uses Whisper transcript as context for accuracy
│ │
│ └── SAVE (SQLite)
│ └─ Saves step incrementally (so you see progress live)
│
└─ ✅ Recipe fully processed — all steps ready to play
The entire pipeline runs as a background task — the upload returns immediately while the AI works. No waiting.
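The per-segment loop above can be sketched roughly like this (the function and field names are illustrative, not the project's actual code):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the video
    end: float
    text: str      # Whisper's (translated) transcript for this span

def build_steps(segments):
    """Turn Whisper segments into ordered recipe-step records.

    In the real pipeline each iteration would also extract a frame
    (FFmpeg), describe it (LLaVA), and commit the row (SQLite) before
    moving on — which is why progress is visible while processing runs.
    """
    steps = []
    for i, seg in enumerate(segments, start=1):
        steps.append({
            "step_number": i,
            "start_time": seg.start,
            "end_time": seg.end,
            "instruction": seg.text.strip(),
        })
    return steps

demo = build_steps([Segment(0.0, 15.5, "Add oil to the pan "),
                    Segment(15.5, 30.0, "Add chopped onions")])
```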
┌─────────────────────────┐        ┌──────────────────────────────────┐
│ recipes                 │        │ recipe_steps                     │
├─────────────────────────┤        ├──────────────────────────────────┤
│ id INT (PK)             │◄──┐    │ id INT (PK)                      │
│ title STRING            │   └────│ recipe_id INT (FK)               │
│ video_filename STRING   │        │ step_number INT                  │
│ created_at STRING       │        │ start_time FLOAT                 │
└─────────────────────────┘        │ end_time FLOAT                   │
                                   │ instruction TEXT                 │
                                   │ visual_description TEXT          │
                                   │ video_loop_url STRING            │
                                   └──────────────────────────────────┘
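A minimal sqlite3 sketch of this schema (the project itself defines these tables as SQLAlchemy models in db/models.py; column names follow the diagram, the sample values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the app uses ./nusqa.db on disk
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE recipes (
    id             INTEGER PRIMARY KEY,
    title          TEXT,
    video_filename TEXT,
    created_at     TEXT
);
CREATE TABLE recipe_steps (
    id                 INTEGER PRIMARY KEY,
    recipe_id          INTEGER REFERENCES recipes(id),
    step_number        INTEGER,
    start_time         REAL,
    end_time           REAL,
    instruction        TEXT,
    visual_description TEXT,
    video_loop_url     TEXT
);
""")
conn.execute("INSERT INTO recipes (title, video_filename, created_at) "
             "VALUES (?, ?, ?)", ("Chicken Curry", "curry.mp4", "2026-03-07"))
conn.execute("INSERT INTO recipe_steps (recipe_id, step_number, start_time, "
             "end_time, instruction) VALUES (1, 1, 0.0, 15.5, 'Heat the oil')")
row = conn.execute("SELECT r.title, s.instruction FROM recipes r "
                   "JOIN recipe_steps s ON s.recipe_id = r.id").fetchone()
```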
All recipe routes are prefixed with /api; the bare root / serves as a health check.
| Method | Endpoint | Params | Description |
|---|---|---|---|
| GET | / | — | Health check. Returns {"stand_user": "Faraz"} |
| POST | /api/upload | title (query), file (form) | Upload a video — triggers background AI pipeline |
| GET | /api/recipes | skip, limit (query) | List all recipes with their steps |
| GET | /api/recipes/{id} | id (path) | Get a single recipe with all processed steps |
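For illustration, the endpoints above can be addressed like this (helper names and the default skip/limit values are assumptions, not part of the API):

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8000"  # default uvicorn address

def upload_url(title: str) -> str:
    """POST target for /api/upload — the title goes in the query string,
    while the video file travels in the multipart form body."""
    return f"{BASE}/api/upload?{urlencode({'title': title})}"

def recipes_url(skip: int = 0, limit: int = 20) -> str:
    """GET target for listing recipes with pagination params."""
    return f"{BASE}/api/recipes?{urlencode({'skip': skip, 'limit': limit})}"

# e.g. with the requests library (not in the backend's requirements):
# requests.post(upload_url("Mummy's Chicken Curry"),
#               files={"file": open("chicken_curry.mp4", "rb")})
```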
Response shape for a recipe:
{
"id": 1,
"title": "Mummy's Chicken Curry",
"video_filename": "chicken_curry.mp4",
"created_at": "2026-03-07T14:30:00",
"steps": [
{
"id": 1,
"step_number": 1,
"start_time": 0.0,
"end_time": 15.5,
"instruction": "Add oil to the pan and heat it up",
"visual_description": "A hand pouring oil into a heated wok",
"video_loop_url": "/media/temp/frame_at_0.jpg"
}
]
}
- Recipe Grid — Cards showing title, date, and step count
- Upload Button — Accepts video/*, sends file + auto-generated title
- Empty State — Friendly message when no recipes exist yet
- Error Banner — Shows if the backend is unreachable
- Split Layout — Video on the left, step guide on the right (responsive)
- Auto-Looping Video — Each step loops within its start_time → end_time range
- Step Navigation — Click any step to seek the video instantly
- Current Step Overlay — Shows the active instruction on the video
- Play/Pause Control — Manual override button
- AI Vision Notes — Each step shows what LLaVA "saw" in the frame
- Processing State — "Still analyzing..." message if the AI pipeline hasn't finished
- Python 3.10+
- Node.js 18+
- FFmpeg installed and on PATH
- Ollama installed (ollama.com)
git clone https://github.com/farazmirzax/pearl-jam-requiem.git
cd pearl-jam-requiem

cd backend
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
uvicorn app.main:app --reload

Backend runs on http://127.0.0.1:8000
cd frontend
npm install
npm run dev

Frontend runs on http://localhost:5173
ollama serve # Start the Ollama server
ollama pull llava # Download the LLaVA vision model (~4.7GB)

- Open http://localhost:5173
- Click Upload Video and select a cooking video
- Watch the terminal — the AI pipeline logs every step as it processes
- Once done, click the recipe card to open the interactive step-by-step player
| Setting | Location | Default | Description |
|---|---|---|---|
| Whisper model | services/audio.py | medium | Model size (tiny, base, small, medium, large) |
| Compute type | services/audio.py | int8 | Quantization (optimized for 16GB RAM / CPU) |
| Audio language | services/audio.py | hi (Hindi) | Source language for transcription |
| Task | services/audio.py | translate | translate (→ English) or transcribe (keep original) |
| Vision model | services/vision.py | llava | Ollama model for frame analysis |
| Database | db/database.py | sqlite:///./nusqa.db | SQLite database file |
| CORS origins | app/main.py | localhost:5173, 127.0.0.1:5173 | Allowed frontend origins |
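As a rough sketch, the Whisper settings from the table might be wired up like this (the dict and its keys are illustrative, not the project's actual code in services/audio.py):

```python
# Settings mirroring the configuration table above.
WHISPER_SETTINGS = {
    "model_size": "medium",   # tiny | base | small | medium | large
    "compute_type": "int8",   # quantization for CPU / 16GB RAM
    "language": "hi",         # source language (Hindi)
    "task": "translate",      # translate -> English, or transcribe
}

# Usage (requires faster-whisper installed):
# from faster_whisper import WhisperModel
# model = WhisperModel(WHISPER_SETTINGS["model_size"],
#                      compute_type=WHISPER_SETTINGS["compute_type"])
# segments, info = model.transcribe("video.mp4",
#                                   language=WHISPER_SETTINGS["language"],
#                                   task=WHISPER_SETTINGS["task"])
```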
| Package | Purpose |
|---|---|
| FastAPI | Web framework + background tasks |
| Uvicorn | ASGI server |
| SQLAlchemy | ORM for SQLite |
| Pydantic v2 | Request/response validation |
| faster-whisper | Speech-to-text (CPU-optimized) |
| ollama (Python) | Client for local LLaVA model |
| ffmpeg-python | Video frame extraction |
| Pillow | Image processing |
| Package | Purpose |
|---|---|
| React 19 | UI library |
| TypeScript | Type safety |
| Vite | Build tool + dev server |
| Tailwind CSS 4 | Utility-first styling |
| React Router 7 | Client-side routing |
| Axios | HTTP client |
| Lucide React | Icon library |
In JoJo's Bizarre Adventure: Diamond is Unbreakable, Tonio Trussardi is a chef whose Stand, Pearl Jam, infuses his cooking with healing power. Every dish he makes is perfect — tailored to the person eating it.
This project is Pearl Jam's Requiem evolution. It doesn't cook the food — it watches someone cook and breaks down the knowledge into something anyone can follow. It hears, it sees, it remembers.
Your mom's recipes, preserved by a Stand. Arrivederci to forgotten family dishes.
Faraz — @farazmirzax