maheensaleh/VideoChapterGeneration


🎬 Audio / Video Chapter Generation

This project automatically generates chapters for audio or video content.
Each chapter includes:

  • Time intervals (start and end timestamps)
  • Chapter titles
  • Chapter descriptions (short summaries or fine-grained captions)

Pipeline Overview

1. Generate ASR text from audio

Use an open-source Whisper model from Hugging Face to convert audio into text with timestamps.
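As a rough illustration of this step, the sketch below runs Whisper through the Hugging Face `transformers` ASR pipeline and flattens the timestamped chunks into plain text lines. The checkpoint name `openai/whisper-small` and the helper names `transcribe` and `chunks_to_transcript` are assumptions made for this example, not necessarily what the repository uses.

```python
from typing import Any, Optional


def transcribe(audio_path: str, model_name: str = "openai/whisper-small") -> list[dict[str, Any]]:
    """Run Whisper via the Hugging Face pipeline and return timestamped chunks."""
    # Imported lazily so the formatting helper below works without transformers installed.
    from transformers import pipeline

    # model_name is an assumed checkpoint; swap in whichever Whisper model you use.
    asr = pipeline("automatic-speech-recognition", model=model_name)
    result = asr(audio_path, return_timestamps=True)
    # Each chunk looks like {"timestamp": (start, end), "text": "..."}.
    return result["chunks"]


def chunks_to_transcript(chunks: list[dict[str, Any]]) -> str:
    """Render chunks as '[start - end] text' lines, ready to paste into an LLM prompt."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # The final chunk's end timestamp can be None, so handle that case explicitly.
        end_text: Optional[str] = f"{end:.2f}" if end is not None else "end"
        lines.append(f"[{start:.2f} - {end_text}] {chunk['text'].strip()}")
    return "\n".join(lines)
```

Keeping the formatting step separate from the model call makes it easy to test the transcript layout without downloading a model.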

2. Generate chapters from ASR text

Use Llama 3.1 8B Instruct to generate chapters (titles, time intervals, and descriptions) from the ASR text. Use LangChain with a Pydantic schema to parse the LLM response as JSON.
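To show what parsing the LLM response into a schema involves, here is a dependency-free sketch that uses standard-library dataclasses as a stand-in for the Pydantic model. The field names follow the sample output in this README; the `Chapter` class and the `parse_chapters` function are hypothetical names for illustration, and the project itself relies on LangChain with Pydantic for this role.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Chapter:
    title: str
    start_time: float
    end_time: float
    description: str
    details: list[str] = field(default_factory=list)


def parse_chapters(llm_response: str) -> list[Chapter]:
    """Extract the JSON object from an LLM reply and validate it into Chapter objects."""
    # LLMs often wrap JSON in extra prose, so keep only the outermost {...} span.
    start, end = llm_response.find("{"), llm_response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM response")
    payload = json.loads(llm_response[start : end + 1])

    chapters = []
    for item in payload["chapters"]:
        chapter = Chapter(
            title=str(item["title"]),
            start_time=float(item["start_time"]),
            end_time=float(item["end_time"]),
            description=str(item.get("description", "")),
            details=[str(d) for d in item.get("details", [])],
        )
        # Reject intervals that run backwards instead of passing them downstream.
        if chapter.end_time < chapter.start_time:
            raise ValueError(f"chapter {chapter.title!r} ends before it starts")
        chapters.append(chapter)
    return chapters
```

A schema-based parser such as Pydantic's adds type coercion and clearer error messages on top of this, which is presumably why the pipeline uses one.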

Sample Results

1. Input Audio 1

Sample audio

Output (JSON)

{
  "chapters": [
    {
      "title": "Creating a YouTube Short",
      "start_time": 0.0,
      "end_time": 12.84,
      "description": "Creating a YouTube short from an existing video",
      "details": [
        "Go to YouTube",
        "Click on the plus sign",
        "Create a short",
        "Choose the existing media in your phone"
      ]
    },
    {
      "title": "Finalizing the YouTube Short",
      "start_time": 13.16,
      "end_time": 22.88,
      "description": "Finalizing the YouTube short",
      "details": [
        "You're good to go",
        "That's how you do that",
        "That's pretty much it"
      ]
    },
    {
      "title": "Conclusion",
      "start_time": 22.96,
      "end_time": 25.02,
      "description": "Conclusion",
      "details": [
        "Thank you for watching",
        "Goodbye"
      ]
    }
  ]
}
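One possible use of this output is a YouTube-style chapter list for a video description. The sketch below assumes the JSON layout shown above; `to_timestamp` and `chapter_list` are hypothetical helper names, and fractional seconds are simply truncated.

```python
import json


def to_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS once the value reaches one hour."""
    total = int(seconds)  # truncate fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def chapter_list(output_json: str) -> str:
    """Turn the pipeline's JSON output into one 'timestamp title' line per chapter."""
    chapters = json.loads(output_json)["chapters"]
    return "\n".join(f"{to_timestamp(c['start_time'])} {c['title']}" for c in chapters)


# Abbreviated copy of the sample output above (descriptions and details omitted).
sample = """
{"chapters": [
  {"title": "Creating a YouTube Short", "start_time": 0.0, "end_time": 12.84},
  {"title": "Finalizing the YouTube Short", "start_time": 13.16, "end_time": 22.88},
  {"title": "Conclusion", "start_time": 22.96, "end_time": 25.02}
]}
"""
print(chapter_list(sample))
# 00:00 Creating a YouTube Short
# 00:13 Finalizing the YouTube Short
# 00:22 Conclusion
```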

2. Input Audio 2

Sample audio

Output (JSON)

{
  "chapters": [
    {
      "title": "Smell and Odor",
      "start_time": 0.0,
      "end_time": 10.0,
      "description": "Discussing the smell and odor of food",
      "details": [
        "The stale smell of old beer lingers.",
        "It takes heat to bring out the odor.",
        "A cold dip restores health and zest."
      ]
    },
    {
      "title": "Food Preferences",
      "start_time": 10.0,
      "end_time": 18.0,
      "description": "Talking about favorite foods",
      "details": [
        "A salt pickle tastes fine with ham.",
        "Tacos al pastor are my favorite.",
        "A zestful food is the hot cross bun."
      ]
    }
  ]
}
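Because the time intervals come from an LLM, a quick sanity check on them can be useful. The sketch below is an assumed post-processing step, not part of the repository: the hypothetical `check_intervals` function flags overlapping chapters and gaps longer than a chosen tolerance.

```python
def check_intervals(chapters: list[dict], tolerance: float = 0.5) -> list[str]:
    """Return human-readable problems: overlaps, or gaps longer than `tolerance` seconds."""
    problems = []
    ordered = sorted(chapters, key=lambda c: c["start_time"])
    # Compare each chapter with the one that follows it.
    for prev, curr in zip(ordered, ordered[1:]):
        delta = curr["start_time"] - prev["end_time"]
        if delta < 0:
            problems.append(f"overlap between {prev['title']!r} and {curr['title']!r}")
        elif delta > tolerance:
            problems.append(f"gap of {delta:.2f}s before {curr['title']!r}")
    return problems


# The two chapters from the second sample are contiguous, so nothing is reported.
sample_2 = [
    {"title": "Smell and Odor", "start_time": 0.0, "end_time": 10.0},
    {"title": "Food Preferences", "start_time": 10.0, "end_time": 18.0},
]
print(check_intervals(sample_2))
# []
```

With the default tolerance, the short pauses between chapters in the first sample (well under half a second) would also pass; lowering `tolerance` makes the check stricter.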

Todo/Future work

  • done: add chapter timestamps

  • done: add detailed chapter descriptions

  • done: add parser functionality to the chapter generation module

  • add a Streamlit UI

  • add an S3 storage option

  • Hugging Face Spaces (Streamlit)

  • add video input functionality

  • add timestamps for fine-grained chapter descriptions

  • add video Q&A using chapter details, video details, and ASR (study whether this requires RAG)
