This project automatically generates chapters for audio or video content.
Each chapter includes:
- Time intervals (start and end timestamps)
- Chapter titles
- Chapter descriptions (short summaries or fine-grained captions)
Use an open-source Whisper model from Hugging Face to convert audio into text with timestamps.
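A minimal sketch of the ASR step, assuming the `transformers` ASR pipeline and the `openai/whisper-small` checkpoint (any open Whisper checkpoint would work); `chunks_to_lines` is a hypothetical helper that formats the timestamped segments into prompt-ready lines:

```python
def transcribe(audio_path: str, model_name: str = "openai/whisper-small"):
    """Run ASR with segment-level timestamps on an audio file."""
    from transformers import pipeline  # lazy import: heavyweight dependency

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        chunk_length_s=30,       # split long audio into 30 s windows
        return_timestamps=True,  # ask for segment-level timestamps
    )
    return asr(audio_path)["chunks"]


def chunks_to_lines(chunks) -> str:
    """Render ASR chunks as '[start-end] text' lines for the LLM prompt."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.2f}-{end:.2f}] {chunk['text'].strip()}")
    return "\n".join(lines)
```

Feeding the LLM `[start-end]` prefixed lines is what lets it emit the `start_time`/`end_time` fields in the chapter JSON.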
Use Llama 3.1 8B Instruct to generate chapter titles from the ASR transcript. Use LangChain's Pydantic output parser to parse the LLM response as JSON.
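A sketch of the chapter schema matching the sample outputs below, assuming Pydantic v2; the class and function names are illustrative, and the commented lines show how the same schema would plug into LangChain's `PydanticOutputParser`:

```python
from pydantic import BaseModel, Field


class Chapter(BaseModel):
    title: str
    start_time: float  # seconds
    end_time: float    # seconds
    description: str
    details: list[str] = Field(default_factory=list)


class ChapterList(BaseModel):
    chapters: list[Chapter]


def parse_llm_response(raw_json: str) -> ChapterList:
    """Validate the raw LLM output against the chapter schema."""
    return ChapterList.model_validate_json(raw_json)


# With LangChain, the same schema drives both prompting and parsing:
#   from langchain_core.output_parsers import PydanticOutputParser
#   parser = PydanticOutputParser(pydantic_object=ChapterList)
#   prompt_suffix = parser.get_format_instructions()
#   chapters = parser.parse(llm_response_text)
```

`get_format_instructions()` injects the JSON schema into the prompt, which makes Llama far more likely to emit output that validates on the first try.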
Example output:

{
  "chapters": [
    {
      "title": "Creating a YouTube Short",
      "start_time": 0.0,
      "end_time": 12.84,
      "description": "Creating a YouTube short from an existing video",
      "details": [
        "Go to YouTube",
        "Click on the plus sign",
        "Create a short",
        "Choose the existing media in your phone"
      ]
    },
    {
      "title": "Finalizing the YouTube Short",
      "start_time": 13.16,
      "end_time": 22.88,
      "description": "Finalizing the YouTube short",
      "details": [
        "You're good to go",
        "That's how you do that",
        "That's pretty much it"
      ]
    },
    {
      "title": "Conclusion",
      "start_time": 22.96,
      "end_time": 25.02,
      "description": "Conclusion",
      "details": [
        "Thank you for watching",
        "Goodbye"
      ]
    }
  ]
}

Another example output:

{
"chapters": [
{
"title": "Smell and Odor",
"start_time": 0.0,
"end_time": 10.0,
"description": "Discussing the smell and odor of food",
"details": [
"The stale smell of old beer lingers.",
"It takes heat to bring out the odor.",
"A cold dip restores health and zest."
]
},
{
"title": "Food Preferences",
"start_time": 10.0,
"end_time": 18.0,
"description": "Talking about favorite foods",
"details": [
"A salt pickle tastes fine with ham.",
"Tacos al pastor are my favorite.",
"A zestful food is the hot cross bun."
]
}
]
}-
done: add chapter time stamps
-
done: add detailed description of chapter
-
done: add parser functionality to chap gen module
-
add streamlit UI
-
add s3 storage option
-
huggingface spaces streamlit
-
add video input functionality
-
add timestamps for fine chapter description
-
add video q/a using chapter and video details and asr (study if this require RAG)