This project automatically generates chapters for audio or video content.
Each chapter includes:
- Time intervals (start and end timestamps)
- Chapter titles
- Chapter descriptions (short summaries or fine-grained captions)
Use an open-source Whisper model from Hugging Face to convert audio into text with timestamps.
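A minimal sketch of the ASR step, assuming the `transformers` ASR pipeline and the `openai/whisper-small` checkpoint (any open Whisper checkpoint would work); `chunks_to_lines` is a hypothetical helper that formats the timestamped segments into prompt-ready lines:

```python
def transcribe(audio_path: str, model_name: str = "openai/whisper-small"):
    """Run ASR with segment-level timestamps on an audio file."""
    from transformers import pipeline  # lazy import: heavyweight dependency

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        chunk_length_s=30,       # split long audio into 30 s windows
        return_timestamps=True,  # ask for segment-level timestamps
    )
    return asr(audio_path)["chunks"]


def chunks_to_lines(chunks) -> str:
    """Render ASR chunks as '[start-end] text' lines for the LLM prompt."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.2f}-{end:.2f}] {chunk['text'].strip()}")
    return "\n".join(lines)
```

Feeding the LLM `[start-end]` prefixed lines is what lets it emit the `start_time`/`end_time` fields in the chapter JSON.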
Use Llama 3.1 8B Instruct to generate chapter titles from the ASR transcript. Use LangChain's Pydantic output parser to parse the LLM response as JSON.
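A sketch of the chapter schema matching the sample outputs below, assuming Pydantic v2; the class and function names are illustrative, and the commented lines show how the same schema would plug into LangChain's `PydanticOutputParser`:

```python
from pydantic import BaseModel, Field


class Chapter(BaseModel):
    title: str
    start_time: float  # seconds
    end_time: float    # seconds
    description: str
    details: list[str] = Field(default_factory=list)


class ChapterList(BaseModel):
    chapters: list[Chapter]


def parse_llm_response(raw_json: str) -> ChapterList:
    """Validate the raw LLM output against the chapter schema."""
    return ChapterList.model_validate_json(raw_json)


# With LangChain, the same schema drives both prompting and parsing:
#   from langchain_core.output_parsers import PydanticOutputParser
#   parser = PydanticOutputParser(pydantic_object=ChapterList)
#   prompt_suffix = parser.get_format_instructions()
#   chapters = parser.parse(llm_response_text)
```

`get_format_instructions()` injects the JSON schema into the prompt, which makes Llama far more likely to emit output that validates on the first try.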
Example output:

{
  "chapters": [
    {
      "title": "Creating a YouTube Short",
      "start_time": 0.0,
      "end_time": 12.84,
      "description": "Creating a YouTube short from an existing video",
      "details": [
        "Go to YouTube",
        "Click on the plus sign",
        "Create a short",
        "Choose the existing media in your phone"
      ]
    },
    {
      "title": "Finalizing the YouTube Short",
      "start_time": 13.16,
      "end_time": 22.88,
      "description": "Finalizing the YouTube short",
      "details": [
        "You're good to go",
        "That's how you do that",
        "That's pretty much it"
      ]
    },
    {
      "title": "Conclusion",
      "start_time": 22.96,
      "end_time": 25.02,
      "description": "Conclusion",
      "details": [
        "Thank you for watching",
        "Goodbye"
      ]
    }
  ]
}

Another example output:

{
"chapters": [
{
"title": "Smell and Odor",
"start_time": 0.0,
"end_time": 10.0,
"description": "Discussing the smell and odor of food",
"details": [
"The stale smell of old beer lingers.",
"It takes heat to bring out the odor.",
"A cold dip restores health and zest."
]
},
{
"title": "Food Preferences",
"start_time": 10.0,
"end_time": 18.0,
"description": "Talking about favorite foods",
"details": [
"A salt pickle tastes fine with ham.",
"Tacos al pastor are my favorite.",
"A zestful food is the hot cross bun."
]
}
]
}-
done: add chapter time stamps
-
done: add detailed description of chapter
-
done: add parser functionality to chap gen module
-
add streamlit UI
-
add s3 storage option
-
huggingface spaces streamlit
-
add video input functionality
-
add timestamps for fine chapter description
-
add video q/a using chapter and video details and asr (study if this require RAG)