A project by Harshil Pradhan
This project focuses on summarizing video lecture transcriptions using state-of-the-art deep learning models. The objective is to transform long, often noisy transcripts into concise, structured summaries. We evaluate and compare the performance of three leading transformer-based models—T5, Pegasus, and BART—and introduce an Ensemble model to enhance the quality and robustness of summaries.
The VT-SSum dataset, purpose-built for video transcript segmentation and summarization, is used to fine-tune these models. This effort makes educational content more consumable, searchable, and accessible.
- Dataset: VT-SSum
- 9,616 videos
- 125,000 transcript-summary pairs
- Transcripts are cleaned, segmented, and tokenized into input-output pairs for summarization.
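The cleaning and segmentation step above can be sketched in plain Python. This is an illustrative sketch, not the project's actual pipeline; the filler-word list and the 512-word chunk size are assumptions chosen to match typical encoder input limits.

```python
import re

def clean_transcript(text: str) -> str:
    """Normalize a raw transcript: drop common spoken fillers and collapse whitespace.
    The filler list here is a hypothetical example; tune it to the real corpus."""
    text = re.sub(r"\b(um+|uh+|you know)\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

def segment(text: str, max_words: int = 512) -> list[str]:
    """Split a cleaned transcript into word-bounded chunks sized for the encoder."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

Each chunk would then be paired with its reference summary to form the input-output pairs used for fine-tuning.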
- T5: Text-to-Text Transfer Transformer by Google
- Pegasus: Optimized summarization model by Google
- BART: Bidirectional and Auto-Regressive Transformers by Facebook AI
- Ensemble Model (Proposed): Combines predictions from all three models via averaging or voting techniques.
- Models were fine-tuned using GPU acceleration.
- Dataset was split into 80% training and 20% validation.
- Early stopping was applied to avoid overfitting.
- ROUGE metrics were used for performance evaluation.
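To make the evaluation metric concrete, here is a simplified ROUGE-1 F1 in pure Python. The official ROUGE toolkit additionally applies stemming and its own tokenization, so treat this as a didactic sketch rather than the scorer used to produce the table below.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: F1 over unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern over bigrams and the longest common subsequence, respectively.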
- Developed by Google.
- Converts all tasks into a unified text-to-text form.
- Uses an encoder-decoder architecture that performs exceptionally well in summarization tasks.
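T5's text-to-text framing means summarization inputs simply carry a task prefix; the target side is the summary string. A minimal sketch of the input formatting (the `"summarize: "` prefix is T5's standard summarization prefix; the commented generation call assumes the Hugging Face `transformers` API):

```python
def to_t5_input(transcript: str, prefix: str = "summarize: ") -> str:
    """Format a transcript chunk as a T5 text-to-text summarization input."""
    return prefix + transcript.strip()

# With Hugging Face transformers (not run here), generation would look roughly like:
#   inputs = tokenizer(to_t5_input(transcript), return_tensors="pt", truncation=True)
#   summary_ids = model.generate(**inputs, max_new_tokens=128)
```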
- Also by Google, designed specifically for abstractive summarization.
- Pre-training involves masking and predicting entire sentences, enabling excellent understanding of salient content.
- Delivers high-quality, coherent summaries.
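The gap-sentence objective above can be illustrated with a toy version: score each sentence by its word overlap with the rest of the document, mask the most "principal" ones, and use them as the pseudo-summary target. This is a simplified stand-in for PEGASUS's GSG selection, which uses ROUGE against the remaining document.

```python
def gap_sentence_mask(sentences: list[str], num_masked: int = 1,
                      mask_token: str = "<mask>") -> tuple[list[str], list[str]]:
    """Mask the sentences that overlap most with the rest of the document;
    the masked sentences become the generation target (toy GSG objective)."""
    def overlap(i: int) -> float:
        rest = {w for j, s in enumerate(sentences) if j != i for w in s.lower().split()}
        words = set(sentences[i].lower().split())
        return len(words & rest) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=overlap, reverse=True)
    masked_ids = set(ranked[:num_masked])
    inputs = [mask_token if i in masked_ids else s for i, s in enumerate(sentences)]
    targets = [sentences[i] for i in sorted(masked_ids)]
    return inputs, targets
```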
- Developed by Facebook AI.
- Trained as a denoising autoencoder: corrupts inputs and learns to recover the original.
- Blends BERT-style encoding with GPT-style decoding for strong generative performance.
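BART's denoising setup can be sketched with its text-infilling corruption: a contiguous span of tokens is replaced by a single mask token, and the model must regenerate the original sequence. The span length and seeding below are illustrative choices, not BART's actual noising schedule.

```python
import random

def corrupt(tokens: list[str], span_len: int = 2,
            mask: str = "<mask>", seed: int = 0) -> list[str]:
    """BART-style text infilling: replace one contiguous token span with a
    single mask token; training recovers the original uncorrupted sequence."""
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    start = rng.randrange(max(len(tokens) - span_len, 1))
    return tokens[:start] + [mask] + tokens[start + span_len:]
```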
- Combines the predictions of T5, Pegasus, and BART using voting or averaging strategies.
- Aims to produce more consistent and accurate summaries by integrating the strengths of each individual model.
- Trained on a single NVIDIA P100 GPU, achieving the highest ROUGE scores of the four configurations.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Execution Time (hrs) | GPU Used |
|---|---|---|---|---|---|
| T5 | 0.90 | 0.83 | 0.88 | 10–12 | NVIDIA T4 ×2 |
| Pegasus | 0.87 | 0.81 | 0.85 | 8–10 | NVIDIA T4 ×2 |
| BART | 0.86 | 0.80 | 0.84 | 6–8 | NVIDIA T4 ×2 |
| 🌟 Ensemble | 0.91 | 0.84 | 0.89 | 12 | NVIDIA P100 |
- T5 achieved the best performance among the standalone models.
- The Ensemble Model produced the highest overall ROUGE scores and more stable results across different input types.
- Ensemble utilized a single NVIDIA P100 GPU and completed in approximately 12 hours.
- 🔍 Explore additional transformer models such as LongT5, LED, or GPT variants.
- 🎯 Perform automated hyperparameter tuning using frameworks like Optuna or Ray.
- 🧠 Apply data augmentation techniques (paraphrasing, back-translation).
- 📹 Integrate multimodal data such as video frames or audio transcripts.
- 🌐 Deploy as a web service or plugin for use in e-learning platforms (e.g., Moodle, Coursera).
- T. Lv, L. Cui, M. Vasilijevic, and F. Wei, "VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization," arXiv, 2021.
- A. Vaswani et al., "Attention is All You Need," NIPS, 2017.
- M. Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension," ACL, 2020.
- J. Zhang et al., "PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization," ICML, 2020.
If you'd like to contribute, issues and pull requests are welcome!
🧑‍💻 Developed by Harshil Pradhan