This README provides instructions for setting up and running an object-based video summarization pipeline built on the SAM2 video predictor and YOLOv8 object detection. The script extracts frames from a video, performs object detection, annotates the frames, and generates a final summary video based on the user-selected objects.
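The stages described above can be sketched as a simple pipeline. The function names and the dummy detection logic below are illustrative placeholders, not the script's actual API; the real script runs the SAM2 video predictor and YOLOv8 at the detection step.

```python
# Illustrative sketch of the pipeline stages. Real detection (YOLOv8) and
# segmentation (SAM2) are stubbed out with placeholders here.

def extract_frames(video_path, start_idx, end_idx, scale):
    """Return the frame indices that would be extracted from the video."""
    return list(range(start_idx, end_idx))

def detect_objects(frames):
    """Placeholder for YOLOv8 detection: one dummy label per frame."""
    return {idx: ["object"] for idx in frames}

def summarize(detections, selected_labels):
    """Keep only frames containing at least one user-selected object."""
    return [idx for idx, labels in detections.items()
            if any(lbl in selected_labels for lbl in labels)]

frames = extract_frames("demo.mp4", 0, 300, 1.0)
detections = detect_objects(frames)
kept = summarize(detections, {"object"})
```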
Ensure you have the following installed:
- Python 3.6+ (for Jupyter/Colab)
- Python 3.9+ (for Streamlit UI)
- Jupyter Notebook or Google Colab (preferred)
- VS Code (for Streamlit)
- CUDA-compatible GPU (Recommended: 100 GB GPU for optimal performance)
⚠️ macOS is not supported due to lack of NVIDIA GPU support required by SAM2.
Run the following commands:
pip install huggingface_hub
pip install ultralytics
pip install opencv-python Pillow ipywidgets
pip install sam2
pip install -q supervision[assets] jupyter_bbox_widget
Update the script with the paths to your video and image:
SOURCE_VIDEO = "/path/to/your/demo.mp4"
image_path = "/path/to/your/sample_pics.png"
Set the following parameters to control frame extraction:
START_IDX = 0 # Starting frame index
END_IDX = 300 # Ending frame index
SCALE = 1.0 # Frame resize scale
- Choose at least two objects to include in the summarized video.
- Once objects are selected, run the summarization script.
- The final annotated and summarized video will be saved in the root directory.
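How `START_IDX`, `END_IDX`, and `SCALE` interact can be sketched with a small helper. This is pure Python for illustration only; the actual script uses OpenCV to read and resize frames, and the function name here is a hypothetical.

```python
def plan_extraction(total_frames, start_idx, end_idx, scale, width, height):
    """Clamp the requested frame range to the video length and compute
    the output frame size after applying the resize scale."""
    start = max(0, start_idx)
    end = min(total_frames, end_idx)
    indices = list(range(start, end))
    out_size = (int(width * scale), int(height * scale))
    return indices, out_size

# e.g. a 250-frame, 1280x720 clip with the defaults above:
# END_IDX = 300 is clamped to the actual frame count
indices, size = plan_extraction(250, 0, 300, 1.0, 1280, 720)
```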
Ensure the following dependencies are installed:
pip install huggingface_hub
pip install ultralytics
pip install opencv-python Pillow ipywidgets
pip install sam2
Run the Streamlit app with:
streamlit run app.py
- Upload a video and image
- Select objects to summarize
- Choose the timestamp range
- View and download the final summarized video
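Under the hood, the user-selected timestamp range must be mapped to frame indices using the video's frame rate. A minimal sketch of that conversion (the function name and fps handling are assumptions, not the app's actual code):

```python
def timestamps_to_frames(start_sec, end_sec, fps):
    """Convert a [start_sec, end_sec) timestamp range (in seconds)
    to a (start_frame, end_frame) index pair at the given frame rate."""
    start_frame = int(start_sec * fps)
    end_frame = int(end_sec * fps)
    return start_frame, end_frame

# e.g. summarizing seconds 2-10 of a 30 fps video
start, end = timestamps_to_frames(2, 10, 30)
```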
- The final summarized video is saved in the project root.
- Intermediate outputs (frames, annotations) can optionally be saved.
For questions, raise an issue or contact the project maintainer.