- Overview
- Use Case / Problem Description
- Software Components
- Target Audience
- Repository Structure Overview
- Documentation
- Prerequisites
- Hardware Requirements
- Quickstart Guide
- Known CVEs
- License
This repository powers the build experience, showcasing a video search and summarization agent built with NVIDIA NIM microservices.
Insightful, accurate, and interactive video analytics AI agents enable a range of industries to make better decisions faster. These AI agents are given tasks through natural language and can perform complex operations like video summarization and visual question-answering, unlocking entirely new application possibilities. The NVIDIA AI Blueprint makes it easy to get started building and customizing video analytics AI agents for video search and summarization — all powered by generative AI, vision language models (VLMs) like Cosmos Nemotron VLMs, large language models (LLMs) like Llama Nemotron LLMs, NVIDIA NeMo Retriever, and NVIDIA NIM.
The NVIDIA AI Blueprint for Video Search and Summarization addresses the challenge of efficiently analyzing and summarizing large volumes of video data. It can be used to create vision AI agents that apply to a multitude of use cases, such as monitoring smart spaces, warehouse automation, and standard operating procedure (SOP) validation. These capabilities matter wherever quick and accurate video analysis leads to better decision-making and improved operational efficiency.
- NIM microservices: The following models are used in this blueprint:
- Ingestion Pipeline: The process involves decoding video segments (chunks) generated by the stream handler, selecting frames, and using a vision-language model (VLM) along with a caption prompt to generate detailed captions for each chunk. A computer vision pipeline enhances video analysis by providing detailed metadata on objects. In parallel, the audio is extracted and a transcription is generated. These dense captions, along with audio transcripts and CV metadata, are then indexed into vector and graph databases for use in the Context-Aware Retrieval-Augmented Generation workflow (a minimal sketch of this flow follows this list).
- CA-RAG module: The Context-Aware Retrieval-Augmented Generation (CA-RAG) module leverages both Vector RAG and Graph-RAG as primary sources for video understanding. This module is utilized in key features such as summarization, Q&A, and sending alerts. During the Q&A workflow, the CA-RAG module extracts relevant context from the vector database and graph database to enhance temporal reasoning, anomaly detection, multi-hop reasoning, and scalability. This approach offers deeper contextual understanding and efficient management of extensive video data. Additionally, the context manager maintains its working context by drawing on both short-term memory, such as chat history, and long-term memory resources like the vector and graph databases, as needed.
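
To make the data flow concrete, here is a minimal Python sketch of the ingestion and CA-RAG stages described above. It is purely illustrative: the class and function names (`ChunkRecord`, `ingest`, `answer`), the caption prompt, and the generic `vector_db`/`graph_db` clients are assumptions made for this sketch, not the blueprint's actual APIs.

```python
# Illustrative sketch of the ingestion and CA-RAG flow described above.
# All names below are hypothetical stand-ins, not the blueprint's actual APIs.
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    chunk_id: int
    start_s: float
    end_s: float
    caption: str              # dense caption produced by the VLM
    transcript: str           # audio transcript for the same time span
    cv_metadata: dict = field(default_factory=dict)  # per-object CV metadata


def ingest(chunks, vlm, asr, cv_pipeline, vector_db, graph_db):
    """Caption, transcribe, and index every video chunk."""
    for chunk in chunks:
        frames = chunk.sample_frames()                    # frame selection
        caption = vlm.caption(frames, prompt="Describe the events in detail.")
        transcript = asr.transcribe(chunk.audio)          # runs in parallel in practice
        metadata = cv_pipeline.detect(frames)             # objects, tracks, ...
        record = ChunkRecord(chunk.id, chunk.start, chunk.end,
                             caption, transcript, metadata)
        vector_db.index(record)   # dense captions + transcripts for retrieval
        graph_db.index(record)    # entities/relations for multi-hop reasoning


def answer(question, vector_db, graph_db, llm, chat_history):
    """Context-aware Q&A: retrieve from both databases, then generate."""
    context = vector_db.search(question) + graph_db.query(question)
    return llm.generate(question=question, context=context, history=chat_history)
```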
This blueprint is designed for ease of setup, with extensive configuration options that require varying levels of technical expertise. It is intended for:
- Video Analysts and IT Engineers: Professionals focused on analyzing video data and ensuring efficient processing and summarization. The blueprint offers 1-click deployment steps, easy-to-manage configurations, and plug-and-play models, making it accessible for early developers.
- GenAI Developers / Machine Learning Engineers: Experts who need to customize the blueprint for specific use cases. This includes modifying the RAG pipeline for unique datasets and fine-tuning LLMs as needed. For advanced users, the blueprint provides detailed configuration options and custom deployment possibilities, enabling extensive customization and optimization.
- deploy/: Contains scripts for Docker Compose and Helm chart deployment, along with a notebook for Brev Launchable deployment.
- src/: Source code for the video search and summarization agent.
For detailed instructions and additional information about this blueprint, please refer to the official documentation.
- An NVIDIA AI Enterprise developer license is required to host NVIDIA NIM microservices locally.
- API catalog keys:
- NVIDIA API catalog or NGC (steps to generate key)
The platform requirement can vary depending on the configuration and deployment topology used for VSS and dependencies like VLM, LLM, etc. For a list of validated GPU topologies and what configuration to use, see the supported platforms.
| Deployment Type | VLM | LLM | Embedding (llama-3.2-nv-embedqa-1b-v2) | Reranker (llama-3.2-nv-rerankqa-1b-v2) | Minimum GPU Requirement |
|---|---|---|---|---|---|
| Local deployment (Default topology) | Local (VILA 1.5) | Local (Llama 3.1 70B) | Local | Local | 8xH200, 8xH100, 8xA100 (80GB), 8xL40S |
| Local deployment (Reduced Compute) | Local (NVILA 15b) | Local (Llama 3.1 70B) | Local | Local | 4xH200, 4xH100, 4xA100 (80GB), 6xL40S |
| Local deployment (Single GPU) | Local (NVILA 15b) | Local (Llama 3.1 8b low mem mode) | Local | Local | 1xH200, 1xH100, 1xA100 (80GB) |
| Local VLM deployment | Local | Remote | Remote | Remote | 1xH200, 1xH100, 2xA100 (80GB), 2xL40S |
| Complete remote deployment | Remote | Remote | Remote | Remote | Minimum 8GB VRAM GPU |
Ideal for: Quickly getting started with your own videos without worrying about hardware and software requirements.
Follow the steps in the documentation and the notebook in the deploy directory to complete all prerequisites and deploy the blueprint using a Brev Launchable on an 8xL40S Crusoe instance.
- deploy/1_Deploy_VSS_docker_Crusoe.ipynb: This notebook is tailored specifically for the Crusoe CSP, which uses ephemeral storage.
Ideal for: Development phase where you need to run VSS locally, test different models, and experiment with various deployment configurations. This method offers greater flexibility for debugging each component.
For custom VSS deployments through Docker Compose, multiple samples are provided to show different combinations of remote and local model deployments. The /deploy/docker directory contains a README with all the details.
- Ubuntu 22.04
- NVIDIA driver 535.161.08 (Recommended minimum version)
- CUDA 12.2+ (CUDA driver installed with NVIDIA driver)
- NVIDIA Container Toolkit 1.13.5+
- Docker 27.5.1+
- Docker Compose 2.32.4
Ideal for: Production deployments that need to integrate with other systems. Helm offers advantages such as easy upgrades, rollbacks, and management of complex deployments.
The /deploy/helm/ directory contains a nvidia-blueprint-vss-2.3.0.tgz file which can be used to spin up VSS. Refer to the documentation here for detailed instructions.
- Ubuntu 22.04
- NVIDIA driver 535.183.06 (Recommended minimum version). NVIDIA driver 570.86.15 (for H200)
- CUDA 12.2+ (CUDA driver installed with NVIDIA driver)
- Kubernetes v1.31.2
- NVIDIA GPU Operator v23.9 (Recommended minimum version)
- Helm v3.x
To launch the demo quickly with NVIDIA AI Workbench,
open this repository in Workbench. The provided
.workbench/workbench.yaml specification installs the required dependencies.
Gradio is configured as a managed application that can be started from the
Applications section of the Workbench UI.
- Install ffmpeg, OpenCV system libraries, and PyTorch with CUDA 12.8 support:

  ```bash
  # ffmpeg is required for audio and frame extraction
  # libgl1 and libglib2.0-0 are required by the OpenCV Python package
  sudo apt-get install ffmpeg libgl1 libglib2.0-0  # or use brew on macOS
  # If this is not available or fails to run, the Python package
  # ``imageio-ffmpeg`` installed in the next step provides a
  # portable ffmpeg binary.
  pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
  ```
- Install Python dependencies:

  ```bash
  pip3 install -r requirements.txt
  ```
- Install and start Ollama:

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ollama serve &
  ```
Or use Docker Compose:

```bash
# GPU-enabled compose file
docker-compose up -d
```
Or run the helper script:

```bash
./run_local.sh [-p]
```

This helper installs the required PyTorch build, starts Ollama on port 51234, pulls models, and launches the Gradio interface. Use `-p` to share the Gradio interface publicly.
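
Because the helper serves Ollama on the non-default port 51234, other clients need to point at that address explicitly. A minimal sanity check using the official `ollama` Python client (the port comes from the description above; everything else is just an example):

```python
# Quick check that the Ollama instance started by run_local.sh is reachable.
# run_local.sh serves Ollama on port 51234 instead of the default 11434.
from ollama import Client

client = Client(host="http://localhost:51234")
print(client.list())   # lists the models pulled by the helper script
```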
For a more modular deployment where each model runs in its own container, run:

```bash
docker compose -f docker-compose.modular.yml up
```

This compose file launches separate services for Ollama, an ASR server, a reranker, a speculative captioning service, and the Gradio frontend. The frontend communicates with these services using the `ASR_URL`, `RERANKER_URL`, and `SPEC_URL` environment variables.
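
As an illustration of that wiring, the sketch below shows how a client could resolve those environment variables and call one of the services over HTTP. The endpoint path, payload shape, and default ports are assumptions made for this sketch, not the services' documented APIs.

```python
# Illustrative client for the modular deployment. The frontend reads the
# service addresses from ASR_URL, RERANKER_URL, and SPEC_URL; the endpoint
# path and payload below are hypothetical examples.
import os

import requests

ASR_URL = os.environ.get("ASR_URL", "http://localhost:8001")
RERANKER_URL = os.environ.get("RERANKER_URL", "http://localhost:8002")
SPEC_URL = os.environ.get("SPEC_URL", "http://localhost:8003")


def transcribe(audio_path: str) -> str:
    """Send an audio file to the ASR service and return its transcript."""
    with open(audio_path, "rb") as f:
        resp = requests.post(f"{ASR_URL}/transcribe", files={"audio": f})
    resp.raise_for_status()
    return resp.json()["text"]
```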
- Pull models:

  ```bash
  # Large VLM
  ollama pull llava:34b-v1.6
  # Small draft model used for speculative decoding
  ollama pull llava:7b-v1.6
  ```
- Launch the Gradio interface (or use `run_local.sh`):

  ```bash
  python3 src/vss_engine/gradio_frontend.py
  ```
The interface now binds to `0.0.0.0` so it can be reached from other machines. Use `--share` to obtain a public Gradio link. ASR, image captioning, and reranking now run in their own containers. When the model response includes a timestamp, click it to jump to that time in the video.

The interface stores transcripts and frame captions in a per-video RAG database under `data/db`, using a JSON structure with `frame`, `time`, and `caption` fields so each frame can be referenced by timestamp. Repeated questions do not re-run inference over the same video.
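
The records in that database can be pictured as simple JSON objects. A minimal sketch follows; the per-video file name and exact layout under `data/db` are assumptions, and only the `frame`, `time`, and `caption` fields come from the description above.

```python
# Illustrative per-video record store under data/db. The file name, layout,
# and caption text are hypothetical; the frame/time/caption fields match the
# description above.
import json
from pathlib import Path

records = [
    {"frame": 1200, "time": 48.0, "caption": "A forklift enters the loading bay."},
    {"frame": 1800, "time": 72.0, "caption": "A worker scans a package."},
]

db_file = Path("data/db/example_video.json")
db_file.parent.mkdir(parents=True, exist_ok=True)
db_file.write_text(json.dumps(records, indent=2))

# Answer a timestamp-style lookup without re-running inference:
entries = json.loads(db_file.read_text())
nearest = min(entries, key=lambda e: abs(e["time"] - 70.0))
print(nearest["caption"], "at", nearest["time"], "s")
```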
If the Gradio frontend fails with `ImportError: libGL.so.1`, the OpenCV Python package is missing its system dependencies. Install them with:

```bash
sudo apt-get install libgl1 libglib2.0-0
```

Then rebuild any Docker images or restart your environment.
VSS Engine 2.3.0 Container has the following known CVEs:
| CVE | Description |
|---|---|
| CVE-2024-8966 | This impacts the gradio <= 5.22.0 Python package. It affects the file upload functionality of the Gradio UI, where an attacker can cause a Denial-of-Service (DoS) attack by appending a large number of characters to the end of a multipart boundary. This affects the Gradio UI of VSS. |
| CVE-2025-32434 | This impacts the torch v2.5.1 Python package. It affects loading of saved model weights from a tar file using the torch.load() API, which can result in remote code execution in the case of malicious weights. The default weights for the models used by VSS are in safetensors format and are not affected by this vulnerability, since torch.load() is not used. However, users must ensure the safety of the weights if using other formats. |
VSS Engine 2.3.0 Source Code has the following known CVEs:
| CVE | Description |
|---|---|
| CVE-2024-7246 | This affects the gRPC Python package. It is possible for a gRPC client communicating with an HTTP/2 proxy to poison the HPACK table between the proxy and the backend such that other clients see failed requests. By default, VSS does not use an HTTP/2 proxy. |
| CVE-2024-27444 | This issue is reported for the langchain-milvus 0.1.5 dependency on the older langchain version 0.1.5. However, VSS explicitly uses langchain 0.3.3, so the issue is not applicable. |
| CVE-2024-28088 | This issue is reported for the langchain-milvus 0.1.5 dependency on the older langchain version 0.1.5. However, VSS explicitly uses langchain 0.3.3, so the issue is not applicable. |
| CVE-2024-38459 | This issue is reported for the langchain-milvus 0.1.5 dependency on the older langchain version 0.1.5. However, VSS explicitly uses langchain 0.3.3, so the issue is not applicable. |
The software and materials in this repository are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products, except for the models, which are governed by the NVIDIA Community Model License.

ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement for Llama-3.1-70b-instruct; Llama 3.2 Community License Agreement for NVIDIA Retrieval QA Llama 3.2 1B Embedding v2 and NVIDIA Retrieval QA Llama 3.2 1B Reranking v2; Apache License, Version 2.0 for https://github.com/google-research/big_vision/blob/main/LICENSE and https://github.com/01-ai/Yi/blob/main/LICENSE. Built with Llama.
