VLM Grasping

This is a demo combining Google Gemini and Segment Anything Model 2 (SAM 2) for open-vocabulary manipulation tasks.

Laptop/Workstation Setup

This demo has been tested on:

Ubuntu 22.04 + Python 3.10 + RTX 4060 Laptop + CUDA 12.1
Ubuntu 24.04 + Python 3.12 + RTX 4060 Ti + CUDA 12.1
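
To see whether your interpreter matches one of the tested configurations above, a minimal sketch along these lines may help. The helper name `is_tested_python` is hypothetical and not part of this project. Other Python versions may well work; they are just not listed as tested.

```python
import sys

# Python versions listed as tested in this README.
TESTED_PYTHON_VERSIONS = {(3, 10), (3, 12)}


def is_tested_python(version=None):
    """Return True if (major, minor) matches one of the tested versions.

    Hypothetical helper for illustration; not part of this project.
    """
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) in TESTED_PYTHON_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "tested" if is_tested_python() else "not listed as tested"
    print(f"Python {major}.{minor}: {status}")
```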

Installation

Create a Python virtual environment.

python -m venv ~/venvs/vlm

Install Segment Anything Model 2 (SAM 2)

cd ~ # Install in home directory by default.
git clone https://github.com/facebookresearch/sam2.git
cd sam2

# Make sure SAM 2 is installed in the Python virtual environment.
source ~/venvs/vlm/bin/activate
pip install -e .

# Download checkpoints
cd checkpoints
./download_ckpts.sh
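
After the download script finishes, you may want to confirm that the checkpoint files are actually present. The sketch below lists them. It assumes that checkpoints are saved as `.pt` files in the `checkpoints` folder of the SAM 2 clone, and `find_checkpoints` is a hypothetical helper, not part of SAM 2 or this project.

```python
from pathlib import Path


def find_checkpoints(sam2_directory):
    """Return sorted names of *.pt files in <sam2_directory>/checkpoints.

    Assumption: the download script stores checkpoints as .pt files.
    Returns an empty list if the checkpoints folder does not exist.
    """
    checkpoint_dir = Path(sam2_directory).expanduser() / "checkpoints"
    if not checkpoint_dir.is_dir():
        return []
    return sorted(p.name for p in checkpoint_dir.glob("*.pt"))


if __name__ == "__main__":
    # ~/sam2 is the default clone location used in the steps above.
    found = find_checkpoints("~/sam2")
    if found:
        print("Found checkpoints:", ", ".join(found))
    else:
        print("No checkpoints found; re-run ./download_ckpts.sh")
```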

Install this package

# Make sure the dependencies are installed in the Python virtual environment.
source ~/venvs/vlm/bin/activate

# Install dependencies
cd <path-to-this-project>
pip install -r requirements.txt

Demo

Before running the demo, set google_gemini_api_key and sam2_directory in config/config.yaml:

google_gemini_api_key: # Use your own API key
sam2_directory: # For example: /home/zhengxiao-han/sam2
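
A missing key or a wrong SAM 2 path is an easy mistake to make here, so a quick check before launching can help. The sketch below uses only the standard library and assumes the file is a flat list of `key: value` lines, as in the snippet above. The project itself presumably reads the file with a YAML library. `parse_flat_config` and `config_problems` are hypothetical helpers, not part of this project.

```python
from pathlib import Path

REQUIRED_KEYS = ("google_gemini_api_key", "sam2_directory")


def parse_flat_config(text):
    """Parse flat `key: value` lines, ignoring blanks and # comments.

    Simplification: anything after a '#' is treated as a comment, so
    values containing '#' are not supported by this sketch.
    """
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip().strip("'\"")
    return config


def config_problems(config):
    """Return a list of human-readable problems; empty if none found."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS
                if not config.get(key)]
    sam2_dir = config.get("sam2_directory")
    if sam2_dir and not Path(sam2_dir).expanduser().is_dir():
        problems.append(f"sam2_directory does not exist: {sam2_dir}")
    return problems


if __name__ == "__main__":
    config_path = Path("config/config.yaml")
    if config_path.is_file():
        for problem in config_problems(parse_flat_config(config_path.read_text())):
            print("Config problem:", problem)
    else:
        print("config/config.yaml not found; run this from the project root")
```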

To run the demo, activate the virtual environment and run demo.py:

# Make sure the Python virtual environment is active.
source ~/venvs/vlm/bin/activate
python demo.py
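
If the demo fails with import errors, the virtual environment is often not active. The sketch below checks this from inside Python. It relies on the standard behaviour of `venv`, where `sys.prefix` differs from `sys.base_prefix` while an environment is active. `in_virtualenv` is a hypothetical helper, not part of this project.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; "
              "run: source ~/venvs/vlm/bin/activate")
```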
