This is a minimal starter repo for using the Grounded Segment Anything Model (Grounded SAM) for text-prompted image segmentation.
(Note: this animation shows the SAM3 web interface, but the principle is the same)
1. Clone the repo. Make sure you have git and Git LFS set up on your machine first (e.g. run `git lfs install`):

   ```
   git clone https://github.com/mluerig/demo-grounded-sam2
   ```

2. Install mamba via miniforge, or conda (Miniconda/Anaconda).
3. Open a terminal at the repo root dir, and create and activate the environment from `environment.yml`:

   ```
   mamba env create -f environment.yml -n grounded-sam1
   mamba activate grounded-sam1
   ```

4. Optional: install PyTorch with GPU (NVIDIA) support. If you have a compatible CUDA GPU and drivers, install the CUDA build from the official channels:
   ```
   mamba install nvidia::cuda-toolkit==12.6
   pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 --force-reinstall
   ```

   If you do not have a GPU, you can skip this step. Autodistill and Grounded SAM will run on CPU, but will be MUCH slower.
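To confirm which backend PyTorch will use, you can run a quick check from Python. This is a small helper sketch, not part of the repo, and it is safe to run whether or not PyTorch is installed:

```python
# Report whether the installed PyTorch build can see a CUDA GPU.
# Guarded import so this also works before the install step above.
def cuda_status() -> str:
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch installed, but no CUDA device detected (CPU build or missing drivers)"

print(cuda_status())
```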
5. Unzip the example data (`data_raw\input_imgs\butterflies.zip`) and run the notebook.
- The provided `environment.yml` keeps PyTorch out by default so you can choose CPU or GPU builds explicitly.
- The model will likely produce some false detections/segmentations, which is expected.
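For reference, text-prompted segmentation with autodistill's GroundedSAM wrapper looks roughly like the following. The ontology mapping and image path are illustrative assumptions, not the repo's exact notebook code:

```python
# Hedged sketch: text-prompted segmentation via autodistill's GroundedSAM
# wrapper. Imports are guarded so the helper degrades gracefully outside
# the repo's environment (where the heavy dependencies are installed).
def segment_with_prompt(image_path: str, prompt: str = "butterfly"):
    try:
        from autodistill.detection import CaptionOntology
        from autodistill_grounded_sam import GroundedSAM
    except ImportError:
        return None  # run inside the grounded-sam1 environment
    # Map a free-text prompt to a class label; predict() returns
    # detections with bounding boxes and segmentation masks.
    model = GroundedSAM(ontology=CaptionOntology({prompt: prompt}))
    return model.predict(image_path)

# Usage (hypothetical image path):
# detections = segment_with_prompt("data_raw/input_imgs/some_butterfly.jpg")
```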