VLLM: Visual Search Engine using COCO Dataset and CLIP

This project implements a Visual Language Model (VLM)-powered multi-modal search engine using OpenAI’s CLIP and the COCO dataset. It lets users retrieve semantically relevant images from either a text prompt (e.g., “a panda”) or an example image. The system uses CLIP to embed both images and text into a shared latent space and leverages FAISS for efficient similarity-based indexing and retrieval. Results are presented through a clean Streamlit web interface, offering an interactive and scalable solution for content-based image search.
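The core idea above — embedding images and text into one space and ranking by similarity — can be sketched without loading the actual CLIP model. The snippet below uses random stand-in vectors in place of CLIP embeddings (CLIP ViT-B/32 produces 512-dimensional vectors) and plain NumPy in place of FAISS; on unit-normalized vectors, the inner-product ranking below is exactly what a FAISS inner-product index computes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP embeddings; in the real app these come from the
# CLIP image encoder (for the database) and image/text encoder (for the query).
image_embeddings = rng.normal(size=(1000, 512)).astype("float32")
# Query: a slightly perturbed copy of image 42, simulating a near-duplicate.
query_embedding = image_embeddings[42] + 0.01 * rng.normal(size=512)

# Normalize so that inner product equals cosine similarity,
# matching a FAISS inner-product index over unit vectors.
def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

db = normalize(image_embeddings)
q = normalize(query_embedding)

# Rank all images by similarity to the query and keep the top K = 6.
scores = db @ q
top_k = np.argsort(-scores)[:6]
print(top_k[0])  # image 42 ranks first, since the query is a near-copy of it
```

In the full system the random vectors are replaced by real CLIP embeddings and the brute-force ranking by a FAISS index, but the retrieval logic is the same.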

Features

  • Text-based and image-based image search
  • Local visual search over the COCO dataset using FAISS
  • Web image search via the Unsplash API
  • Semantic similarity scoring with CLIP
  • Built-in evaluation metrics (Precision@K, Recall@K, mAP)
  • Responsive, clean Streamlit interface

Technology Stack

User Interface (Streamlit)

(screenshot: Image_Interface)

Application with Accuracy

(screenshot: Image_Query)

Demo

Demo Video

Evaluation Metrics

  • Precision@K – fraction of the top-K results that are relevant

  • Recall@K – fraction of all relevant images that appear in the top-K

  • mAP – mean Average Precision over the returned set

where K is the number of retrieved images (default K = 6).
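A minimal sketch of how these metrics can be computed, using hypothetical image ids and a hand-picked ground-truth set (the actual app evaluates against COCO annotations):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for r in retrieved[:k] if r in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top-k."""
    return sum(1 for r in retrieved[:k] if r in relevant) / len(relevant)

def average_precision(retrieved, relevant, k):
    """Mean of precision values at each rank where a relevant id appears."""
    hits, score = 0, 0.0
    for i, r in enumerate(retrieved[:k], start=1):
        if r in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k)

retrieved = [3, 7, 1, 9, 4, 8]   # ids returned by the search engine (K = 6)
relevant = {3, 1, 4, 5}          # ground-truth relevant ids
print(precision_at_k(retrieved, relevant, 6))  # 3 of 6 hits -> 0.5
print(recall_at_k(retrieved, relevant, 6))     # 3 of 4 found -> 0.75
```

Averaging the per-query `average_precision` values over all evaluation queries gives the mAP reported by the app.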

Installation

1. Clone the repository

```shell
git clone https://github.com/your-repo/clip-visual-search.git
cd clip-visual-search
```

2. Install dependencies

Make sure you have Python 3.10+ installed, then run:

```shell
pip install -r requirements.txt
```

3. Prepare the local dataset

Download the following files from the COCO dataset:
  • train2017 images (around 18 GB)
  • captions_train2017.json (annotations for the train set)

4. Generate the FAISS index and captions

Encode the COCO images with CLIP and save the two files the app loads:

  • faiss_clip_index.idx – FAISS index of the CLIP image embeddings
  • captions.npy – NumPy array of the corresponding image captions

5. Set your Unsplash API key

Go to Unsplash Developers, create an app, and get your Access Key.

Replace the following line in vllm.py with your key:

```python
UNSPLASH_ACCESS_KEY = "YOUR_UNSPLASH_ACCESS_KEY"
```
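For reference, the key is sent as the `client_id` query parameter of the Unsplash photo-search endpoint (per the public Unsplash API documentation). A small sketch that only builds the request URL, without making a network call — the actual request code in vllm.py may differ:

```python
from urllib.parse import urlencode

UNSPLASH_ACCESS_KEY = "YOUR_UNSPLASH_ACCESS_KEY"  # from your Unsplash app

def build_search_url(query, per_page=6):
    """Build the Unsplash photo-search request URL (no network call here)."""
    params = urlencode({
        "query": query,
        "per_page": per_page,              # K results, matching the app default
        "client_id": UNSPLASH_ACCESS_KEY,  # authenticates the request
    })
    return f"https://api.unsplash.com/search/photos?{params}"

url = build_search_url("a panda")
print(url)
```

In the app, fetching this URL (e.g. with `requests.get`) returns JSON whose `results` entries contain the image URLs to display.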

You're now ready to launch the app with:

```shell
streamlit run vllm.py
```

📚 References

| # | Title / Source | Link |
|---|----------------|------|
| [1] | Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/pdf/2103.00020 |
| [2] | FAISS: Facebook AI Similarity Search | https://arxiv.org/pdf/2401.08281 |
| [3] | Streamlit Documentation | https://docs.streamlit.io |
| [4] | Unsplash API | https://unsplash.com/developers |
| [5] | scikit-learn: Precision, Recall, mAP Metrics | https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics |
| [6] | COCO: Common Objects in Context Dataset | https://cocodataset.org/#home |
