This project implements a Vision-Language Model (VLM)-powered multi-modal search engine using OpenAI's CLIP and the COCO dataset. It enables users to retrieve semantically relevant images from either a text prompt (e.g., "A Panda") or an example image. The system uses CLIP to embed both images and text into a shared latent space and leverages FAISS for efficient similarity-based indexing and retrieval. Results are presented through an intuitive Streamlit web interface, offering an interactive and scalable solution for content-based image search.
- Text-based and image-based image search
- Local dataset (COCO) visual search using FAISS
- Web search using Unsplash API
- Semantic similarity via CLIP, with built-in evaluation metrics
- Responsive, clean Streamlit interface
- **Precision@K** – relevance among the top-K results
- **Recall@K** – coverage of relevant images within the top-K
- **mAP** – mean average precision over the returned set

where K is the number of returned images (K = 6 by default)
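The three metrics can be sketched in plain Python. The function names are illustrative, not taken from the project code; each takes a relevance list in retrieval order:

```python
def precision_at_k(relevant, k=6):
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(relevant[:k]) / k

def recall_at_k(relevant, total_relevant, k=6):
    """Fraction of all relevant images that appear in the top-k."""
    return sum(relevant[:k]) / total_relevant

def average_precision(relevant):
    """Mean of precision values at each rank where a relevant image occurs."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Example: relevance judgments for the K = 6 returned images of one query
relevant = [True, False, True, True, False, False]
print(precision_at_k(relevant))                  # 3/6 = 0.5
print(recall_at_k(relevant, total_relevant=4))   # 3/4 = 0.75
print(average_precision(relevant))               # (1 + 2/3 + 3/4) / 3
```

mAP over a query set is then just the mean of `average_precision` across queries.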
Clone the repository:

```bash
git clone https://github.com/your-repo/clip-visual-search.git
cd clip-visual-search
```

Make sure you have Python 3.10+ installed, then run:

```bash
pip install -r requirements.txt
```

Download the COCO dataset. The following files are required:

- `train2017` images (around 18 GB)
- `captions_train2017.json` (annotations for the train set)
- `faiss_clip_index.idx` – FAISS index for CLIP image embeddings
- `captions.npy` – NumPy array of image captions
Go to Unsplash Developers, create an app, and get your Access Key.
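For reference, the web-search feature presumably hits Unsplash's `GET /search/photos` endpoint roughly like this (a sketch using only the standard library; `search_unsplash` is a hypothetical helper name, not taken from `vllm.py`):

```python
import json
import urllib.parse
import urllib.request

UNSPLASH_ACCESS_KEY = "YOUR_UNSPLASH_ACCESS_KEY"

def search_unsplash(query, per_page=6):
    """Query Unsplash's /search/photos endpoint and return small-image URLs.

    Hypothetical helper; error handling and rate-limit checks omitted.
    """
    params = urllib.parse.urlencode({"query": query, "per_page": per_page})
    req = urllib.request.Request(
        f"https://api.unsplash.com/search/photos?{params}",
        headers={"Authorization": f"Client-ID {UNSPLASH_ACCESS_KEY}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    return [photo["urls"]["small"] for photo in data["results"]]

# Example (requires a valid Access Key):
# print(search_unsplash("A Panda"))
```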
Replace the following line in `vllm.py` with your key:
```python
UNSPLASH_ACCESS_KEY = "YOUR_UNSPLASH_ACCESS_KEY"
```

You're now ready to launch the app with:
```bash
streamlit run vllm.py
```

| Reference Number | Title / Source | Link |
|---|---|---|
| [1] | Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/pdf/2103.00020 |
| [2] | FAISS: Facebook AI Similarity Search | https://arxiv.org/pdf/2401.08281 |
| [3] | Streamlit Documentation | https://docs.streamlit.io |
| [4] | Unsplash API | https://unsplash.com/developers |
| [5] | scikit-learn: Precision, Recall, mAP Metrics | https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics |
| [6] | COCO: Common Objects in Context Dataset | https://cocodataset.org/#home |