A report exploring the role of vector databases in managing high-dimensional data, covering the challenges such data poses, how vector databases work internally, and their applications in content-based image retrieval, smart cities, and semantic caching for LLMs.
- High-dimensional data and its challenges
- What are vector databases, and how do they differ from traditional databases
- Approximate Nearest Neighbour (ANN) search and other common search algorithms
- Vector indexing techniques: Product Quantisation (PQ) and Hierarchical Navigable Small World (HNSW)
- Similarity metrics
- Applications: Content-Based Image Retrieval (image similarity search), smart cities (vehicle peccancy detection, i.e. detecting traffic violations), and semantic caching for LLMs
- Future directions and opportunities in vector databases
- Code examples using Faiss, Pinecone, CLIP, and Sentence Transformers in Python