A context-aware product search system that understands user intent beyond keyword matching using vector embeddings and semantic similarity.
## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Architecture](#architecture)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Project Structure](#project-structure)
- [Configuration](#configuration)
- [API Documentation](#api-documentation)
- [Development](#development)
- [Troubleshooting](#troubleshooting)
## Overview

This system enables semantic product search that understands:
- Contextual meaning: "baking supplies" → finds flour, oven trays, mixers
- Synonyms: "soda" ↔ "soft drink", "pop", "carbonated beverage"
- Conceptual relationships: "Italian dinner" → pasta, tomatoes, wine
Unlike traditional keyword search, semantic search uses AI-powered vector embeddings to match products based on meaning and context.
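As a toy illustration of the underlying idea (hypothetical code, not from this project), cosine similarity scores two embedding vectors by the angle between them, so related phrases get vectors that point in similar directions:

```ts
// Toy example (hypothetical, not from this repo): cosine similarity between
// two small vectors. The real system compares 1024-dimensional embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embeddings of near-synonyms like "soda" and "soft drink" would score
// close to 1, while unrelated phrases score near 0.
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.8, 0.2, 0.25])); // ≈ 0.99
```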
## Features

- 🔍 Semantic Search: Vector-based similarity search using 1024-dimensional embeddings
- 🚀 Fast Performance: Query response time <800ms
- 📊 High Relevance: >70% accuracy in top 5 results
- 🖼️ Product Images: Auto-generated images matching product descriptions
- 🎨 Modern UI: Clean, responsive frontend with real-time search
- 🐳 Docker Support: Easy deployment with Docker Compose
- 🔄 GraphQL API: Flexible query interface
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                  Frontend (Vanilla JS)                  │
│  • Search UI                                            │
│  • GraphQL Client                                       │
│  • Real-time Results                                    │
└──────────────────────┬──────────────────────────────────┘
                       │ HTTP/GraphQL
                       │
┌──────────────────────▼──────────────────────────────────┐
│               Backend (NestJS + GraphQL)                │
│  ┌────────────────────────────────────────────────────┐ │
│  │           GraphQL API (Apollo Server)              │ │
│  │  • Search Resolvers                                │ │
│  │  • Product Resolvers                               │ │
│  └────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────┐ │
│  │                 Business Logic                     │ │
│  │  • Search Service (Vector Similarity)              │ │
│  │  • Embeddings Service (LLM Integration)            │ │
│  │  • Products Service                                │ │
│  └────────────────────────────────────────────────────┘ │
└──────────┬──────────────────────────┬───────────────────┘
           │                          │
           │                          │
┌──────────▼──────────┐    ┌──────────▼──────────┐
│     PostgreSQL      │    │       LLM API       │
│     + pgvector      │    │   (Local/OpenAI)    │
│                     │    │                     │
│  • Products Table   │    │  • Embeddings       │
│  • Vector Index     │    │  • 1024 dimensions  │
│  • Similarity Query │    │  • Batch Processing │
└─────────────────────┘    └─────────────────────┘
```
### Data Flow

1. Query Processing: User enters a search query (e.g., "pizza")
2. Embedding Generation: The query is converted to a 1024-dimensional vector by the LLM
3. Vector Similarity: The database searches for products with similar embeddings using cosine similarity
4. Results Ranking: Products are ranked by similarity score (0-1)
5. Response: Top matching products are returned with their similarity scores
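A condensed sketch of this pipeline, assuming an OpenAI-compatible embeddings endpoint and the `pg` driver (illustrative names; the project's actual services live in `backend/src/search` and may differ):

```ts
import { Pool } from "pg"; // assumes the node-postgres driver

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Step 2: convert the query text into a 1024-dimensional vector.
async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${process.env.LLM_API_ENDPOINT}/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: process.env.EMBEDDING_MODEL, input: text }),
  });
  const { data } = await res.json();
  return data[0].embedding;
}

// Steps 3-5: pgvector's <=> operator is cosine distance, so
// similarity = 1 - distance; ordering by distance ranks the results.
async function searchProducts(query: string, limit = 5) {
  const vector = JSON.stringify(await embed(query)); // pgvector parses '[...]'
  const { rows } = await pool.query(
    `SELECT id, name, 1 - (embedding <=> $1::vector) AS similarity
       FROM products
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [vector, limit],
  );
  return rows; // step 5: top matches with similarity scores
}
```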
### Technology Stack

- Frontend: Vanilla JavaScript, HTML5, CSS3
- Backend: NestJS, GraphQL (Apollo Server), TypeScript
- Database: PostgreSQL 15 with pgvector extension
- ML/AI: Local LLM (OpenAI-compatible API) for embeddings
- Infrastructure: Docker, Docker Compose
## Prerequisites

Before you begin, ensure you have the following installed:
- Node.js (v18 or higher) and npm
- Docker and Docker Compose
- Python 3 (for simple HTTP server, optional)
- Local LLM API running on `http://localhost:1234/v1` (or configure your own)
### LLM API Setup

This project requires an OpenAI-compatible API for generating embeddings. Options:

1. Local LLM (recommended for development):
   - Use LM Studio, Ollama, or similar
   - Configure it to run on `http://localhost:1234/v1`
   - Model: `text-embedding-qwen3-embedding-0.6b` (1024 dimensions)

2. OpenAI API:
   - Set `OPENAI_API_KEY` in `.env`
   - Uses OpenAI's embedding models
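Whichever option you choose, the system only relies on the standard `/v1/embeddings` request shape. A quick standalone check (a hypothetical helper script, not part of this repo) that your endpoint returns 1024-dimensional vectors:

```ts
// Hypothetical standalone check; run with: npx tsx check-embeddings.ts
const endpoint = process.env.LLM_API_ENDPOINT ?? "http://localhost:1234/v1";
const model = process.env.EMBEDDING_MODEL ?? "text-embedding-qwen3-embedding-0.6b";

const res = await fetch(`${endpoint}/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model, input: "pizza" }),
});
const { data } = await res.json();

// EMBEDDING_DIMENSIONS and the vector(1024) column must match this length.
console.log(`${model} returned ${data[0].embedding.length} dimensions`);
```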
## Quick Start

### First-Time Setup

Follow these steps when setting up the project on a new machine for the first time.
Before starting, ensure you have:

- ✅ Node.js 18+ installed (`node --version`)
- ✅ npm installed (`npm --version`)
- ✅ Docker Desktop installed and running (`docker --version`)
- ✅ Local LLM API running (LM Studio, Ollama, etc.) on `http://localhost:1234/v1`
#### Step 1: Clone the Repository

```bash
git clone <repository-url>
cd "Sematic Search"
```

#### Step 2: Install Dependencies

```bash
# Install backend dependencies
cd backend
npm install

# Install scripts dependencies
cd ../scripts
npm install
```

#### Step 3: Configure Environment Variables

Create your `.env` file from the template:

```bash
cd ..
cp .env.example .env
```

Important: Edit `.env` and ensure your LLM API endpoint is correct:
```env
# For local LLM (LM Studio, Ollama, etc.)
LLM_API_ENDPOINT=http://localhost:1234/v1
EMBEDDING_MODEL=text-embedding-qwen3-embedding-0.6b
EMBEDDING_DIMENSIONS=1024

# For OpenAI API (optional)
# OPENAI_API_KEY=sk-your-actual-key-here
```

#### Step 4: Start the Database

Start PostgreSQL with Docker:
```bash
docker-compose -f docker/docker-compose.yml up -d postgres
```

Wait 10-15 seconds for PostgreSQL to fully initialize, then verify it's running:
```bash
docker ps | grep semantic-search-db
```

#### Step 5: Verify the LLM API

Before generating embeddings, ensure your LLM API is accessible:
```bash
curl http://localhost:1234/v1/models
```

You should see a list of available models. If not, start your LLM service (LM Studio, Ollama, etc.).
#### Step 6: Generate Data and Seed the Database

Generate product data and embeddings:

```bash
cd scripts
# Step 6a: Generate mock product data (1000 products)
node generate-data.js
# Step 6b: Generate embeddings (this may take a few minutes)
# Make sure LLM API is running before this step!
node generate-embeddings.js
# Step 6c: Seed the database
node seed-database.js
```

Expected output:
- ✅ Generated 1000 products
- ✅ Generated embeddings for 1000 products
- ✅ Successfully seeded 1000 products
#### Step 7: Start the Backend

In a new terminal window:

```bash
cd backend
PORT=3001 npm run start:dev
```

Wait for the message: 🚀 Application is running on: http://localhost:3001/graphql
#### Step 8: Start the Frontend

In another new terminal window:

```bash
cd frontend
python3 -m http.server 8080
```

Or using Node.js:

```bash
npx http-server -p 8080
```

#### Step 9: Test the Search

Open your browser and navigate to: http://localhost:8080
- Type "pizza" in the search box
- Click "Search" or press Enter
- You should see products with similarity scores and images
Test the GraphQL API directly:

```bash
curl -X POST http://localhost:3001/graphql \
-H 'Content-Type: application/json' \
-d '{"query":"query { searchProducts(query: \"pizza\", limit: 3) { similarity product { name image_url } } }"}'You should now have:
- ✅ Database running with 1000 products
- ✅ Backend API running on port 3001
- ✅ Frontend running on port 8080
- ✅ Semantic search working
### Subsequent Runs

Once you've completed the first-time setup, you can quickly start the project:

```bash
# 1. Start database
docker-compose -f docker/docker-compose.yml up -d postgres
# 2. Start backend (in one terminal)
cd backend && PORT=3001 npm run start:dev
# 3. Start frontend (in another terminal)
cd frontend && python3 -m http.server 8080
```

## Project Structure

```
Sematic Search/
├── backend/ # NestJS backend application
│ ├── src/
│ │ ├── config/ # Configuration modules
│ │ ├── database/ # Database setup and migrations
│ │ ├── products/ # Product entity and resolvers
│ │ ├── search/ # Search service and resolvers
│ │ └── main.ts # Application entry point
│ ├── package.json
│ └── tsconfig.json
│
├── frontend/ # Vanilla JS frontend
│ ├── index.html # Main HTML file
│ ├── css/
│ │ └── styles.css # Styling
│ └── js/
│ ├── app.js # Application entry
│ ├── graphql-client.js # GraphQL client
│ └── search-ui.js # Search UI controller
│
├── scripts/ # Data generation and seeding
│ ├── generate-data.js # Generate mock products
│ ├── generate-embeddings.js # Generate embeddings
│ ├── seed-database.js # Seed database
│ └── package.json
│
├── docker/ # Docker configuration
│ ├── docker-compose.yml # Service orchestration
│ └── Dockerfile.backend # Backend Dockerfile
│
├── data/ # Generated data (gitignored)
│ ├── products.json
│ └── products-with-embeddings.json
│
├── .env.example # Environment variables template
├── .gitignore
└── README.md # This file
```
## Configuration

### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key (if using OpenAI) | - |
| `LLM_MODEL` | LLM model name | `qwen/qwen3-vl-4b` |
| `EMBEDDING_MODEL` | Embedding model name | `text-embedding-qwen3-embedding-0.6b` |
| `EMBEDDING_DIMENSIONS` | Embedding vector dimensions | `1024` |
| `LLM_API_ENDPOINT` | LLM API endpoint | `http://localhost:1234/v1` |
| `DATABASE_URL` | PostgreSQL connection string | `postgresql://postgres:postgres@localhost:5432/semantic_search` |
| `PORT` | Backend port | `3001` (dev) or `3000` (prod) |
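For reference, a minimal sketch of how the backend might read these variables with NestJS's `ConfigService` (illustrative; the project's actual config module in `backend/src/config` may differ):

```ts
import { Injectable } from "@nestjs/common";
import { ConfigService } from "@nestjs/config";

@Injectable()
export class EmbeddingsConfig {
  constructor(private readonly config: ConfigService) {}

  // Fall back to the documented defaults when a variable is unset.
  get endpoint(): string {
    return this.config.get<string>("LLM_API_ENDPOINT", "http://localhost:1234/v1");
  }

  get dimensions(): number {
    return Number(this.config.get("EMBEDDING_DIMENSIONS", "1024"));
  }
}
```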
### Database Schema

```sql
CREATE TABLE products (
  id UUID PRIMARY KEY,
  name VARCHAR(255),
  description TEXT,
  category VARCHAR(100),
  price DECIMAL(10,2),
  image_url VARCHAR(500),
  embedding vector(1024),  -- pgvector extension
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

CREATE INDEX products_embedding_idx ON products
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
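pgvector accepts vector values as `[x, y, ...]` string literals, which is how seeding can write the `embedding` column. A hedged sketch (the real `seed-database.js` may differ):

```ts
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Illustrative insert for one seeded product; seed-database.js may differ.
async function insertProduct(p: {
  id: string; name: string; description: string;
  category: string; price: number; image_url: string;
  embedding: number[]; // must have exactly 1024 elements
}) {
  await pool.query(
    `INSERT INTO products (id, name, description, category, price, image_url, embedding)
     VALUES ($1, $2, $3, $4, $5, $6, $7::vector)`,
    [p.id, p.name, p.description, p.category, p.price, p.image_url,
     JSON.stringify(p.embedding)], // pgvector parses the '[...]' literal
  );
}
```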
## API Documentation

URL: http://localhost:3001/graphql

### `searchProducts`

Search for products using semantic similarity:

```graphql
query {
  searchProducts(query: "pizza", limit: 5) {
    similarity
    product {
      id
      name
      description
      category
      price
      image_url
    }
  }
}
```

Example Response:

```json
{
  "data": {
    "searchProducts": [
      {
        "similarity": 0.85,
        "product": {
          "id": "...",
          "name": "Elegant Fresh Pizza",
          "description": "...",
          "category": "Snacks",
          "price": 68.0,
          "image_url": "https://source.unsplash.com/400x300/?pizza"
        }
      }
    ]
  }
}
```

### `products`

Query products by category:

```graphql
query {
  products(category: "Beverages", limit: 10) {
    id
    name
    category
    price
    image_url
  }
}
```

### GraphQL Playground

Access the interactive GraphQL Playground at: http://localhost:3001/graphql
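Since the endpoint is plain HTTP, any client works. A minimal sketch of a search call such as `js/graphql-client.js` might issue (illustrative; the variable types assume the schema shown above):

```ts
// Minimal GraphQL client sketch; the repo's js/graphql-client.js may differ.
const GRAPHQL_URL = "http://localhost:3001/graphql";

async function search(query: string, limit = 5) {
  const res = await fetch(GRAPHQL_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: `query Search($query: String!, $limit: Int) {
        searchProducts(query: $query, limit: $limit) {
          similarity
          product { id name price image_url }
        }
      }`,
      variables: { query, limit },
    }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(errors[0].message);
  return data.searchProducts;
}

search("pizza").then(console.log);
```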
## Development

### Running Locally

Backend (with hot reload):

```bash
cd backend
PORT=3001 npm run start:dev
```

Frontend:

```bash
cd frontend
python3 -m http.server 8080
```
### Regenerating Data

To regenerate product data and embeddings:

```bash
cd scripts

# Clear and regenerate
node generate-data.js
node generate-embeddings.js
node seed-database.js
```

### Database Migrations

Migrations are applied automatically when the database container starts. To apply them manually:

```bash
docker exec semantic-search-db psql -U postgres -d semantic_search -f /docker-entrypoint-initdb.d/001_init_schema.sql
```

### Running with Docker

Run the full stack with Docker Compose:

```bash
docker-compose -f docker/docker-compose.yml up
```

This starts:
- PostgreSQL on port 5432
- Backend on port 3000
- Frontend on port 8080
## Troubleshooting

### Port Already in Use

```bash
# Kill process on port 3001
lsof -ti:3001 | xargs kill -9

# Or use the helper script
./scripts/start-backend.sh
```

### Database Connection Issues

```bash
# Check if Docker is running
docker ps

# Check database logs
docker logs semantic-search-db

# Verify database is ready
docker exec semantic-search-db psql -U postgres -d semantic_search -c "SELECT 1;"
```

### Search Returns No Results

1. Check that embeddings exist:

   ```bash
   docker exec semantic-search-db psql -U postgres -d semantic_search -c "SELECT COUNT(*) FROM products WHERE embedding IS NOT NULL;"
   ```

2. Verify the LLM API is running:

   ```bash
   curl http://localhost:1234/v1/models
   ```

3. Check for a dimension mismatch:
   - Ensure `EMBEDDING_DIMENSIONS` in `.env` matches the model output (1024)
   - Verify the database column type is `vector(1024)`
### CORS Errors

- Ensure backend CORS is configured in `backend/src/main.ts` (see the sketch below)
- Serve the frontend from an HTTP server (not `file://`)
- Check that the backend is running on the correct port
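If CORS is the problem, NestJS enables it in `main.ts`. A minimal sketch (the exact options in this repo may differ):

```ts
import { NestFactory } from "@nestjs/core";
import { AppModule } from "./app.module";

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  // Allow the frontend origin; widen or restrict as needed.
  app.enableCors({ origin: ["http://localhost:8080"] });
  await app.listen(process.env.PORT ?? 3001);
}
bootstrap();
```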
### Images Not Loading

- Images use the Unsplash Source API
- Check your internet connection
- Verify the `image_url` field exists in the database
## Performance

- Query Response Time: <800ms (average)
- Embedding Generation: ~100ms per query
- Vector Search: ~50ms for 1000 products
- Similarity Accuracy: >70% in top 5 results
## Security Notes

- Never commit the `.env` file to version control
- Use environment variables for sensitive data
- Restrict CORS origins in production
- Use HTTPS for API endpoints
- Implement rate limiting for production use (see the sketch below)
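For the rate-limiting point, one common choice in a NestJS backend is the `@nestjs/throttler` package; a sketch assuming you add that dependency (v5-style options, with `ttl` in milliseconds):

```ts
import { Module } from "@nestjs/common";
import { APP_GUARD } from "@nestjs/core";
import { ThrottlerModule, ThrottlerGuard } from "@nestjs/throttler";

@Module({
  imports: [
    // At most 20 requests per client per 60 seconds.
    ThrottlerModule.forRoot([{ ttl: 60_000, limit: 20 }]),
  ],
  providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}
```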
## License

MIT License - see the LICENSE file for details.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## Support

For issues and questions, please open an issue on GitHub.
Built with ❤️ using NestJS, GraphQL, PostgreSQL, and Vector Embeddings