An end-to-end semantic product search system for fashion catalog data.
This monorepo contains:
- A FastAPI embedding service (Python) that generates normalized text embeddings and serves product images
- A Spring Boot GraphQL API (Java) that orchestrates embedding + vector search in Postgres
- An Angular frontend (TypeScript) with a Redux-style store, enterprise UI, and paginated result grid
- Data and ingestion scripts for preprocessing and indexing product embeddings in Postgres + pgvector
Developer Note: This project intentionally uses multiple backend layers to explore advanced system design concepts. The structure prioritizes learning and experimentation over efficiency.
multimodal-search/
|- backend-spring/ # Spring Boot GraphQL API (Java)
| \- multimodal/
|- embedding-HF/ # FastAPI embedding and image service (Python)
| |- app/
| |- preprocess.py
| \- embeddings.py
|- multimodal-frontend/ # Angular app (UI + client state)
|- data/ # styles.csv, preprocessed CSV, images/
|- docker-compose.yml # Postgres + pgvector
\- README.md
Embedding service (embedding-HF) core files:
- embedding-HF/app/main.py
- embedding-HF/app/model.py
- embedding-HF/app/schemas.py
Responsibilities:
- POST /embed/text
  - Receives raw query text
  - Embeds it with sentence-transformers/all-MiniLM-L6-v2
  - Returns a normalized embedding vector
- GET /image/{item_id}
  - Serves product image files from data/images by ID (.jpg, .jpeg, .png)
  - Used by the frontend to render result images
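The image lookup described above reduces to resolving an ID to a file on disk. This is a minimal sketch, assuming images are stored flat as data/images/<id>.<ext> and that the extensions are tried in the listed order; find_image_path is a hypothetical helper, not necessarily how app/main.py is written.

```python
from pathlib import Path
from typing import Optional

# Extensions listed in this README; the lookup order is an assumption.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_image_path(images_dir: Path, item_id: str) -> Optional[Path]:
    """Return the first existing <images_dir>/<item_id>.<ext> file, or None (-> HTTP 404)."""
    # Reject IDs that are empty, contain a path separator, or are "." / "..",
    # so a request cannot escape images_dir.
    if not item_id or Path(item_id).name != item_id or item_id in {".", ".."}:
        return None
    for ext in IMAGE_EXTENSIONS:
        candidate = images_dir / f"{item_id}{ext}"
        if candidate.is_file():
            return candidate
    return None
```

In a FastAPI route, a found path could be returned with fastapi.responses.FileResponse, and None mapped to HTTPException(status_code=404).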
Spring GraphQL API (backend-spring/multimodal) core files:
- backend-spring/multimodal/src/main/resources/graphql/schema.graphqls
- backend-spring/multimodal/src/main/java/com/example/multimodal/controller/SearchController.java
- backend-spring/multimodal/src/main/java/com/example/multimodal/service/SearchService.java
- backend-spring/multimodal/src/main/java/com/example/multimodal/repository/ProductRepository.java
Responsibilities:
- Exposes the GraphQL query:
  searchProducts(query: String!, topN: Int!): [Product]
- On each search:
  - Calls FastAPI POST /embed/text to embed the user query
  - Runs a pgvector nearest-neighbor search in Postgres
  - Returns ranked products with a similarity score
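The nearest-neighbor step in ProductRepository is Java/JDBC; for consistency with the other examples, here is a Python sketch of one plausible shape of that query. The table and column names are hypothetical, and the pyformat placeholders assume a DB-API driver such as psycopg. The <=> operator is pgvector's cosine distance, so similarity is 1 minus the distance.

```python
from typing import Dict, Sequence, Union

# Hypothetical schema: products(id, product_display_name, ..., embedding vector(384)).
# "<=>" is pgvector's cosine-distance operator, so similarity = 1 - distance.
KNN_SQL = """
SELECT id, product_display_name,
       1 - (embedding <=> %(q)s::vector) AS similarity
FROM products
ORDER BY embedding <=> %(q)s::vector
LIMIT %(top_n)s
"""


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text input form, e.g. [0.5,-0.25]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def knn_params(embedding: Sequence[float], top_n: int) -> Dict[str, Union[str, int]]:
    """Bind parameters for KNN_SQL in pyformat style."""
    if top_n <= 0:
        raise ValueError("topN must be positive")
    return {"q": to_pgvector_literal(embedding), "top_n": top_n}
```

Ordering by the distance expression with a LIMIT, rather than by the derived similarity column, is the form pgvector can serve from an ivfflat index.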
Angular frontend (multimodal-frontend) core files:
- multimodal-frontend/src/app/services/search.ts
- multimodal-frontend/src/app/store/search.store.ts
- multimodal-frontend/src/app/components/search-bar/*
- multimodal-frontend/src/app/components/results-grid/*
Responsibilities:
- Calls the GraphQL search endpoint via /api/graphql (proxied to Spring)
- Builds each product's image URL as /image/{id} (proxied to FastAPI)
- Handles loading/error/results state with a Redux-style local store
- Supports:
  - query input
  - user-selectable topN
  - result cards with image fallback
  - client-side pagination
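Client-side pagination over the ranked results is just slicing. The results grid does this in TypeScript; the sketch below shows the same logic in Python, assuming 1-based page numbers and clamping out-of-range pages instead of raising (paginate is a hypothetical name).

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int, page_size: int) -> Tuple[List[T], int]:
    """Return (items on the 1-based page, total page count); page is clamped into range."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # An empty result set still has one (empty) page, so the UI can render it.
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return list(items[start:start + page_size]), total_pages
```

Because the whole topN result set is already in memory, paging never triggers another search request.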
Data and ingestion core files:
- data/styles.csv
- data/preprocessed_products.csv
- embedding-HF/preprocess.py
- embedding-HF/embeddings.py
Responsibilities:
- Clean and concatenate metadata into text_for_embedding
- Generate embeddings in batches
- Insert rows into the products table
- Create an ivfflat vector index for faster retrieval
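The text-building, batching, and indexing steps can be sketched as follows. Everything here is illustrative: the column list and order, the cleaning rule (collapse whitespace, drop empty and "nan" values), the helper names, the index name, and lists = 100 may all differ from preprocess.py and embeddings.py.

```python
from typing import Dict, Iterable, Iterator, List

# Assumed metadata columns; four of them also appear as GraphQL Product fields.
TEXT_COLUMNS = ("productDisplayName", "gender", "masterCategory", "subCategory",
                "articleType", "baseColour", "season", "usage")


def build_text_for_embedding(row: Dict[str, str]) -> str:
    """Clean each field (collapse whitespace), drop empty/'nan' values, join with spaces."""
    parts = []
    for col in TEXT_COLUMNS:
        value = " ".join(str(row.get(col) or "").split())
        if value and value.lower() != "nan":
            parts.append(value)
    return " ".join(parts)


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items, e.g. to encode texts batch by batch."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# One plausible index definition; the name and lists = 100 are illustrative values.
IVFFLAT_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS products_embedding_ivfflat "
    "ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);"
)
```

pgvector derives the ivfflat lists from the rows present when the index is built, so the index is best created after the bulk insert.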
End-to-end request flow:
- User submits a search query in the Angular UI
- Frontend dispatches a search action to the local store
- Frontend sends a GraphQL query to Spring:
  POST /api/graphql (dev proxy -> http://localhost:8080/graphql)
- Spring calls FastAPI:
  POST http://localhost:8000/embed/text
- Spring runs a vector similarity query in Postgres (pgvector)
- Spring returns ranked products with similarity scores
- Frontend renders cards and requests images:
  /image/{id} (dev proxy -> http://localhost:8000/image/{id})
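The Spring-to-FastAPI hop is a plain JSON POST. SearchService makes this call from Java; the sketch below shows the same exchange in Python with only the standard library. build_embed_request, parse_embed_response, and embed_text are hypothetical helpers, and the 5-second timeout is an illustrative choice.

```python
import json
import urllib.request
from typing import List

EMBED_URL = "http://localhost:8000/embed/text"  # local dev default from the flow above


def build_embed_request(text: str, url: str = EMBED_URL) -> urllib.request.Request:
    """Build the POST /embed/text request with a JSON body {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_embed_response(raw: bytes) -> List[float]:
    """Extract the vector from a {"embedding": [...]} response body."""
    embedding = json.loads(raw)["embedding"]
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("response has no embedding")
    return [float(x) for x in embedding]


def embed_text(text: str, url: str = EMBED_URL, timeout: float = 5.0) -> List[float]:
    """Perform the HTTP call; requires the FastAPI service to be running."""
    with urllib.request.urlopen(build_embed_request(text, url), timeout=timeout) as resp:
        return parse_embed_response(resp.read())
```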
This repo uses a Redux pattern implemented with RxJS (not NgRx package).
Where:
- SearchState is the single app-search state shape
- SearchAction describes state transitions
- searchReducer(state, action) is pure transition logic
- SearchStore holds state in a BehaviorSubject and exposes state$
State transitions:
- SEARCH_REQUEST -> set loading, clear old results, persist current query
- SEARCH_SUCCESS -> store products, stop loading
- SEARCH_FAILURE -> store error, stop loading
This makes UI state deterministic and easy to reason about.
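The real store is TypeScript on RxJS. To keep every example in one language, here is a Python sketch of the same pure-reducer idea. The names mirror SearchState and searchReducer, but the field names (query, loading, products, error) and the decision to also clear error on SEARCH_REQUEST are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SearchState:
    query: str = ""
    loading: bool = False
    products: Tuple[dict, ...] = ()
    error: Optional[str] = None


@dataclass(frozen=True)
class SearchRequest:
    query: str


@dataclass(frozen=True)
class SearchSuccess:
    products: Tuple[dict, ...]


@dataclass(frozen=True)
class SearchFailure:
    error: str


SearchAction = Union[SearchRequest, SearchSuccess, SearchFailure]


def search_reducer(state: SearchState, action: SearchAction) -> SearchState:
    """Pure transition: returns a new state and never mutates the old one."""
    if isinstance(action, SearchRequest):
        # Set loading, clear old results, persist the query (clearing error is assumed).
        return replace(state, query=action.query, loading=True, products=(), error=None)
    if isinstance(action, SearchSuccess):
        return replace(state, loading=False, products=action.products)
    if isinstance(action, SearchFailure):
        return replace(state, loading=False, error=action.error)
    return state
```

Because each state is immutable, a BehaviorSubject-style holder only needs to emit the reducer's return value, and subscribers can compare states by identity.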
A text embedding maps a string to a dense vector in R^d.
Here:
- model: all-MiniLM-L6-v2
- dimension: 384
- normalization: v_norm = v / ||v||_2
Normalization makes every vector unit length (||v_norm||_2 = 1), so cosine similarity reduces to a plain dot product.
Given vectors a and b:
cos_sim(a, b) = (a . b) / (||a|| * ||b||)
With normalized vectors, cos_sim(a, b) = a . b, so ranking needs only one dot product per candidate and no per-query norm computation.
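A worked example with a = (3, 4) and b = (4, 3): cos_sim(a, b) = 24 / (5 * 5) = 0.96. Normalizing first gives (0.6, 0.8) and (0.8, 0.6), whose plain dot product is also 0.96. The same check in standard-library Python (a sketch; the helper names are illustrative):

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """v_norm = v / ||v||_2; the zero vector has no direction, so it is rejected."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """cos_sim(a, b) = (a . b) / (||a|| * ||b||)"""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```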
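The pgvector ivfflat index partitions vectors into lists around k-means centroids at build time; at query time it scans only the lists whose centroids are closest to the query (ivfflat.probes lists, 1 by default).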
Tradeoff:
- Much faster retrieval at scale
- Approximate results: a true nearest neighbor in an unscanned list can be missed, unlike an exact brute-force scan
For product search, this is usually the right latency/quality tradeoff.
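To make the approximation concrete, here is a toy inverted-file (IVF) search in pure Python. It is not pgvector's implementation: the centroids are given rather than trained with k-means, similarity is the dot product (valid for normalized vectors), and build_lists and ivf_search are hypothetical names. The probes argument plays the role of the ivfflat.probes setting.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_lists(vectors: Dict[int, Vector], centroids: List[Vector]) -> List[List[int]]:
    """Assign each vector id to the list of its most similar centroid."""
    lists: List[List[int]] = [[] for _ in centroids]
    for vid, v in vectors.items():
        best = max(range(len(centroids)), key=lambda c: _dot(v, centroids[c]))
        lists[best].append(vid)
    return lists


def ivf_search(query: Vector, vectors: Dict[int, Vector], centroids: List[Vector],
               lists: List[List[int]], top_n: int, probes: int = 1) -> List[Tuple[int, float]]:
    """Exactly score only the vectors in the `probes` lists nearest to the query."""
    ranked = sorted(range(len(centroids)), key=lambda c: _dot(query, centroids[c]), reverse=True)
    candidates = [vid for c in ranked[:probes] for vid in lists[c]]
    scored = sorted(((_dot(query, vectors[vid]), vid) for vid in candidates), reverse=True)
    return [(vid, score) for score, vid in scored[:top_n]]
```

With centroids (1, 0) and (0, 1), vectors {1: (1, 0), 2: (0.8, 0.6), 3: (0.6, 0.8), 4: (0, 1)} and query (0.8, 0.6), probes=1 returns ids [2, 1] and misses vector 3 (similarity 0.96) because it sits in the other list; probes=2 returns the exact answer [2, 3]. Scanning every list always makes the search exact, and in Postgres the equivalent knob is SET ivfflat.probes = N.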
Prerequisites:
- Python 3.10+
- Java 17
- Node 20+ (Angular 21 requires modern Node)
- Docker
Start Postgres + pgvector from the repo root:
docker compose up -d
Database defaults are in:
- docker-compose.yml
- backend-spring/multimodal/src/main/resources/application.properties
Preprocess the catalog and load embeddings from embedding-HF:
python preprocess.py
python embeddings.py
This prepares the text and loads the embeddings into Postgres.
Start the embedding service from embedding-HF:
uvicorn app.main:app --reload --port 8000
Start the GraphQL API from backend-spring/multimodal:
./mvnw spring-boot:run
It runs on http://localhost:8080.
Start the frontend from multimodal-frontend:
npm install
npm start
It runs on http://localhost:4200 with proxy routes:
- /api -> Spring
- /image -> FastAPI
Proxy config: multimodal-frontend/proxy.conf.json
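Per the request flow, the dev proxy amounts to two prefix rules: /api is stripped before forwarding (/api/graphql becomes /graphql on Spring), while /image is forwarded unchanged. A Python model of that behavior, where resolve_dev_proxy and the rule table are hypothetical and the actual rules live in proxy.conf.json:

```python
from typing import Optional

# (path prefix, target origin, strip the prefix before forwarding?)
PROXY_ROUTES = (
    ("/api", "http://localhost:8080", True),
    ("/image", "http://localhost:8000", False),
)


def resolve_dev_proxy(path: str) -> Optional[str]:
    """Map a browser path to the backend URL it is forwarded to, or None if not proxied."""
    for prefix, target, strip in PROXY_ROUTES:
        # Match the prefix as a whole path segment, so "/images" is not "/image".
        if path == prefix or path.startswith(prefix + "/"):
            forwarded = path[len(prefix):] if strip else path
            return target + (forwarded or "/")
    return None  # served by the Angular dev server itself
```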
Example GraphQL query:
query SearchProducts($query: String!, $topN: Int!) {
searchProducts(query: $query, topN: $topN) {
id
productDisplayName
similarity
masterCategory
subCategory
baseColour
}
}
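To exercise this query without the Angular UI, a client can POST it as a standard GraphQL-over-HTTP JSON body with query, operationName, and variables. A sketch, assuming the Spring endpoint at http://localhost:8080/graphql (the target of the /api/graphql dev proxy); build_search_body and search_products are hypothetical helpers.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8080/graphql"  # Spring endpoint behind /api/graphql

SEARCH_QUERY = """
query SearchProducts($query: String!, $topN: Int!) {
  searchProducts(query: $query, topN: $topN) {
    id productDisplayName similarity masterCategory subCategory baseColour
  }
}
"""


def build_search_body(query: str, top_n: int) -> bytes:
    """JSON body carrying the query document plus its variables."""
    return json.dumps({
        "query": SEARCH_QUERY,
        "operationName": "SearchProducts",
        "variables": {"query": query, "topN": top_n},
    }).encode("utf-8")


def search_products(query: str, top_n: int, url: str = GRAPHQL_URL) -> list:
    """POST the query; requires the Spring API (and FastAPI behind it) to be running."""
    req = urllib.request.Request(url, data=build_search_body(query, top_n),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.loads(resp.read())
    # GraphQL reports failures in an "errors" array, often alongside HTTP 200.
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["searchProducts"]
```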
FastAPI endpoints:
- POST /embed/text
  - Body: { "text": "red dress" }
  - Response: { "embedding": [ ... ] }
- GET /image/{item_id}
  - Returns image bytes if the file exists
  - Returns 404 if not found
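Because ranking relies on unit-length vectors, a client or integration test can check the /embed/text contract directly. A sketch, assuming the 384-dimensional all-MiniLM-L6-v2 output; validate_embedding and its 1e-3 norm tolerance are illustrative choices.

```python
import math
from typing import Any, List

EXPECTED_DIM = 384  # all-MiniLM-L6-v2 output size


def validate_embedding(payload: Any, dim: int = EXPECTED_DIM, tol: float = 1e-3) -> List[float]:
    """Check a /embed/text response: right shape, finite numbers, unit L2 norm."""
    if not isinstance(payload, dict) or "embedding" not in payload:
        raise ValueError('expected an object with an "embedding" field')
    vec = payload["embedding"]
    if not isinstance(vec, list) or len(vec) != dim:
        raise ValueError(f"expected {dim} values")
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec):
        raise ValueError("embedding contains non-numeric or non-finite values")
    norm = math.sqrt(sum(x * x for x in vec))
    if abs(norm - 1.0) > tol:
        raise ValueError(f"embedding is not normalized (||v|| = {norm:.4f})")
    return [float(x) for x in vec]
```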
Current limitations:
- Retrieval currently uses text embeddings only.
- Images are used only for result display; they could later be extended to true image-embedding retrieval.
- Some scripts and config are still tuned for local development defaults (localhost, fixed credentials).
Possible next steps:
- Add environment-based configuration for ports, DB credentials, and service URLs.
- Add integration tests that cover the full query -> embedding -> DB -> UI flow.
- Add observability (structured logs + request tracing) across all three services.
- Add optional hybrid ranking (semantic score + keyword/business metadata).