BiteReceipt Chat — food receipts turned into instant insights. Chat about what you ate, when, and how much. 🧾✨
A quick demo of BiteReceipt Chat in action.
The BiteReceipt-Chat system is designed as a containerized, multi-component application orchestrated with Docker Compose. Below is a diagram illustrating the high-level architecture and the interaction between the services.
```mermaid
graph TD
    subgraph "User"
        U[User]
    end
    subgraph "Docker Environment"
        direction LR
        subgraph "BiteReceipt-Chat API"
            F[FastAPI Backend]
        end
        subgraph "Databases"
            P[PostgreSQL]
            Q[Qdrant Vector DB]
        end
        subgraph "AI Models"
            O[Ollama Service]
        end
    end
    U -- "HTTP Requests (Upload & Chat)" --> F
    F -- "Stores & Retrieves structured data" --> P
    F -- "Stores & Retrieves embeddings" --> Q
    F -- "Receipt parsing & Chat completion (RAG)" --> O
    style U fill:#42a5f5,stroke:#333,stroke-width:2px
    style F fill:#66bb6a,stroke:#333,stroke-width:2px
    style P fill:#26a69a,stroke:#333,stroke-width:2px
    style Q fill:#26a69a,stroke:#333,stroke-width:2px
    style O fill:#ffa726,stroke:#333,stroke-width:2px
```
- User: Interacts with the system through a simple web UI.
- FastAPI Backend: The core application logic. It handles HTTP requests, orchestrates the AI and data services, and serves the frontend.
- PostgreSQL: A relational database used to store the structured data extracted from receipts (e.g., items, prices, date).
- Qdrant: A vector database that stores embeddings of the receipt items, enabling efficient similarity searches for the RAG pipeline.
- Ollama Service: Provides the AI models for both computer vision (receipt parsing) and language understanding (chat completion).
The entire system is defined in docker-compose.yml and can be run locally with a single command, ensuring a consistent and reproducible environment.
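As a sketch of how the backend can locate these services inside the Compose network, the snippet below assumes the service names `postgres`, `qdrant`, and `ollama` and hypothetical environment variable names; the actual configuration in `api/app.py` may differ.

```python
# Hypothetical service configuration for the FastAPI backend.
# Environment variable names, defaults, and service hostnames are assumptions
# for illustration; they are not necessarily what api/app.py uses.
import os
from dataclasses import dataclass


@dataclass
class Settings:
    postgres_dsn: str = os.getenv(
        "POSTGRES_DSN", "postgresql://bite:bite@postgres:5432/bitereceipt"
    )
    qdrant_url: str = os.getenv("QDRANT_URL", "http://qdrant:6333")
    ollama_url: str = os.getenv("OLLAMA_URL", "http://ollama:11434")


settings = Settings()
print(settings)  # resolved endpoints inside the Compose network
```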
This project is an answer to the Technical AI Engineer Test.
From docs/AI_KNOWLEDGE.md
- A REST API is the classic way for web services to talk: the client sends an HTTP request (GET, POST, etc.) and the server replies. It is mostly stateless.
- MCP (Model Context Protocol) is newer (introduced by Anthropic in Nov. 2024) and designed specifically for AI agents. It standardizes how LLMs connect to external tools and data, keeps context/session state, and allows discovery of tools and resources. (Anthropic)
- With a REST API, you write custom code for each tool or data source and maintain endpoints, docs, etc. With MCP, many existing APIs can be wrapped so agents can use them without writing new integration logic every time. (arXiv)
- REST has the simpler setup, but MCP gives more flexibility (multi-step tasks, context carrying, dynamic discovery) at the cost of more protocol and infrastructure complexity. (Eleks)
- REST lets agents fetch external data (weather, databases, user info, etc.), perform actions, and keep responses grounded.
- MCP lets the agent see which tools are available, use them dynamically, and maintain conversation history/context so the agent doesn't repeat itself or forget things.
- This helps in workflows where agents run multi-tool chains (e.g. search → summarize → send email) or interact with many data sources without pre-coding every integration.
- AutoMCP (a tool) shows that many REST APIs (given an OpenAPI spec) can be transformed automatically into MCP servers, reducing developer work. (arXiv)
- Best practices for trustworthy agents: use reliable, recent data sources rather than relying only on model memory.
- Define tools/APIs clearly (input/output schemas), and require the agent to check its work or show its reasoning.
- Test with known test cases, including edge and failure cases; use human evaluation for hard or subjective tasks.
- Monitor logs and user feedback, and fix errors. If uncertain, allow the agent to say “I don’t know.”
- Include safety/security: authenticate tools, restrict dangerous operations, and audit MCP servers. (arXiv)
- Containerization packages the model, its dependencies, and tools together so the same environment works everywhere (local, dev, prod).
- It isolates components (model serving, tool APIs, MCP servers) so they don’t conflict.
- Scaling: run multiple container instances under an orchestrator (Kubernetes, etc.).
- Versioning and experiments: run different versions of a model/container side by side.
- Security: limit access, isolate privileges, and make updates/patches easier. (USAII, GitHub)
- Decide your goal: To fine-tune an LLM, first define the domain, style, or task, and pick metrics to measure success. (arXiv)
- Collect data: Gather domain-specific texts or input/output pairs. Clean the data by removing noise and sensitive information. (Medium, arXiv)
- Preprocess: Tokenize text using the base model's tokenizer, manage long inputs, and handle special tokens. (arXiv)
- Choose base model & infra: Select a suitable base model and set up the necessary compute infrastructure (like GPUs). (Heavybit)
- Fine-tune: Apply a fine-tuning method. This could be supervised fine-tuning, instruction tuning, RLHF, or a parameter-efficient method such as LoRA to save on compute (a minimal sketch follows this list).
- Tune & validate: Adjust hyperparameters and validate the model's performance on a separate dataset to avoid overfitting. (arXiv)
- Evaluate: Test the final model on held-out data and use human evaluation for subjective qualities like style and correctness. (Medium)
- Deploy & monitor: Deploy the model carefully, monitor for performance drift and errors, and plan for future updates. (arXiv)
- Check ethics: Continuously consider and address ethical and privacy implications.
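For the parameter-efficient option mentioned above, here is a minimal LoRA sketch using Hugging Face `transformers` and `peft`; the base model name, target modules, and hyperparameters are placeholder assumptions, not project settings.

```python
# Minimal LoRA fine-tuning sketch (illustrative only).
# Base model, target modules, and hyperparameters are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "Qwen/Qwen2.5-0.5B"  # assumed small base model for illustration
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Wrap the base model with low-rank adapters instead of updating all weights.
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the small fraction of trainable weights

# From here, train with the usual Trainer / SFT loop on the domain-specific
# dataset prepared in the earlier steps, then validate and evaluate as above.
```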
- "Introducing the Model Context Protocol," Anthropic, Nov. 2024. (Anthropic)
- "Model Context Protocol," Wikipedia. (Wikipedia)
- "Model Context Protocol (MCP) vs. APIs: The New Standard for AI Integration," Tahir Balarabe, Medium, May 2025. (Medium)
- "What is MCP and AI agents? How does it compare to REST API's?," Tallyfy, Apr. 2025. (Tallyfy)
- "Model Context Protocol vs. REST API," Eleks. (Eleks)
- "Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs," arXiv, Jul. 2025. (arXiv)
- "Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers," arXiv, Jun. 2025. (arXiv)
- "The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities (Version 1.0)," arXiv, Aug. 2024. (arXiv)
- "The Comprehensive Guide to Fine-tuning LLM," Sunil Rao, Medium, Jun. 2025. (Medium)
- "LLM Fine-Tuning: A Guide for Engineering Teams in 2025," Heavybit, Jun. 2025. (Heavybit)
- "Deploy Machine Learning Models with Docker: A Practical Perspective," U.S. Artificial Intelligence Institute. (USAII)
- "Docker-Guide-for-AI-Model-Development-and-Deployment," GitHub. (GitHub)
This project provides a complete solution to the coding test.
The analysis of the customers-100000.csv file is provided in the Jupyter notebook located at docs/customers-notebook.ipynb. The notebook uses the pandas library to load the CSV file into a DataFrame and then performs a detailed analysis (a condensed pandas sketch follows the list below), including:
- Data shape and basic statistics.
- Data types and missing values.
- Unique value counts.
- Top 5 most frequent countries and cities.
- Visualizations of the data.
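As a condensed illustration of that exploration (column names such as `Country` and `City` are assumed from the description above; the notebook remains the source of truth):

```python
# Condensed pandas exploration of the customer CSV (illustrative;
# column names such as "Country" and "City" are assumptions).
import pandas as pd

df = pd.read_csv("customers-100000.csv")

print(df.shape)        # rows x columns
print(df.describe())   # basic statistics for numeric columns
print(df.dtypes)       # data types per column
print(df.isna().sum()) # missing values per column
print(df.nunique())    # unique value counts per column

# Top 5 most frequent countries and cities
print(df["Country"].value_counts().head(5))
print(df["City"].value_counts().head(5))
```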
The docs/customers-notebook.ipynb also demonstrates how to parse the large customers-2000000.csv file with low memory usage. The approach is to process the file in chunks using the chunksize parameter in pandas.read_csv(). This allows us to process the file piece by piece without loading the entire file into memory at once.
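A minimal sketch of that chunked approach, assuming a chunk size of 100,000 rows and an illustrative `Country` aggregation:

```python
# Stream the large CSV in fixed-size chunks to keep memory usage low.
# The chunk size and the "Country" aggregation are illustrative choices.
import pandas as pd
from collections import Counter

country_counts = Counter()
total_rows = 0

for chunk in pd.read_csv("customers-2000000.csv", chunksize=100_000):
    total_rows += len(chunk)
    country_counts.update(chunk["Country"].value_counts().to_dict())

print(f"Processed {total_rows} rows")
print(country_counts.most_common(5))
```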
The notebook explains the difference in detail. In summary:
- Small Files: For small files that can fit into memory, we can load the entire file at once into a pandas DataFrame. This is simple and allows for immediate analysis of the entire dataset.
- Large Files: For large files that are larger than the available RAM, we must use a chunking strategy. We read and process the file in smaller, manageable chunks to keep memory usage low.
4. Deploy the vector DB on your own, and implement the vector cosine similarity without using a high level library.
- Vector DB Deployment: The project uses Qdrant as the vector database. It is deployed as a service in the `docker-compose.yml` file. The `api` service connects to the Qdrant container to store and search for vector embeddings.
- Cosine Similarity Implementation: The cosine similarity is implemented from scratch in `api/app.py` within the `cosine` function. It uses `numpy` for the underlying calculations but does not use a high-level library like `scikit-learn` for the cosine similarity calculation itself. The implementation is tested in `api/tests/test_cosine.py`.
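As a rough illustration of the approach (not the project's exact code), a from-scratch cosine similarity in `numpy` can look like this; the zero-vector handling is an assumed convention:

```python
# From-scratch cosine similarity with numpy (illustrative; the actual
# cosine() in api/app.py may differ in details such as zero handling).
import numpy as np


def cosine(a, b) -> float:
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 0.0  # assumed convention for zero vectors
    return float(np.dot(a, b) / norm)


assert abs(cosine([1, 0], [1, 0]) - 1.0) < 1e-9  # identical direction
assert abs(cosine([1, 0], [0, 1])) < 1e-9        # orthogonal vectors
```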
The project includes a complete platform for receipt analysis:
- UI: A simple UI is provided in `api/static/index.html`. It allows users to upload a receipt image and chat with the AI assistant. The UI is served by the FastAPI backend.
- Backend: The FastAPI backend in `api/app.py` handles the following:
  - a. Upload food online receipt: The `/ingest` endpoint accepts image uploads.
  - b. Extract with computer vision: It uses the `qwen2.5vl:3b` model via Ollama to extract structured data from the receipt.
  - c. Get the insight on the receipt, and store it to DB: The extracted information is stored in a PostgreSQL database (schema defined in `migrations/001_init.sql`) and the embeddings are stored in Qdrant.
  - d. Design and implement the AI tools: The `/chat` endpoint implements a RAG pipeline. It takes the user's question, finds relevant items in the vector database, and then uses an LLM to generate an answer based on the retrieved context. This allows the LLM to answer questions like "What food did I buy yesterday?".
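To make the `/chat` flow concrete, here is a stripped-down sketch of such a RAG endpoint; the helper functions and the example items are hypothetical stand-ins, not the actual logic in `api/app.py`:

```python
# Stripped-down sketch of a RAG chat endpoint. The helper functions below are
# hypothetical stand-ins for the real Ollama and Qdrant calls in api/app.py.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    question: str
    receipt_id: str


def embed_text(text: str) -> list[float]:
    # Placeholder: the real code would request an embedding from Ollama.
    return [float(len(text))]


def search_items(vector: list[float], receipt_id: str, limit: int = 5) -> list[str]:
    # Placeholder: the real code would query Qdrant, filtered by receipt_id.
    return ["2x Margherita Pizza 12.00", "1x Sparkling Water 2.50"]


def generate_answer(question: str, context: str) -> str:
    # Placeholder: the real code would prompt the chat model via Ollama.
    return f"Based on the receipt items:\n{context}\n(answer to: {question})"


@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    query_vector = embed_text(req.question)                      # 1. embed the question
    items = search_items(query_vector, req.receipt_id, limit=5)  # 2. retrieve similar items
    answer = generate_answer(req.question, "\n".join(items))     # 3. answer from the context
    return {"answer": answer}
```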
The application is fully containerized.
- Dockerfile: The `api/Dockerfile` defines the container image for the FastAPI backend.
- Docker Compose: The `docker-compose.yml` file orchestrates all the services, including the API, PostgreSQL, Qdrant, and Ollama. To run the application locally, you can use the `scripts/download.sh` script to set up the environment and then run `docker-compose up -d`.
A CI/CD pipeline is defined in .github/workflows/build.yml. This GitHub Actions workflow automatically triggers on a push to the main branch. It builds the Docker image for the API and pushes it to the GitHub Container Registry.
This project can be run in an offline environment after an initial online setup to download the necessary Docker images and AI models.
While you have an internet connection, run the download.sh script. This will pull all the required Docker images and download the Ollama models into a persistent Docker volume.
```bash
./scripts/download.sh
```

Once the online setup is complete, you can disconnect from the network.
To start the application, run:
```bash
docker compose up -d
```

This will start all the necessary services (API, PostgreSQL, Qdrant, and Ollama) in detached mode.
You can interact with the application through the web UI or the API.
Open your web browser and navigate to http://localhost:8000. You can upload receipts and chat with the AI assistant through the UI.
You can also interact with the API directly using curl or any other API client.
To upload a receipt, send a POST request to the /ingest endpoint with the image file:
curl -X POST -F "file=@/path/to/your/receipt.jpg" http://localhost:8000/ingestReplace /path/to/your/receipt.jpg with the actual path to your receipt image.
To chat about a receipt, send a POST request to the /chat endpoint. You'll need the receipt_id that was returned when you ingested the receipt.
curl -X POST -H "Content-Type: application/json" \
-d '{"question": "What was the total amount?", "receipt_id": "your_receipt_id"}' \
http://localhost:8000/chatReplace your_receipt_id with the actual ID of the receipt you want to ask about.
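For scripted use, the same two calls can be made from Python with `requests`; this mirrors the curl examples above, and the `receipt_id` response field is an assumption inferred from the chat payload:

```python
# Python equivalent of the curl examples above (the "receipt_id" response
# field is inferred from the chat payload; check the actual /ingest response).
import requests

BASE_URL = "http://localhost:8000"

# Upload a receipt image.
with open("/path/to/your/receipt.jpg", "rb") as f:
    resp = requests.post(f"{BASE_URL}/ingest", files={"file": f})
resp.raise_for_status()
receipt_id = resp.json().get("receipt_id")  # assumed response field

# Ask a question about that receipt.
resp = requests.post(
    f"{BASE_URL}/chat",
    json={"question": "What was the total amount?", "receipt_id": receipt_id},
)
resp.raise_for_status()
print(resp.json())
```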
