BiteReceipt-Chat

BiteReceipt Chat — food receipts turned into instant insights. Chat about what you ate, when, and how much. 🧾✨

BiteReceipt Chat Demo

A quick demo of BiteReceipt Chat in action.

System Architecture

The BiteReceipt-Chat system is designed as a containerized, multi-component application orchestrated with Docker Compose. Below is a diagram illustrating the high-level architecture and the interaction between the services.

graph TD
    subgraph "User"
        U[User]
    end

    subgraph "Docker Environment"
        direction LR
        subgraph "BiteReceipt-Chat API"
            F[FastAPI Backend]
        end
        subgraph "Databases"
            P[PostgreSQL]
            Q[Qdrant Vector DB]
        end
        subgraph "AI Models"
            O[Ollama Service]
        end
    end

    U -- "HTTP Requests (Upload & Chat)" --> F
    F -- "Stores & Retrieves structured data" --> P
    F -- "Stores & Retrieves embeddings" --> Q
    F -- "Receipt parsing & Chat completion (RAG)" --> O

    style U fill:#42a5f5,stroke:#333,stroke-width:2px
    style F fill:#66bb6a,stroke:#333,stroke-width:2px
    style P fill:#26a69a,stroke:#333,stroke-width:2px
    style Q fill:#26a69a,stroke:#333,stroke-width:2px
    style O fill:#ffa726,stroke:#333,stroke-width:2px

Components

  • User: Interacts with the system through a simple web UI.
  • FastAPI Backend: The core application logic. It handles HTTP requests, orchestrates the AI and data services, and serves the frontend.
  • PostgreSQL: A relational database used to store the structured data extracted from receipts (e.g., items, prices, dates).
  • Qdrant: A vector database that stores embeddings of the receipt items, enabling efficient similarity searches for the RAG pipeline.
  • Ollama Service: Provides the AI models for both computer vision (receipt parsing) and language understanding (chat completion).

The entire system is defined in docker-compose.yml and can be run locally with a single command, ensuring a consistent and reproducible environment.

AI Engineer Test

This project is an answer to the Technical AI Engineer Test.

Engineering Knowledge AI Agent Test

From docs/AI_KNOWLEDGE.md

1. Describe the differences between REST APIs and MCP in the context of AI.

  • A REST API is the classic way for web services to communicate: the client sends an HTTP request (GET, POST, etc.) and the server replies. It is mostly stateless.
  • MCP (Model Context Protocol) is newer (introduced by Anthropic in Nov 2024) and designed specifically for AI agents. It standardizes how LLMs connect to external tools and data, maintains context/session state, and allows discovery of tools and resources. (Anthropic)
  • With a REST API, you write custom code and maintain endpoints, documentation, etc. for each tool or data source. With MCP, many existing APIs can be wrapped so agents can use them without writing new integration logic every time. (arXiv)
  • REST has a simpler setup, but MCP offers more flexibility (multi-step tasks, context carrying, dynamic discovery) at the cost of more protocol and infrastructure complexity. (Eleks)

2. How REST APIs and MCP can improve AI use cases.

  • REST allows agents to fetch external data (weather, databases, user info, etc.), perform actions, and keep responses grounded.
  • MCP lets an agent see which tools are available, use them dynamically, and maintain conversation history/context so the agent doesn’t repeat itself or forget.
  • This helps in workflows where agents run multi-tool chains (e.g. search → summarize → send email) or interact with many data sources without pre-coding every integration.
  • AutoMCP (a tool) shows that many REST APIs (given an OpenAPI spec) can be transformed automatically into MCP servers, reducing developer work. (arXiv)

3. How do you ensure that your AI agent answers correctly?

  • Use reliable, recent data sources rather than only model memory.
  • Define tools/APIs clearly (input/output schemas), and require the agent to check its work or show its reasoning.
  • Test with known test cases, including edge and failure cases; use human evaluation for hard or subjective tasks.
  • Monitor logs and user feedback, and fix errors. If uncertain, allow the agent to say “I don’t know.”
  • Include safety/security: authenticate tools, restrict dangerous operations, audit MCP servers. (arXiv)

4. Describe what you can do with Docker / containerized environments in the context of AI.

  • Package the model, dependencies, and tools together so the same environment works everywhere (local, dev, prod).
  • Isolate components (model serving, tool APIs, MCP servers) so they don’t conflict.
  • Scale: run multiple container instances, with orchestration (Kubernetes, etc.).
  • Versioning and experiments: run different versions of a model/container side by side.
  • Security: limit access, isolate privileges, and apply updates/patches more easily. (USAII, GitHub)

5. How do you fine-tune an LLM from raw?

  • Decide your goal: Define the domain, style, or task, and pick metrics to measure success. (arXiv)
  • Collect data: Gather domain-specific texts or input/output pairs. Clean the data by removing noise and sensitive information. (Medium, arXiv)
  • Preprocess: Tokenize text using the base model's tokenizer, manage long inputs, and handle special tokens. (arXiv)
  • Choose base model & infra: Select a suitable base model and set up the necessary compute infrastructure (like GPUs). (Heavybit)
  • Fine-tune: Apply a fine-tuning method. This could be supervised fine-tuning, instruction tuning, RLHF, or a parameter-efficient method like LoRA to save on compute (a minimal sketch follows this list).
  • Tune & validate: Adjust hyperparameters and validate the model's performance on a separate dataset to avoid overfitting. (arXiv)
  • Evaluate: Test the final model on held-out data and use human evaluation for subjective qualities like style and correctness. (Medium)
  • Deploy & monitor: Deploy the model carefully, monitor for performance drift and errors, and plan for future updates. (arXiv)
  • Check ethics: Continuously consider and address ethical and privacy implications.
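
For the parameter-efficient route mentioned above, a minimal LoRA sketch with the Hugging Face transformers and peft libraries could look like the following. This is only an illustration of the mechanics, not part of this project; the base model (gpt2), target modules, and hyperparameters are assumptions that would change with the real model and task.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative small base model; a real fine-tune would use a model suited to the domain.
base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection; differs per architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

Training then proceeds with a tokenized dataset and a standard training loop (or the transformers Trainer); because only the adapter weights are updated, compute and memory costs stay low compared to full fine-tuning.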

References

  1. "Introducing the Model Context Protocol," Anthropic, Nov. 2024. (Anthropic)
  2. "Model Context Protocol," Wikipedia. (Wikipedia)
  3. "Model Context Protocol (MCP) vs. APIs: The New Standard for AI Integration," Tahir Balarabe, Medium, May 2025. (Medium)
  4. "What is MCP and AI agents? How does it compare to REST API's?" Tallyfy, Apr. 2025. (Tallyfy)
  5. "Model Context Protocol vs. REST API," Eleks. (Eleks)
  6. "Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs," arXiv, Jul. 2025. (arXiv)
  7. "Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers," arXiv, Jun. 2025. (arXiv)
  8. "The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities (Version 1.0)," arXiv, Aug. 2024. (arXiv)
  9. "The Comprehensive Guide to Fine-tuning LLM," Sunil Rao, Medium, Jun. 2025. (Medium)
  10. "LLM Fine-Tuning: A Guide for Engineering Teams in 2025," Heavybit, Jun. 2025. (Heavybit)
  11. "Deploy Machine Learning Models with Docker: A Practical Perspective," U.S. Artificial Intelligence Institute. (USAII)
  12. "Docker-Guide-for-AI-Model-Development-and-Deployment," GitHub. (GitHub)

Coding Test

This project provides a complete solution to the coding test.

1. Please parse this CSV and get insights from the data (customers-100000.csv)

The analysis of the customers-100000.csv file is provided in the Jupyter notebook located at docs/customers-notebook.ipynb. The notebook uses the pandas library to load the CSV file into a DataFrame and then performs a detailed analysis, including:

  • Data shape and basic statistics.
  • Data types and missing values.
  • Unique value counts.
  • Top 5 most frequent countries and cities.
  • Visualizations of the data.
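
As a rough illustration of this kind of exploration, a few pandas calls go a long way (the Country column name is an assumption about the CSV's schema; the notebook holds the actual analysis):

import pandas as pd

df = pd.read_csv("customers-100000.csv")     # small enough to load in one go
print(df.shape)                              # rows x columns
print(df.dtypes)                             # column data types
print(df.isna().sum())                       # missing values per column
print(df.nunique())                          # unique value counts
print(df["Country"].value_counts().head(5))  # top 5 most frequent countries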

2. Please parse a large CSV (customers-2000000.csv) and keep memory usage low

The docs/customers-notebook.ipynb also demonstrates how to parse the large customers-2000000.csv file with low memory usage. The approach is to process the file in chunks using the chunksize parameter in pandas.read_csv(). This allows us to process the file piece by piece without loading the entire file into memory at once.
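
A minimal sketch of the chunked approach (the chunk size and the Country column are illustrative assumptions; the notebook contains the actual code):

import pandas as pd
from collections import Counter

total_rows = 0
country_counts = Counter()

# Only one 100,000-row chunk is held in memory at a time.
for chunk in pd.read_csv("customers-2000000.csv", chunksize=100_000):
    total_rows += len(chunk)
    country_counts.update(chunk["Country"].value_counts().to_dict())

print(total_rows)
print(country_counts.most_common(5))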

3. Explain how the approach differs between small and large files

The notebook explains the difference in detail. In summary:

  • Small Files: For small files that can fit into memory, we can load the entire file at once into a pandas DataFrame. This is simple and allows for immediate analysis of the entire dataset.
  • Large Files: For large files that are larger than the available RAM, we must use a chunking strategy. We read and process the file in smaller, manageable chunks to keep memory usage low.

4. Deploy the vector DB on your own, and implement vector cosine similarity without using a high-level library

  • Vector DB Deployment: The project uses Qdrant as the vector database. It is deployed as a service in the docker-compose.yml file. The api service connects to the Qdrant container to store and search for vector embeddings.
  • Cosine Similarity Implementation: Cosine similarity is implemented from scratch in the cosine function in api/app.py. It uses numpy for the underlying array operations, but not a high-level library like scikit-learn for the similarity calculation itself. The implementation is tested in api/tests/test_cosine.py; a minimal sketch of the same idea is shown after this list.
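
A minimal sketch of the idea behind that function (the actual implementation and its edge-case handling live in api/app.py; this is not a copy of that code):

import numpy as np

def cosine(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention for zero vectors; the real code may handle this differently
    return float(np.dot(a, b) / denom)

print(cosine([1, 0, 1], [1, 1, 0]))  # 0.5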

5. Create a platform with UI

The project includes a complete platform for receipt analysis:

  • UI: A simple UI is provided in api/static/index.html. It allows users to upload a receipt image and chat with the AI assistant. The UI is served by the FastAPI backend.
  • Backend: The FastAPI backend in api/app.py handles the following:
    • a. Upload food online receipt: The /ingest endpoint accepts image uploads.
    • b. Extract with computer vision: It uses the qwen2.5vl:3b model via Ollama to extract structured data from the receipt.
    • c. Get the insight on the receipt, and store it to DB: The extracted information is stored in a PostgreSQL database (schema defined in migrations/001_init.sql) and the embeddings are stored in Qdrant.
    • d. Design and implement the AI tools: The /chat endpoint implements a RAG pipeline. It takes a user's question, finds relevant items in the vector database, and then uses an LLM to generate an answer based on the retrieved context. This lets the LLM answer questions like "What food did I buy yesterday?". A minimal sketch of this flow is shown below.
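
The following is a minimal, self-contained sketch of this retrieve-then-generate flow. The retrieval stub, prompt wording, and model name are illustrative assumptions; the real pipeline embeds the question, searches Qdrant, and calls the model configured in this repository via Ollama.

import requests

def retrieve_items(question: str, receipt_id: str) -> list[str]:
    # Hypothetical stub: the real app embeds the question and runs a similarity
    # search in Qdrant, filtered by receipt_id, to find the relevant line items.
    return ["2x Cappuccino 7.00", "1x Croissant 3.50"]

def answer(question: str, receipt_id: str) -> str:
    context = "\n".join(retrieve_items(question, receipt_id))
    prompt = (
        "Answer the question using only the receipt items below.\n"
        f"Items:\n{context}\n\nQuestion: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default HTTP API
        json={"model": "llama3.2", "prompt": prompt, "stream": False},  # model name is illustrative
    )
    return resp.json()["response"]

print(answer("What was the total amount?", "your_receipt_id"))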

e. Wrap your application into a container image using Docker, and run it locally.

The application is fully containerized.

  • Dockerfile: The api/Dockerfile defines the container image for the FastAPI backend.
  • Docker Compose: The docker-compose.yml file orchestrates all the services, including the API, PostgreSQL, Qdrant, and Ollama. To run the application locally, you can use the scripts/download.sh script to set up the environment and then run docker-compose up -d.

f. Write CI/CD to wrap your application into the container.

A CI/CD pipeline is defined in .github/workflows/build.yml. This GitHub Actions workflow automatically triggers on a push to the main branch. It builds the Docker image for the API and pushes it to the GitHub Container Registry.

How to Run Offline

This project can be run in an offline environment after an initial online setup to download the necessary Docker images and AI models.

1. Online Setup (Once)

While you have an internet connection, run the download.sh script. This will pull all the required Docker images and download the Ollama models into a persistent Docker volume.

./scripts/download.sh

2. Run Offline

Once the online setup is complete, you can disconnect from the network.

To start the application, run:

docker compose up -d

This will start all the necessary services (API, PostgreSQL, Qdrant, and Ollama) in detached mode.

3. Interact with the Application

You can interact with the application through the web UI or the API.

Web UI

Open your web browser and navigate to http://localhost:8000. You can upload receipts and chat with the AI assistant through the UI.

API

You can also interact with the API directly using curl or any other API client.

Ingest a Receipt

To upload a receipt, send a POST request to the /ingest endpoint with the image file:

curl -X POST -F "file=@/path/to/your/receipt.jpg" http://localhost:8000/ingest

Replace /path/to/your/receipt.jpg with the actual path to your receipt image.

Chat about a Receipt

To chat about a receipt, send a POST request to the /chat endpoint. You'll need the receipt_id that was returned when you ingested the receipt.

curl -X POST -H "Content-Type: application/json" \
-d '{"question": "What was the total amount?", "receipt_id": "your_receipt_id"}' \
http://localhost:8000/chat

Replace your_receipt_id with the actual ID of the receipt you want to ask about.
