This application provides a backend API for a chatbot, featuring document indexing, querying capabilities, and file uploads. It uses FastAPI for the web framework and Qdrant as the vector database for semantic search.

- Features
- Project Structure
- Prerequisites
- Setup Instructions
  - 1. Clone the Repository
  - 2. Create and Activate a Virtual Environment
  - 3. Install Dependencies
  - 4. Configure Environment Variables
  - 5. Set Up Qdrant
  - 6. Set Up Hugging Face Text Embeddings Inference (TEI) Server
  - 7. Set Up Hugging Face Text Generation Inference (TGI) Server
  - 8. Deploying with Docker Compose
- Running the Application (Locally, without Docker Compose for the app)
- Reindexing Data
- Data Formats
- API Endpoints

## Features

- FastAPI Backend: Modern, high-performance web framework for building APIs.
- Qdrant Vector Database: Efficient similarity search for document retrieval.
- Hugging Face Embeddings: Supports various embedding providers, including self-hosted TEI.
- Data Reindexing: CLI command to reindex FAQ and web data.
- File Uploads: Endpoint for uploading and processing files (details to be implemented).

## Project Structure

```
.
├── app/                      # Main application module
│   ├── __init__.py
│   ├── database.py           # Qdrant database interaction, reindexing logic
│   ├── envs.py               # Environment variable settings
│   ├── main.py               # FastAPI application entry point, CLI for reindex
│   ├── models/               # Pydantic models for API requests/responses
│   │   ├── embedding.py
│   │   ├── query.py
│   │   └── upload.py
│   ├── routers/              # API routers
│   │   ├── query.py
│   │   └── upload.py
│   └── utils/                # Utility modules
│       ├── embedders.py      # Embedding component setup
│       ├── llm.py            # LLM interaction (e.g., for paraphrasing)
│       └── pipelines.py      # Haystack query pipelines
├── data/                     # Sample data files (you might need to create this)
│   ├── .gitignore
│   ├── (example: hcmut_data_faq.csv)
│   └── (example: hcmut_data_web.json)
├── .env.example              # Example environment variables
├── .gitignore
├── Dockerfile                # Dockerfile for the main application
├── docker-compose.yml        # Docker Compose configuration
├── README.md                 # This file
└── requirements.txt          # Python dependencies
```

## Prerequisites

- Python 3.8+
- Pip (Python package installer)
- Docker and Docker Compose (for Qdrant, TEI server, and application deployment)
- Access to an OpenAI API key (if using OpenAI models) or Hugging Face API key (if using HF Inference API for non-self-hosted TEI)

## Setup Instructions

### 1. Clone the Repository

```bash
git clone https://github.com/KenKout/hcmut-chatbot
cd hcmut-chatbot
```

### 2. Create and Activate a Virtual Environment

It's highly recommended to use a virtual environment to manage project dependencies.

Linux/macOS:

```bash
python3 -m venv venv
source venv/bin/activate
```

Windows:

```bash
python -m venv venv
.\venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Configure Environment Variables

Copy the example environment file and update it with your specific configurations:

```bash
cp .env.example .env
```

Refer to the comments in `.env.example` for detailed explanations of each variable.

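For orientation, a minimal `.env` might look like the sketch below; the variable names come from the sections that follow, and the values shown are only illustrative defaults — see `.env.example` for the authoritative list and comments.

```bash
# Illustrative sketch of a .env file — not exhaustive.
DEBUG=false                     # true runs Qdrant in-memory (no external Qdrant needed)
APP_HOST=0.0.0.0
APP_PORT=8000
QDRANTDB_URL=http://localhost:6333
EMBEDDING_PROVIDER=huggingface
EMBEDDING_HUGGINGFACE_API_TYPE=text_embeddings_inference
EMBEDDING_HUGGINGFACE_BASE_URL=http://localhost:8080
EMBEDDING_MODEL=bkai-foundation-models/vietnamese-bi-encoder
LLM_OPENAI_BASE_URL=http://localhost:8081/v1
LLM_MODEL_ID=your-model-id
TGI_HTTP_PORT=8081
```
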
### 5. Set Up Qdrant

If `DEBUG=true`, Qdrant runs in-memory and no external setup is needed.

For production (`DEBUG=false`), you need a running Qdrant instance. You can run Qdrant using Docker:

```bash
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant
```

This command mounts a local directory, `qdrant_storage`, to persist Qdrant data. Ensure that `QDRANTDB_URL` in your `.env` file points to this instance (e.g., `http://localhost:6333`).

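To confirm the instance is reachable, you can query Qdrant's REST API; for example, listing collections should return an `ok` status (a quick check assuming the default port mapping above):

```bash
# Sanity check: list collections on the local Qdrant instance.
curl http://localhost:6333/collections
# Expected: JSON along the lines of {"result":{"collections":[]},"status":"ok",...}
```
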
### 6. Set Up Hugging Face Text Embeddings Inference (TEI) Server

If you choose `EMBEDDING_PROVIDER="huggingface"` and `EMBEDDING_HUGGINGFACE_API_TYPE="text_embeddings_inference"`, you need to run a TEI server. This allows you to self-host open-source embedding models.

Example using Docker for `bkai-foundation-models/vietnamese-bi-encoder` (CPU):

```bash
docker run -p 8080:80 \
    --pull always \
    ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
    --model-id bkai-foundation-models/vietnamese-bi-encoder
```

For GPU support (requires NVIDIA drivers and the NVIDIA Container Toolkit):

```bash
docker run -p 8080:80 --gpus all \
    --pull always \
    ghcr.io/huggingface/text-embeddings-inference:latest \
    --model-id bkai-foundation-models/vietnamese-bi-encoder
```

- The TEI server will be available at `http://localhost:8080`.
- Update `EMBEDDING_HUGGINGFACE_BASE_URL="http://localhost:8080"` in your `.env` file.
- The `SentenceTransformersDocumentEmbedder` in Haystack can then use this TEI server by setting its `api_type` to `text_embeddings_inference` and providing the `url` and, if your TEI server is secured, an `api_key` (the default Docker command above does not set one). Our application's `app/utils/embedders.py` handles this configuration based on environment variables.

You can replace `bkai-foundation-models/vietnamese-bi-encoder` with any other Sentence Transformers model compatible with TEI. Check the Text Embeddings Inference documentation for more models and advanced configurations.

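To verify the TEI server is up, you can call its `/embed` endpoint directly; a minimal check (assuming the default port mapping above) looks like this:

```bash
# Sanity check: request an embedding from the local TEI server.
curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Điều kiện tuyển sinh là gì?"}'
# Expected: a JSON array containing one embedding vector, e.g. [[0.018, -0.042, ...]]
```
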
### 7. Set Up Hugging Face Text Generation Inference (TGI) Server

If you want to self-host a Large Language Model (LLM) for tasks like paraphrasing or generation, you can use Hugging Face's Text Generation Inference (TGI). The application can be configured to use a TGI endpoint as an OpenAI-compatible API.

Example using Docker (CPU):

```bash
# Replace your-model-id with the desired Hugging Face model, e.g., gpt2 or a larger model.
# Ensure the model is compatible with TGI.
docker run -p 8081:80 --pull always \
    -v $(pwd)/tgi_cache:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id your-model-id --port 80
```

For GPU support (requires NVIDIA drivers and the NVIDIA Container Toolkit):

```bash
docker run -p 8081:80 --gpus all --pull always \
    -v $(pwd)/tgi_cache:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id your-model-id --port 80
# Add other TGI flags as needed, e.g., --num-shard, --quantize
```

- The TGI server will be available at `http://localhost:8081` (or whichever host port you map).
- Update `LLM_OPENAI_BASE_URL="http://localhost:8081/v1"` in your `.env` file if you are running TGI locally and want the app to use it. The `/v1` path is used for OpenAI compatibility.
- Set `LLM_MODEL_ID` in your `.env` to the model ID you are serving with TGI (this is for reference; the actual model served is determined by the TGI command).
- TGI can serve models in an OpenAI-compatible way. Refer to the TGI documentation for details on compatible models and advanced configurations (quantization, sharding for large models, etc.).

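Recent TGI releases expose an OpenAI-compatible Messages API, so you can sanity-check the server with a chat completion request (a sketch; the `model` field is largely informational, since TGI serves whatever `--model-id` it was launched with):

```bash
# Sanity check: call TGI's OpenAI-compatible chat completions endpoint.
curl http://localhost:8081/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
          "model": "your-model-id",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 64
        }'
```
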
### 8. Deploying with Docker Compose

The easiest way to run the entire stack (application, Qdrant, TEI server, and TGI server) is using Docker Compose.

1. Ensure Docker and Docker Compose are installed.

2. Configure Environment Variables: Make sure your `.env` file is correctly set up as described in Step 4. The `docker-compose.yml` file will use these variables (see the example values after this list). Update your `.env` file accordingly:
   - `QDRANTDB_URL` should be `http://qdrant_db:6333` (service name from `docker-compose.yml`).
   - `EMBEDDING_HUGGINGFACE_BASE_URL` should be `http://tei_server:80` (service name from `docker-compose.yml`).
   - `LLM_OPENAI_BASE_URL` should be `http://tgi_server:80/v1` (or your TGI service name and port, with `/v1` for OpenAI compatibility if TGI is configured for it) if you are using the self-hosted TGI server.
   - `EMBEDDING_MODEL` in `.env` will be used by the `tei_server` in `docker-compose.yml`.
   - `LLM_MODEL_ID` in `.env` will be used by the `tgi_server` in `docker-compose.yml` (ensure this model is compatible and suitable for your TGI setup).
   - `TGI_HTTP_PORT` in `.env` can be used to configure the host port for the TGI service (e.g., `TGI_HTTP_PORT=8081`).

3. Build and Run the Services:

   ```bash
   docker-compose up --build
   ```

   To run in detached mode (in the background):

   ```bash
   docker-compose up --build -d
   ```

4. Accessing the Application:
   - The API will be available at `http://<APP_HOST>:<APP_PORT>` (e.g., `http://localhost:8000` if `APP_HOST=0.0.0.0` and `APP_PORT=8000` in your `.env`).
   - Qdrant will be accessible on the host at `http://localhost:6333`.
   - The TEI server will be accessible on the host at `http://localhost:8080`.
   - The TGI server will be accessible on the host at `http://localhost:${TGI_HTTP_PORT:-8081}` (e.g., `http://localhost:8081`).

5. Stopping the Services:

   ```bash
   docker-compose down
   ```

   To remove volumes (Qdrant data, TEI cache) as well:

   ```bash
   docker-compose down -v
   ```

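For reference, the Docker Compose-specific portion of `.env` (step 2 above) might look like this sketch; service names match the provided `docker-compose.yml`, and the values are illustrative:

```bash
# Illustrative .env values when running the full stack with Docker Compose.
QDRANTDB_URL=http://qdrant_db:6333
EMBEDDING_HUGGINGFACE_BASE_URL=http://tei_server:80
LLM_OPENAI_BASE_URL=http://tgi_server:80/v1
EMBEDDING_MODEL=bkai-foundation-models/vietnamese-bi-encoder
LLM_MODEL_ID=your-model-id
TGI_HTTP_PORT=8081
```
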
**Note on Reindexing with Docker Compose:**

If you need to reindex data while using Docker Compose, you can execute the reindex command inside the running `app` container:

```bash
docker-compose exec app python -m app.main --reindex
```

You will be prompted to enter file paths. These paths must be accessible from within the container's filesystem. If your data files are in the `data/` directory and this directory is mounted as a volume (as configured in the provided `docker-compose.yml`), you can use paths like `data/hcmut_data_faq.csv`.

## Running the Application (Locally, without Docker Compose for the app)

Once the setup is complete, you can run the FastAPI application:

```bash
python -m app.main
```

The API will be available at `http://<APP_HOST>:<APP_PORT>` (e.g., `http://localhost:8000` by default). You can access the OpenAPI documentation at `http://localhost:8000/docs`.

## Reindexing Data

The application provides a CLI command to reindex data into Qdrant. This is useful when you have new or updated FAQ or web data files.

To run the reindexing process:

```bash
python -m app.main --reindex
```

The script will prompt you to enter the paths for your FAQ data file and web data file.

Example:

```bash
python -m app.main --reindex
# Output:
# Starting reindexing process...
# Enter the path to the FAQ file (e.g., data/faq.csv): data/hcmut_data_faq.csv
# Enter the path to the Web data file (e.g., data/web.json): data/hcmut_data_web.json
# ... (processing logs) ...
# Database reindexing completed successfully.
```

You can also use the `--dev` flag to reindex a smaller subset of the data, which is useful for development and testing:

```bash
python -m app.main --reindex --dev
```

Ensure your data files are in the correct format (see Data Formats).

## Data Formats

The reindexing process expects specific formats for FAQ and web data.

### FAQ Data

- CSV Format: Must contain `query` and `answer` columns.

  ```csv
  query,answer
  "What are the admission requirements?","The admission requirements include a high school diploma and standardized test scores."
  "What is the tuition fee?","The tuition fee varies by program. Please check the university website."
  ```

- JSON Format: An array of objects, each with `query` and `answer` keys.

  ```json
  [
    {
      "query": "What are the admission requirements?",
      "answer": "The admission requirements include a high school diploma and standardized test scores."
    },
    {
      "query": "What is the tuition fee?",
      "answer": "The tuition fee varies by program. Please check the university website."
    }
  ]
  ```

### Web Data

This format is typically used for scraped website content.

- JSON Format: An array of objects, each with `text` (main content) and `tables` (an array of strings, each string being a table serialized as text, e.g., Markdown or HTML).

  ```json
  [
    {
      "text": "This is the main content of page 1. It discusses various academic programs.",
      "tables": [
        "| Program | Duration | Credits |\n|---|---|---|\n| CS | 4 years | 120 |",
        "| EE | 4 years | 124 |"
      ]
    },
    {
      "text": "Page 2 talks about campus life and facilities.",
      "tables": []
    }
  ]
  ```

- CSV Format: Must contain `text` and `tables` columns. The `tables` column should be a string representation of a list of table strings (e.g., a JSON-encoded string of a list).

  ```csv
  text,tables
  "This is the main content of page 1. It discusses various academic programs.","[\"| Program | Duration | Credits |\\n|---|---|---|\\n| CS | 4 years | 120 |\", \"| EE | 4 years | 124 |\"]"
  "Page 2 talks about campus life and facilities.","[]"
  ```

Note: Storing complex structures like lists of tables in CSV can be cumbersome. JSON is generally preferred for web data.

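If you do need the CSV form, the `tables` cell is simply the JSON encoding of a list of table strings; a hypothetical one-liner illustrating this (not a script shipped with the project):

```bash
# Hypothetical example: JSON-encode a list of Markdown tables for the "tables" CSV column.
python3 -c 'import json; print(json.dumps(["| Program | Duration | Credits |\n|---|---|---|\n| CS | 4 years | 120 |"]))'
# Output: ["| Program | Duration | Credits |\n|---|---|---|\n| CS | 4 years | 120 |"]
# A CSV writer then takes care of quoting this string when it is written into the "tables" cell.
```
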
## API Endpoints

The application exposes the following main API endpoints (details can be found in the OpenAPI docs at `/docs` when the app is running):

- `/query/` (POST): Submit a query to get relevant information from the indexed documents.
- `/upload-file/` (POST): Upload a file for processing (specific processing logic depends on the implementation in `app/routers/upload.py`).

Refer to `app/routers/query.py` and `app/routers/upload.py` for more details on the request/response models.

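As a rough illustration only (the actual request schemas are defined by the Pydantic models in `app/models/` and shown in `/docs`; the field names below are assumptions):

```bash
# Hypothetical requests — check /docs for the real request/response schemas.
# Query the indexed documents (assumes a JSON body with a "query" field):
curl http://localhost:8000/query/ \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"query": "What are the admission requirements?"}'

# Upload a file (assumes a multipart form field named "file"):
curl http://localhost:8000/upload-file/ \
    -X POST \
    -F 'file=@data/hcmut_data_faq.csv'
```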