A FastAPI RAG pipeline that ingests PDF files, generates embeddings, and stores them in a vector store (PostgreSQL + pgvector). When a query arrives, the pipeline performs a similarity search, builds a context from the retrieved chunks, and passes it to the LLM (via LangChain) to generate accurate, source-backed answers.
- POST /data: Ingest PDF documents into the system.
- POST /query: Send a query to the system and receive a response based on the ingested documents.
Build and run the application using Docker Compose:
docker compose up --build # or build and then run separately
# Remove containers and volumes later
docker compose down -v

curl -X POST localhost:8000/data -F "file=@./data/Distribuidos_Clase_07_MOM_Distribuido_ZeroMQ.pdf"
# Response
{
"chunks_ingested":11
}curl -X POST -H "Content-Type: application/json" -d '{ "query": "para que sirven los grupos de comunicacion?" }' localhost:8000/query
# Response
{
"query": "para que sirven los grupos de comunicacion?",
"answer": "Para ese query, te puedo decir que según el contexto proporcionado, los grupos de comunicación (o patrones de mensajería) en ZeroMQ sirven para:\n\n* Comunicar tareas entre productores y consumidores (patrón Producer-Consumer o Push-Pull)\n* Implementar patrones de mensajería como:\n * Request-Reply\n * Publisher-Subscriber\n * Parallel Pipeline\n * Patrones avanzados\n\nEstos patrones permiten la comunicación entre múltiples productores y consumidores, garantizando la entrega justa de mensajes (fairness) y utilizando sockets específicos para cada rol (PUSH/PULL, ROUTER/DEALER, etc.). \n\nZeroMQ es una herramienta útil para crear brokerless middlewares y es altamente performante."
}

+--------+     +----------+     +-----------+     +--------+
|  PDFs  | --> |   Data   | --> | Embedding | --> | Vector |
|        |     | Chunking |     |   Model   |     |   DB   |
+--------+     +----------+     +-----------+     +--------+
               |                |
               |  x chunk_size  |
               +----------------+
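The "x chunk_size" step above splits each PDF's text into fixed-size windows before embedding. A toy chunker illustrating the idea (a stand-in for the project's real text splitter; the `chunk_size` and `overlap` defaults are assumptions, not values from the project):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of at most chunk_size characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each window starts where the previous one
    for start in range(0, len(text), step):  # ends, minus the overlap
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.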
+---------+     +-----------+     +--------+
|  Query  | --> | Embedding | --> | Vector |
|         |     |   Model   |     |   DB   |
+---------+     +-----------+     +--------+
                                      ^
                                      |
                              similarity search
+-----------+     +-----------+     +-------+
| Retrieval | --> |  Context  | --> |  LLM  |  (llama-4-scout)
|   Step    |     | Documents |     +-------+
+-----------+     +-----------+         ^
                                        |
                                    +-------+
                                    | Query |
                                    +-------+
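The retrieval step above ranks stored chunks by similarity to the query embedding and concatenates the top matches into the LLM's context. A toy in-memory version of that flow (in the real project the similarity search is delegated to pgvector; the function names here are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], store, k: int = 2) -> list[str]:
    """Return the k chunk texts most similar to query_vec.

    store is a list of (text, embedding) pairs."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_context(chunks: list[str]) -> str:
    # Concatenate the retrieved chunks into the prompt context for the LLM
    return "\n\n".join(chunks)
```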
This project follows a layered architecture to separate responsibilities clearly:
Handles the HTTP layer (built with FastAPI).
- controllers/: Define the /data and /query endpoints.
- router.py: Registers routes.
- dependencies.py: Dependency injection configuration.
Defines request and response DTOs (validated with Pydantic).
Contains the business logic:
- PDF ingestion and text chunking
- Embedding generation
- Vector search
- LLM interaction
- Query orchestration (RAG flow)
Implements persistence and similarity search using PostgreSQL + pgvector.
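A sketch of what the pgvector side of this layer can look like. The table and column names (and the embedding dimension) are assumptions for illustration, not taken from the project; `<=>` is pgvector's cosine-distance operator.

```sql
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical chunk table (names and dimension are illustrative)
CREATE TABLE IF NOT EXISTS document_chunks (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding VECTOR(1536)
);

-- Top-5 most similar chunks to a query embedding, by cosine distance
SELECT content
FROM document_chunks
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
```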
Domain models representing core entities (e.g., documents).
