A lightweight Retrieval-Augmented Generation system using Markdown files as knowledge base.
Welcome to the FC Barcelona HR RAG System - a simple, self-contained Retrieval-Augmented Generation (RAG) project that loads Markdown-based HR/player records directly from the filesystem into memory.
No database. No vector DB. No embeddings. Just pure local markdown → dictionary → context injection → LLM.
Perfect for small projects, demo apps, or experimenting with RAG fundamentals.
- HR/player records stored as
.mdfiles - Loaded automatically into a Python
dict - Easy to modify, easy to version-control
- User query analyzed for keyword matches
- Relevant markdown snippets injected into system prompt
- LLM answers with higher accuracy using only local data
- Uses
gpt-4.1-nanofor fast + cheap responses - System prompt built specifically around FC Barcelona employees
- Full chat interface
- Local browser launch
- Debug logs enabled
- No vectors
- No external DB
- No third-party storage
- Everything lives inside
data/employees/*.md
/
├── data/
│ └── employees/
│ ├── <employee1>.md
│ ├── <employee2>.md
│ └── ...
│
├── src/
│ ├── main.py
│
├── tests/
│ ├── test_context.py
│
├── synthetic_data_generator.py
├── requirements.txt
└── README.md
knowledge = load_markdown_files()Each employee record becomes a key-value entry:
{name: markdown_content}
get_relevant_context(message)If words from the question match employee names → relevant markdown is returned.
system_message = SYSTEM_PREFIX + additional_context(message)openai.chat.completions.create(...)gr.ChatInterface(...).launch(inbrowser=True)pip install -r requirements.txtCreate a .env file:
OPENAI_API_KEY=your_key_here
data/employees/*.md
python main.pyThis opens the Gradio chat UI in your browser.
"What is the salary of Pedri?" "How did João Félix perform in 2022?" "Tell me about the injuries of Ansu Fati."
The system will automatically: ✔ extract keywords ✔ search local markdown files ✔ inject only relevant context ✔ respond like an FC Barcelona HR expert
- Python 3.10+
- OpenAI API
- Gradio
- dotenv
- Local markdown RAG (no vector DB)