A Retrieval-Augmented Generation (RAG) system for aircraft engine maintenance manuals.
Built on FAA Aviation Maintenance Technician Handbooks, it enables accurate, context-grounded question answering and evaluation.
```mermaid
flowchart LR
A["FAA PDFs in data/pdfs/"] --> B["Chunking\nbuild_corpus.py"]
B --> C["Embeddings + FAISS Index\nembed_index.py"]
D["User Query (q)"] --> E["Retrieval of Relevant Chunks"]
C --> E
E --> F["Answer Generation\nrag_pipeline.py"]
```
You can try the pipeline in one click:
- Run on Google Colab (recommended)
- Or open the notebook locally: `notebooks/demo_rag.ipynb`
A lightweight Gradio/Streamlit app version is planned and could later be hosted on Hugging Face Spaces.
- Ingests FAA handbooks in PDF form and splits the text into chunks
- Embeds chunks with `sentence-transformers/all-MiniLM-L6-v2`
- Vector retrieval with FAISS
- Context-grounded answering via LLMs (Gemma-2, LLaMA-3.1, Flan-T5)
- Evaluation with EM, ROUGE, and RAGAS metrics
- Ready-to-run Colab notebook and Windows batch scripts
Question: What does the accessory gearbox drive?
Retrieved Context: “The accessory gearbox provides drive for the starter, fuel pump, oil pump, hydraulic pump, and generators.”
Answer (RAG): starter, fuel pump, oil pump, hydraulic pump, generator
We evaluated the pipeline on a gold Q&A set (`data/qna_eval.jsonl`) using Exact Match (EM) and ROUGE-L. RAGAS metrics are supported but have not been computed yet.
| Metric | Score | Notes |
|---|---|---|
| Exact Match (EM) | 1.00 | All 8 predictions exactly matched references |
| ROUGE-L | 1.00 | Perfect overlap with reference answers |
| Faithfulness (RAGAS) | – | Not computed yet |
| Relevance (RAGAS) | – | Not computed yet |
📌 Based on 8 Q&A pairs. RAGAS scores can be added once evaluated with src/eval_ragas.py.
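For readers unfamiliar with the two scores above, here is a minimal sketch of how EM and ROUGE-L are commonly computed. It assumes SQuAD-style normalization (lowercasing, stripping punctuation and the articles "a/an/the") and ROUGE-L as the F1 of the token-level longest common subsequence. `src/eval_ragas.py` may normalize or tokenize differently, so treat this as an illustration rather than the project's exact scorer.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall over normalized tokens."""
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The starter and fuel pump.", "starter and fuel pump"))
```

Because EM is all-or-nothing, a single token difference (for example "generator" vs. "generators") drops EM to 0 while ROUGE-L still gives partial credit; reporting both makes that distinction visible.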
This project uses the official FAA Aviation Maintenance Technician Handbooks – Aircraft. The pipeline works as follows:
- Ingest PDF manuals/logs → text chunks
- Embed chunks with `sentence-transformers`
- Build FAISS vector index
- Retrieve + answer via LLM
- Evaluate with RAGAS, EM, ROUGE
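The retrieval step can be sketched as exact top-k search by cosine similarity. This is a NumPy-only illustration under stated assumptions: the toy 3-dimensional vectors stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings, and the function names are hypothetical. With FAISS, the equivalent exact search is typically `faiss.normalize_L2(x)` followed by `faiss.IndexFlatIP(d)`, `index.add(x)` and `index.search(q, k)`; whether `src/embed_index.py` uses an inner-product or an L2 index is not stated here.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that an inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero


def top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k chunks most similar to the query (brute force)."""
    q = normalize_rows(query_vec.reshape(1, -1))
    c = normalize_rows(chunk_vecs)
    scores = (c @ q.T).ravel()  # one cosine score per chunk
    k = min(k, len(scores))
    order = np.argsort(-scores, kind="stable")[:k]  # highest scores first
    return order, scores[order]


if __name__ == "__main__":
    # Hypothetical toy embeddings for three chunks.
    chunks = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
    idx, sc = top_k(np.array([1.0, 0.1, 0.0]), chunks, k=2)
    print(idx, sc)
```

Normalizing both sides first means ranking by inner product and ranking by cosine similarity give the same order, which is why the inner-product index pairs naturally with normalized sentence embeddings.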
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Index: FAISS
- LLMs: `google/gemma-2-2b-it`, `meta-llama/Llama-3.1-8B-Instruct`, `google/flan-t5-base`
- Evaluation: RAGAS, EM, ROUGE
- Utilities: PyPDF, Pandas, NumPy
```
aviation-engine-maintenance-rag/
├── data/
│ ├── pdfs/ # FAA handbook PDFs
│ └── qna_eval.jsonl # gold Q&A set
├── notebooks/
│ └── demo_rag.ipynb # Jupyter demo
├── scripts/
│ └── fetch_pdfs.py # downloads FAA PDFs
├── src/
│ ├── build_corpus.py # PDF → text chunks
│ ├── embed_index.py # embeddings + FAISS index
│ ├── rag_pipeline.py # retrieval + generation
│ ├── eval_ragas.py # evaluation metrics
│ └── utils.py # helper functions
├── setup_and_fetch.bat # one-click setup (Windows)
├── run_end_to_end.bat # one-click demo run
├── requirements.txt
├── README.md
└── LICENSE
```
The PDFs placed in data/pdfs/ are the heart of this project – they form the source knowledge base.
All retrieval and answering is grounded in these FAA Aviation Maintenance Technician Handbooks.
When you run the pipeline, it processes the PDFs in three steps:
- Chunking: `src/build_corpus.py` splits the PDFs into manageable text blocks.
- Embedding & Indexing: `src/embed_index.py` converts the chunks into dense embeddings and builds a FAISS index.
- Retrieval + Answer Generation: `src/rag_pipeline.py` retrieves the most relevant chunks for a query and generates an answer.
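The chunking step can be illustrated with fixed-size word windows that overlap, so a sentence cut at a boundary still appears whole in a neighbouring chunk. This is a sketch only: the window size, the overlap and the hypothetical `chunk_text` function are assumptions, and `src/build_corpus.py` may split by characters, tokens or pages instead. The input `text` would come from PDF extraction, for example pypdf's `PdfReader(path)` and `page.extract_text()`.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Larger overlaps improve the odds that an answer span lands intact in one chunk, at the cost of more chunks to embed and index.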
Grounding every answer in retrieved handbook text keeps the answers tied to the FAA handbooks and reduces the risk of model hallucination.
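One common way to keep generation grounded is to place the retrieved chunks in the prompt and instruct the model to answer only from them. The sketch below is hypothetical: the `build_prompt` name and the template wording are assumptions, not the template used in `src/rag_pipeline.py`. It reuses the accessory-gearbox example from earlier in this README.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved handbook excerpts."""
    numbered = "\n".join(f"[{i}] {c.strip()}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the FAA handbook excerpts below. "
        "If the excerpts do not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt(
        "What does the accessory gearbox drive?",
        ["The accessory gearbox provides drive for the starter, fuel pump, oil pump, "
         "hydraulic pump, and generators."],
    ))
```

Numbering the excerpts lets the model (or a later evaluation step) refer back to the chunk an answer came from, and the explicit "say you don't know" instruction discourages answers that are not supported by the context.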
The FAA PDFs are included in this repo under data/pdfs/ so the demo works out of the box.
(You can also refresh them anytime using scripts/fetch_pdfs.py or setup_and_fetch.bat.)
On Windows (Command Prompt):

```bat
REM Clone repo
git clone https://github.com/suniltyagi/aviation-engine-maintenance-rag.git
cd aviation-engine-maintenance-rag
REM Create and activate a virtual environment
python -m venv .venv
.venv\Scripts\activate
REM Install dependencies
pip install -r requirements.txt
REM (Optional) Refresh FAA PDFs
setup_and_fetch.bat
REM Run the full pipeline (index → RAG → evaluation)
run_end_to_end.bat
```
On Linux/macOS:

```bash
# Clone repo
git clone https://github.com/suniltyagi/aviation-engine-maintenance-rag.git
cd aviation-engine-maintenance-rag
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# (Optional) Refresh FAA PDFs
python scripts/fetch_pdfs.py
# Run the pipeline manually (chunking → index → RAG → evaluation)
python src/build_corpus.py
python src/embed_index.py
python src/rag_pipeline.py
python src/eval_ragas.py
```
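If you want a single cross-platform entry point instead of typing the four commands, a small Python runner can call the stages in the same order with the active interpreter. This is a hypothetical helper (for example saved as `run_pipeline.py` in the repo root), not a file in the repository; it assumes each script runs from the repo root without arguments, as in the manual steps above.

```python
import subprocess
import sys

# Stage order copied from the manual Linux/macOS steps.
PIPELINE = (
    "src/build_corpus.py",
    "src/embed_index.py",
    "src/rag_pipeline.py",
    "src/eval_ragas.py",
)


def run_pipeline(scripts=PIPELINE, dry_run: bool = False) -> list[list[str]]:
    """Run each stage with the current interpreter, stopping at the first failing stage."""
    commands = [[sys.executable, script] for script in scripts]
    for cmd in commands:
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run="--dry-run" in sys.argv)
```

Using `sys.executable` ensures the stages run inside the activated `.venv`, and `check=True` stops the run before evaluation if indexing or generation fails.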