An intelligent tool to analyze, score, and rank resumes against job descriptions using FastAPI, NLP, and PGVector.
- Resume Parsing: Extract text from PDF and DOCX files.
- Bias-Aware Privacy: Redact PII (names, emails, phones) before analysis to ensure fairness.
- Skills Matching: Automatically extract skills and identify gaps.
- Smart Scoring: Uses Sentence-Transformers for semantic similarity between resumes and job descriptions.
- Portfolio Scaffold: Ready-to-use landing page for showcase.
- Backend: FastAPI, SQLAlchemy, PostgreSQL (PGVector), SpaCy, Sentence-Transformers.
- Frontend: React, Vite, CSS Modules.
- DevOps: Docker, Docker Compose, GitHub Actions.
- Docker and Docker Compose
- Node.js (optional, for local frontend development)
- Python 3.11+ (optional, for local backend development)
- Clone the repo:

  git clone https://github.com/Umoru98/ai-career-intelligence-engine
  cd ai-career-intelligence-engine

- Setup Environment:

  cp .env.example .env

- Start Services:

  docker-compose up -d

- Run Migrations:

  docker-compose exec api alembic upgrade head
The API will be available at http://localhost:8000/docs and the frontend at http://localhost:5173.
We provide a Makefile for convenience:
- `make up`: Start services.
- `make logs`: View logs.
- `make test`: Run backend tests.
- `make migrate`: Run database migrations.
- `make lint`: Run linters (Ruff/MyPy).
See SECURITY.md for reporting vulnerabilities.
This project is licensed under the MIT License - see the LICENSE file for details.
A production-ready, API-first resume analysis platform powered by Sentence Transformers, spaCy, and FastAPI. Upload resumes, paste a job description, and get instant match scores, skill gap analysis, and actionable improvement suggestions.
┌─────────────────────────────────────────────────────────┐
│ docker-compose │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Frontend │ │ Backend │ │ PostgreSQL │ │
│ │ React+Vite │→ │ FastAPI │→ │ + pgvector │ │
│ │ nginx:80 │ │ port 8000 │ │ port 5432 │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
Upload (PDF/DOCX)
→ Text Extraction (pdfplumber / python-docx)
→ Text Cleaning (whitespace, bullets, page numbers)
→ PII Redaction (spaCy NER + regex: names, emails, phones, addresses)
→ Section Detection (regex heading rules)
→ Skills Extraction (dictionary/PhraseMatcher against skills.yml)
→ Embedding Generation (Sentence Transformers: all-MiniLM-L6-v2)
→ Cosine Similarity vs JD Embedding
→ Score Normalization: (cos_sim + 1) / 2 × 100
→ Skill Gap Analysis (intersection / difference)
→ Template-based Explanation + Suggestions
→ Store in PostgreSQL
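The PII redaction step in the pipeline above can be sketched with stdlib regexes. This is a minimal, illustrative sketch: the real step also uses spaCy NER for names and addresses, and these patterns are simplified stand-ins, not the project's own.

```python
import re

# Illustrative patterns only; the actual pipeline combines spaCy NER
# (names, addresses) with regexes along these lines.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

print(redact_pii("Contact Alice at alice@example.com or +1 (555) 123-4567."))
# → Contact Alice at [EMAIL] or [PHONE].
```

Redacting before embedding means the similarity score cannot be influenced by names or contact details, which is the point of the bias-aware design.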
- Docker Desktop
- Python 3.11+ (for local dev without Docker)
- Node 20+ (for frontend dev)
# 1. Clone and configure
cp .env.example .env
# 2. Start all services
docker-compose up -d
# 3. Run database migrations
docker-compose exec api alembic upgrade head
# 4. (Optional) Pre-download ML models
make download-models
# 5. Open the app
# Frontend: http://localhost:5173
# API docs: http://localhost:8000/docs

Local backend development:

cd backend
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
# Set env vars (copy .env.example to .env and adjust DATABASE_URL)
uvicorn app.main:app --reload --port 8000

Local frontend development:

cd frontend
npm install
npm run dev
# Opens at http://localhost:5173

Upload a resume:

curl -X POST http://localhost:8000/v1/resumes/upload \
  -F "file=@resume.pdf"

Response:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"original_filename": "resume.pdf",
"extraction_status": "success",
"sha256": "abc123...",
"created_at": "2026-02-18T17:00:00Z"
}

Create a job description:

curl -X POST http://localhost:8000/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"title": "Senior Python Engineer",
"description": "We are looking for a Python engineer with FastAPI, PostgreSQL, Docker, and AWS experience..."
}'

Analyze a resume against a job:

curl -X POST http://localhost:8000/v1/analyze \
-H "Content-Type: application/json" \
-d '{
"resume_id": "<resume-uuid>",
"job_id": "<job-uuid>"
}'

Rank resumes for a job:

# Upload 3 resumes first, then:
curl -X POST http://localhost:8000/v1/jobs/<job-uuid>/rank \
-H "Content-Type: application/json" \
-d '{
"resume_ids": ["<uuid1>", "<uuid2>", "<uuid3>"]
}'

Full end-to-end example:

# Step 1: Upload 3 resumes
R1=$(curl -s -X POST http://localhost:8000/v1/resumes/upload -F "file=@alice.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
R2=$(curl -s -X POST http://localhost:8000/v1/resumes/upload -F "file=@bob.docx" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
R3=$(curl -s -X POST http://localhost:8000/v1/resumes/upload -F "file=@carol.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
# Step 2: Create JD
JOB=$(curl -s -X POST http://localhost:8000/v1/jobs \
-H "Content-Type: application/json" \
-d '{"title":"Python Dev","description":"Python, FastAPI, Docker, PostgreSQL, AWS, CI/CD experience required."}' \
| python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
# Step 3: Rank
curl -X POST http://localhost:8000/v1/jobs/$JOB/rank \
-H "Content-Type: application/json" \
-d "{\"resume_ids\": [\"$R1\", \"$R2\", \"$R3\"]}"
# Step 4: View details
curl http://localhost:8000/v1/resumes/$R1

| Score | Meaning |
|---|---|
| 75–100% | Strong match |
| 50–74% | Moderate match |
| 0–49% | Weak match |
Score formula: score = clamp((cosine_similarity + 1) / 2 × 100, 0, 100)
This is a linear normalization of cosine similarity from [-1, 1] to [0, 100]. Typical resume-JD similarities range from 0.3–0.9 (scores of 65–95%). A calibrated threshold model is a future improvement (see TODO in embedder.py).
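The normalization and the skill-gap step can be reproduced in a few lines of stdlib Python (a sketch; the function names are illustrative, not the project's):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def match_score(cos_sim: float) -> float:
    """Linear map from [-1, 1] to [0, 100], clamped."""
    return max(0.0, min(100.0, (cos_sim + 1) / 2 * 100))

def skill_gap(resume_skills: set[str], jd_skills: set[str]) -> dict:
    """Set intersection = matched skills, set difference = missing skills."""
    return {
        "matched": sorted(resume_skills & jd_skills),
        "missing": sorted(jd_skills - resume_skills),
    }

print(match_score(0.3))  # 65.0
print(skill_gap({"python", "docker"}, {"python", "aws"}))
```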
| Model | Purpose | Size |
|---|---|---|
| `sentence-transformers/all-MiniLM-L6-v2` | Text embeddings | ~80MB |
| `en_core_web_sm` | NER for PII redaction | ~12MB |
Models are pre-downloaded during Docker build (Dockerfile). For fully offline use:
# Pre-download on a machine with internet, then copy cache
docker-compose exec api python -c "
from sentence_transformers import SentenceTransformer
SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
"

The model cache is stored in the `model_cache` Docker volume.
| Table | Purpose |
|---|---|
| `resumes` | Uploaded files + extracted/cleaned/redacted text + sections |
| `jobs` | Job descriptions |
| `embeddings` | Cached embedding vectors (JSONB; pgvector upgrade path documented) |
| `analyses` | Match results: score, skills, explanation, suggestions |
Currently embeddings are stored as JSONB arrays. To upgrade to pgvector:
- Ensure the `pgvector/pgvector:pg16` image is used (already in docker-compose)
- The migration runs `CREATE EXTENSION IF NOT EXISTS vector`
- Add a new `vector(384)` column to the `embeddings` table
- Migrate JSONB → vector column
- Create an `ivfflat` or `hnsw` index for ANN search
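For the JSONB → vector backfill step, each stored JSONB array must be rendered as a pgvector input literal such as `[0.1,0.2,0.3]`. A minimal sketch, assuming the JSONB values deserialize to Python lists of floats (the helper name is hypothetical):

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Render a Python list as a pgvector input literal, e.g. '[0.1,0.2]'.

    Suitable as a parameter to an UPDATE that casts with ::vector during
    the JSONB -> vector(384) backfill.
    """
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

# Hypothetical backfill statement (table/column names illustrative):
#   UPDATE embeddings SET vec = %s::vector WHERE id = %s
print(to_pgvector_literal([0.1, 0.25, -0.3]))  # [0.1,0.25,-0.3]
```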
cd backend
pip install aiosqlite # for in-memory SQLite tests
pytest tests/ -v --tb=short
pytest tests/ --cov=app --cov-report=term-missing

- File validation: Content-type + size enforced before processing
- Safe filenames: UUID-based, no user-provided paths
- PII redaction: Names, emails, phones, addresses removed before embedding
- No code execution: Uploaded files are never executed
- Non-root Docker: API runs as `appuser` (UID 1000)
- CORS: Configurable via `CORS_ORIGINS` env var
- No auth (MVP): Structure supports adding OAuth2/JWT middleware to FastAPI
- Secrets: Never logged; use `.env` file (not committed)
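The `CORS_ORIGINS` variable is assumed here to be a comma-separated list of origins; parsing it before handing it to FastAPI's CORSMiddleware might look like this (a sketch; `parse_cors_origins` is a hypothetical helper, not the project's own):

```python
import os

def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean list."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]

# Wiring into FastAPI would then look roughly like:
#   app.add_middleware(CORSMiddleware, allow_origins=parse_cors_origins(...))
origins = parse_cors_origins(os.environ.get("CORS_ORIGINS", "http://localhost:5173"))
print(origins)
```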
- ✅ Bias-aware scoring (PII redaction before embeddings)
- ✅ Section detection (education, skills, experience, projects, certifications)
- ✅ Resume improvement suggestions (grounded, template-based)
- ✅ Multiple resume comparison (`/v1/compare`)
- ✅ API-first design with versioned endpoints
- Calibrate score thresholds with labeled data
- OCR support for scanned PDFs (Tesseract integration, opt-in)
- Celery/RQ for background embedding jobs
- pgvector ANN indexing for large-scale ranking
- Authentication (OAuth2 + JWT)
- Resume version history
- Export results as PDF/CSV
- LLM-based suggestions (constrained, evidence-grounded)