Scenario 2: Career Gap Intelligence Platform
Ashlesha T | ashleshat5@gmail.com
[ Demo Video: https://youtu.be/qfjP9bxnLAo ]
SkillBridge AI ingests resumes and job descriptions, performs semantic skill-gap analysis via a dual-stream neural network, and delivers personalized 4-week learning roadmaps — all on an async, Kafka-driven pipeline.
- Dual-Stream MLP — tabular features (KNN distances, role, seniority) + raw 384-dim skill embedding fused for priority scoring
- Async Pipeline — FastAPI → Kafka → Workers → ChromaDB → WebSocket push
- Idempotency + DLQ — Redis-backed deduplication; failed events persisted to PostgreSQL dead-letter queue with exponential backoff
- Groq LLM Roadmaps — Llama 3.1 with 8 s timeout and template fallback
- Observability — Prometheus metrics exposed at /metrics; Grafana dashboards
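The idempotency layer reduces to a single atomic Redis SET NX call. A minimal sketch of the idea (function and key names here are hypothetical; the real logic lives in the worker code):

```python
def is_duplicate(client, event_id: str, ttl_seconds: int = 3600) -> bool:
    """Return True if this Kafka event was already processed.

    redis-py's SET with nx=True writes only when the key is absent and
    returns None otherwise, so a redelivered event is detected atomically.
    """
    return client.set(f"dedup:{event_id}", "1", nx=True, ex=ttl_seconds) is None

# In a worker: client = redis.Redis.from_url(REDIS_URL)
# if is_duplicate(client, event["id"]): skip processing
```

The TTL keeps the dedup keyspace bounded; a redelivery after expiry would re-run the analysis, which is acceptable because the downstream write is itself idempotent.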
| Layer | Technology |
|---|---|
| API | FastAPI (async) |
| Message Queue | Kafka (Confluent Platform 7.6, KRaft — no ZooKeeper) |
| Cache / Idempotency | Redis 7 |
| Database | PostgreSQL 15 + SQLAlchemy async |
| Vector Store | ChromaDB |
| Embeddings | all-MiniLM-L6-v2 (384-dim, pre-trained) |
| Gap Priority Model | PyTorch DualStreamMLP + sklearn KNN |
| LLM | Groq — Llama 3.1 8b |
| Observability | Prometheus + Grafana |
Tabular features (18-dim): 8 skill-category one-hot + 4 role one-hot + 3 seniority one-hot + gap_score + mean_knn_dist + min_knn_dist
Embedding (384-dim): raw all-MiniLM-L6-v2 output of the JD skill name
Training: FocalLoss(γ=2.0), Adam(lr=1e-3, wd=1e-4), CosineAnnealingWarmRestarts(T₀=30), best-checkpoint saving, seed=42
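For reference, the focal-loss term above can be sketched in PyTorch as follows. This is a minimal illustration with γ=2.0; the notebook's actual class may differ in head structure and reduction:

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    """Focal loss (Lin et al., 2017): down-weights easy examples so
    training focuses on hard, misclassified skill gaps."""
    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-sample cross-entropy, then scale by (1 - p_true)^gamma.
        ce = nn.functional.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)  # probability assigned to the true class
        return ((1 - pt) ** self.gamma * ce).mean()

# Optimizer/scheduler matching the settings above:
# opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=30)
```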
Metrics (val set, 205 samples):
| Metric | Value |
|---|---|
| F1 | 0.85 |
| Accuracy | 0.91 |
| R² | 0.72 |
| MAE | 0.09 |
- Docker & Docker Compose
- Groq API key — get one free at console.groq.com
Create .env in the project root:
GROQ_API_KEY=gsk_your_key_here
POSTGRES_DB=skillbridge
POSTGRES_USER=skill
POSTGRES_PASSWORD=password
DATABASE_URL=postgresql+asyncpg://skill:password@postgres:5432/skillbridge
REDIS_URL=redis://redis:6379
KAFKA_BOOTSTRAP_SERVERS=kafka:9092
CHROMA_PERSIST_DIR=/app/chroma

Then build and start the stack:

docker compose up -d --build

This starts 9 services: PostgreSQL, Redis, Kafka (KRaft), API, 4 workers, Prometheus, Grafana.
Wait ~30 s for Kafka to be healthy, then check:
docker compose ps # all services should show "healthy" or "running"
curl localhost:8000/health

Frontend:
cd frontend
npm install
npm run dev

API Endpoints:

| Endpoint | Description |
|---|---|
| POST /api/resume/text | Submit resume as JSON |
| POST /api/resume/upload | Upload PDF resume |
| POST /api/jd/text | Submit JD as JSON |
| POST /api/jd/url | Submit JD URL (scraper) |
| GET /api/roadmap/{rid}/{jid} | Generate 4-week roadmap |
| POST /api/chat/{session_id} | Career chat |
| WS /ws/{session_id} | Real-time analysis push |
| GET /metrics | Prometheus metrics |
| GET /health | Health check |
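As a quick smoke test, the first endpoint can be exercised from Python with only the standard library. The `text` field name is an assumption; the authoritative schema is browsable at http://localhost:8000/docs once the API is up:

```python
import json
from urllib import request

API = "http://localhost:8000"  # assumes the compose stack is running

def build_resume_request(text: str) -> request.Request:
    # Build a POST to /api/resume/text; the JSON field name `text`
    # is an assumption -- check the live FastAPI docs for the real schema.
    body = json.dumps({"text": text}).encode()
    return request.Request(
        f"{API}/api/resume/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# with request.urlopen(build_resume_request("Python, Kafka, FastAPI"), timeout=10) as r:
#     print(json.loads(r.read()))
```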
Install test dependencies (one-time, uses project venv):
.venv/bin/pip install pytest pytest-asyncio

Run all tests:

.venv/bin/python -m pytest backend/tests/ -v

Expected output: 16 passed
backend/tests/test_edge_cases.py::test_scraper_malformed_url_returns_none PASSED
backend/tests/test_edge_cases.py::test_scraper_connection_error_returns_none PASSED
backend/tests/test_edge_cases.py::test_scraper_http_error_returns_none PASSED
backend/tests/test_edge_cases.py::test_groq_timeout_returns_fallback_roadmap PASSED
backend/tests/test_edge_cases.py::test_groq_api_error_returns_fallback PASSED
backend/tests/test_edge_cases.py::test_chat_response_groq_timeout_returns_string PASSED
backend/tests/test_edge_cases.py::test_fallback_engine_no_templates_returns_generic PASSED
backend/tests/test_edge_cases.py::test_fallback_roadmap_has_4_weeks PASSED
backend/tests/test_gap_analysis.py::test_result_has_all_required_fields PASSED
backend/tests/test_gap_analysis.py::test_skill_gaps_sorted_by_priority_descending PASSED
backend/tests/test_gap_analysis.py::test_top_gap_has_highest_priority PASSED
backend/tests/test_gap_analysis.py::test_overall_match_bounded PASSED
backend/tests/test_gap_analysis.py::test_resume_not_found_raises_value_error PASSED
backend/tests/test_gap_analysis.py::test_jd_not_found_raises_value_error PASSED
backend/tests/test_gap_analysis.py::test_empty_jd_skills_returns_zero_match PASSED
backend/tests/test_gap_analysis.py::test_empty_resume_all_skills_flagged_missing PASSED
test_gap_analysis.py — Happy Path (4)

| Test | What it verifies |
|---|---|
| test_result_has_all_required_fields | Response contains resume_id, jd_id, overall_match, skill_gaps, missing_skills |
| test_skill_gaps_sorted_by_priority_descending | skill_gaps list is ordered highest → lowest priority |
| test_top_gap_has_highest_priority | skill_gaps[0] is always the most critical skill to learn |
| test_overall_match_bounded | overall_match is always in [0.0, 1.0] |
test_gap_analysis.py — Negative (4)

| Test | What it verifies |
|---|---|
| test_resume_not_found_raises_value_error | Missing resume ID → ValueError (not 500) |
| test_jd_not_found_raises_value_error | Missing JD ID → ValueError (not 500) |
| test_empty_jd_skills_returns_zero_match | JD with no skills → overall_match=0.0, skill_gaps=[] |
| test_empty_resume_all_skills_flagged_missing | Resume with no skills → all JD skills in missing_skills |
test_edge_cases.py — Scraper (3)

| Test | What it verifies |
|---|---|
| test_scraper_malformed_url_returns_none | Unreachable URL → returns None, no crash |
| test_scraper_connection_error_returns_none | Network-level failure → graceful None |
| test_scraper_http_error_returns_none | HTTP 404 → graceful None (route falls back to paste mode) |
test_edge_cases.py — LLM / Reliability (5)

| Test | What it verifies |
|---|---|
| test_groq_timeout_returns_fallback_roadmap | Groq timeout → mode="fallback" roadmap returned |
| test_groq_api_error_returns_fallback | Any Groq exception → fallback + llm_fallback_total incremented |
| test_chat_response_groq_timeout_returns_string | Chat timeout → non-empty string, no exception propagated |
| test_fallback_engine_no_templates_returns_generic | Missing templates file → generic 4-week roadmap |
| test_fallback_roadmap_has_4_weeks | Every fallback roadmap has exactly week1–week4 |
All tests use mocks — no running Kafka, Redis, PostgreSQL, or Groq API required.
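The mocking pattern is the standard one: patch the external client so it raises, then assert the fallback path runs. A simplified, self-contained version of the timeout test (names here are hypothetical; the real fixtures live in backend/tests/):

```python
import asyncio
from unittest.mock import AsyncMock

async def generate_roadmap(llm_call, fallback):
    # Same shape as the service: 8 s LLM budget, template fallback on failure.
    try:
        return await asyncio.wait_for(llm_call(), timeout=8.0)
    except Exception:
        return fallback()

def test_timeout_returns_fallback():
    # The "Groq client" is an AsyncMock that simulates a timeout.
    llm = AsyncMock(side_effect=asyncio.TimeoutError)
    result = asyncio.run(generate_roadmap(llm, lambda: {"mode": "fallback"}))
    assert result == {"mode": "fallback"}
```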
The priority model is trained via a Jupyter notebook (no Docker needed):
cd backend/ml
jupyter lab train_gap_model.ipynb # or open in VS Code
# Run all cells — takes ~2 min on CPU
# Saves: ml_artifacts/gap_model.pt + ml_artifacts/knn_model.pkl

Raw metrics at http://localhost:8000/metrics and http://localhost:9090.
| Metric | Type | Description |
|---|---|---|
| events_processed_total | Counter | Kafka events processed per topic |
| analysis_completed_total | Counter | Gap analyses completed |
| gap_analysis_latency_seconds | Histogram | End-to-end analysis latency |
| llm_fallback_total | Counter | Groq failures → fallback activations |
| dlq_depth | Gauge | Current dead-letter queue backlog |
Prometheus datasource is auto-provisioned — no manual config needed.
- Open http://localhost:3001 — login admin / admin
- Dashboards → New → Add visualization and use these queries:
| Panel | PromQL |
|---|---|
| Events/sec | rate(events_processed_total[1m]) |
| Analysis rate | rate(analysis_completed_total[1m]) |
| Latency p95 | histogram_quantile(0.95, gap_analysis_latency_seconds_bucket) |
| LLM fallbacks | increase(llm_fallback_total[5m]) |
| DLQ depth | dlq_depth |
While a simple background task would work for a single-user demo, Kafka was chosen to satisfy the event-driven requirements of the case study. It provides:
- Resilience: The DLQ (Dead Letter Queue) ensures no analysis is lost if the LLM or ML model crashes.
- Async UX: The UI remains responsive while the "heavy" embedding and ML steps happen in the background.
- Extensibility: New features (like a "Notification Service" or "Audit Log") can be added as new Kafka consumers without changing the core API logic.
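The extensibility point can be illustrated abstractly: a new feature is just another subscriber on an existing topic, and the producing API never changes. Below is a pure-Python stand-in for Kafka consumer groups (all topic and handler names are hypothetical):

```python
from collections import defaultdict

HANDLERS = defaultdict(list)  # topic -> [handler, ...]

def consumer(topic: str):
    """Register a handler for a topic -- a stand-in for a consumer group."""
    def register(fn):
        HANDLERS[topic].append(fn)
        return fn
    return register

@consumer("analysis.completed")
def notify_user(event: dict) -> None:
    ...  # e.g. push a WebSocket notification

@consumer("analysis.completed")  # added later, with no API change
def audit_log(event: dict) -> None:
    ...  # e.g. append to an audit table

def dispatch(topic: str, event: dict) -> None:
    # In Kafka, the broker does this fan-out; each group gets the event.
    for handler in HANDLERS[topic]:
        handler(event)
```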
No. The AI Resume Enhancer uses a "Semantic Reframing" prompt. It analyzes your existing bullet points and rewrites them to better align with the JD's terminology and priorities without inventing fake experience.
This project was built with AI assistance:
- Gemini CLI — synthetic dataset generation (data/mlp_training_data.json, data/sample_jobs.json, etc.)
- Claude (Anthropic) — architecture design, ML model implementation, test suite, code review
One suggestion I evaluated and rejected: using soft-KNN weighted averaging as the embedding signal instead of the raw 384-dim embedding stream. Soft-KNN would require storing all training embeddings at inference time (memory overhead) and is strictly less expressive than giving the model direct access to the full semantic space.
● What did you cut to stay within the 4–6 hour limit?
- User Authentication: Login and signup flows were deferred to focus on core AI logic.
- Load/Performance Testing: Load handling hasn't been verified for high-concurrency scenarios.
- Complex URL Scraping: Advanced support for JS-rendered job portals was simplified.
- Broader Training Data: The MLP was trained largely on synthetic data — real data from O*NET and Kaggle was used only to seed Gemini CLI, which generated the synthetic training set. Training on more (and more real) data was deferred.
● What would you build next if you had more time?
- Executable Prep Workspace: An integrated environment to track and complete the generated plans.
- Gamified Preparation: Implementation of XP, badges, and skill levels.
- Robust JD Extraction: Better support for complex URLs (LinkedIn, Greenhouse, etc.).
- Professional PDF Output: High-quality exports for reports and roadmaps.
- Security & Quality Gates: Snyk and SonarQube scans for vulnerability detection and code-coverage/quality checks.

