
SkillBridge AI — Event-Driven Career Gap Intelligence Platform

Scenario 2: Career Gap Intelligence Platform

Author

Demo video: https://youtu.be/qfjP9bxnLAo


SkillBridge AI ingests resumes and job descriptions, performs semantic skill-gap analysis via a dual-stream neural network, and delivers personalized 4-week learning roadmaps — all on an async, Kafka-driven pipeline.


Key Features

  • Dual-Stream MLP — tabular features (KNN distances, role, seniority) + raw 384-dim skill embedding fused for priority scoring
  • Async Pipeline — FastAPI → Kafka → Workers → ChromaDB → WebSocket push
  • Idempotency + DLQ — Redis-backed deduplication; failed events persisted to PostgreSQL dead-letter queue with exponential backoff
  • Groq LLM Roadmaps — Llama 3.1 with 8 s timeout and template fallback
  • Observability — Prometheus metrics exposed at /metrics; Grafana dashboards
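The idempotency feature above can be sketched as follows. This is a minimal illustration of the deduplication pattern, using an in-memory store as a stand-in for Redis `SET key NX EX ttl` (the key scheme, TTL, and return strings are illustrative, not the project's actual values):

```python
import time

class InMemoryDedup:
    """Stand-in for Redis SET ... NX EX ttl: first writer wins, entries expire."""
    def __init__(self):
        self._seen = {}  # event_id -> expiry timestamp

    def claim(self, event_id: str, ttl_s: int = 3600) -> bool:
        now = time.monotonic()
        expiry = self._seen.get(event_id)
        if expiry is not None and expiry > now:
            return False          # duplicate within TTL -> caller should skip
        self._seen[event_id] = now + ttl_s
        return True               # first delivery -> caller should process

def handle_event(dedup: InMemoryDedup, event_id: str) -> str:
    """Process an event only if this is the first time we have claimed its ID."""
    if not dedup.claim(event_id):
        return "skipped-duplicate"
    return "processed"
```

With Kafka's at-least-once delivery, a redelivered event hits the same key and is skipped, which is what makes the workers safe to retry.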

Architecture

[Architecture diagram]

Stack

Layer Technology
API FastAPI (async)
Message Queue Kafka (Confluent Platform 7.6, KRaft mode, no ZooKeeper)
Cache / Idempotency Redis 7
Database PostgreSQL 15 + SQLAlchemy async
Vector Store ChromaDB
Embeddings all-MiniLM-L6-v2 (384-dim, pre-trained)
Gap Priority Model PyTorch DualStreamMLP + sklearn KNN
LLM Groq — Llama 3.1 8b
Observability Prometheus + Grafana

ML Model — DualStreamMLP

[DualStreamMLP architecture diagram]

Tabular features (18-dim): 8 skill-category one-hot + 4 role one-hot + 3 seniority one-hot + gap_score + mean_knn_dist + min_knn_dist

Embedding (384-dim): raw all-MiniLM-L6-v2 output of the JD skill name

Training: FocalLoss(γ=2.0), Adam(lr=1e-3, wd=1e-4), CosineAnnealingWarmRestarts(T₀=30), best-checkpoint saving, seed=42
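The fusion described above can be sketched in PyTorch. This is an illustrative model skeleton, not the repo's exact implementation: the input dimensions (18 tabular, 384 embedding) come from the text, but the hidden-layer widths are assumed:

```python
import torch
import torch.nn as nn

class DualStreamMLP(nn.Module):
    """Two input streams (18-dim tabular, 384-dim skill embedding) fused
    into a single priority score. Hidden sizes here are illustrative."""
    def __init__(self, tab_dim: int = 18, emb_dim: int = 384):
        super().__init__()
        self.tab_net = nn.Sequential(nn.Linear(tab_dim, 32), nn.ReLU())
        self.emb_net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(32 + 64, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),   # priority score in [0, 1]
        )

    def forward(self, tab: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # Encode each stream separately, then concatenate and score.
        fused = torch.cat([self.tab_net(tab), self.emb_net(emb)], dim=-1)
        return self.head(fused).squeeze(-1)
```

The key design point is that the 384-dim embedding reaches the network raw, so the model can learn its own projection of the semantic space rather than relying on hand-picked distance features alone.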

Metrics (val set, 205 samples):

Metric     Value
F1         0.85
Accuracy   0.91
—          0.72
MAE        0.09

Quick Start

1. Prerequisites

Docker with Docker Compose (the whole stack runs via docker compose); Node.js and npm if you want to run the frontend.

2. Environment

Create .env in the project root:

GROQ_API_KEY=gsk_your_key_here

POSTGRES_DB=skillbridge
POSTGRES_USER=skill
POSTGRES_PASSWORD=password
DATABASE_URL=postgresql+asyncpg://skill:password@postgres:5432/skillbridge

REDIS_URL=redis://redis:6379
KAFKA_BOOTSTRAP_SERVERS=kafka:9092
CHROMA_PERSIST_DIR=/app/chroma

3. Start the full stack

docker compose up -d --build

This starts the full stack: PostgreSQL, Redis, Kafka (KRaft), the API, 4 workers, Prometheus, and Grafana.

Wait ~30 s for Kafka to be healthy, then check:

docker compose ps          # all services should show "healthy" or "running"
curl localhost:8000/health

Note: the frontend runs separately:

cd frontend
npm install
npm run dev

4. API Endpoints

POST  /api/resume/text          Submit resume as JSON
POST  /api/resume/upload        Upload PDF resume
POST  /api/jd/text              Submit JD as JSON
POST  /api/jd/url               Submit JD URL (scraper)
GET   /api/roadmap/{rid}/{jid}  Generate 4-week roadmap
POST  /api/chat/{session_id}    Career chat
WS    /ws/{session_id}          Real-time analysis push
GET   /metrics                  Prometheus metrics
GET   /health                   Health check
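A minimal client sketch for the text-submission endpoints, assuming the stack is running on localhost:8000. The payload field name ("text") is an assumption for illustration; check the actual request schemas:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def build_resume_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/resume/text."""
    payload = json.dumps({"text": text}).encode()  # field name assumed
    return urllib.request.Request(
        f"{BASE}/api/resume/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Sending requires the docker compose stack to be up.
    req = build_resume_request("Senior Python dev, 5y FastAPI, Kafka, PostgreSQL")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.load(resp))
```

The same shape applies to POST /api/jd/text; analysis results are then pushed over the WebSocket at /ws/{session_id} rather than returned synchronously.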

Running Tests

Install test dependencies (one-time, uses project venv):

.venv/bin/pip install pytest pytest-asyncio

Run all tests:

.venv/bin/python -m pytest backend/tests/ -v

Expected output: 16 passed

backend/tests/test_edge_cases.py::test_scraper_malformed_url_returns_none        PASSED
backend/tests/test_edge_cases.py::test_scraper_connection_error_returns_none     PASSED
backend/tests/test_edge_cases.py::test_scraper_http_error_returns_none           PASSED
backend/tests/test_edge_cases.py::test_groq_timeout_returns_fallback_roadmap     PASSED
backend/tests/test_edge_cases.py::test_groq_api_error_returns_fallback           PASSED
backend/tests/test_edge_cases.py::test_chat_response_groq_timeout_returns_string PASSED
backend/tests/test_edge_cases.py::test_fallback_engine_no_templates_returns_generic PASSED
backend/tests/test_edge_cases.py::test_fallback_roadmap_has_4_weeks              PASSED
backend/tests/test_gap_analysis.py::test_result_has_all_required_fields          PASSED
backend/tests/test_gap_analysis.py::test_skill_gaps_sorted_by_priority_descending PASSED
backend/tests/test_gap_analysis.py::test_top_gap_has_highest_priority            PASSED
backend/tests/test_gap_analysis.py::test_overall_match_bounded                   PASSED
backend/tests/test_gap_analysis.py::test_resume_not_found_raises_value_error     PASSED
backend/tests/test_gap_analysis.py::test_jd_not_found_raises_value_error         PASSED
backend/tests/test_gap_analysis.py::test_empty_jd_skills_returns_zero_match      PASSED
backend/tests/test_gap_analysis.py::test_empty_resume_all_skills_flagged_missing PASSED

Test Coverage

test_gap_analysis.py — Happy Path (4)

Test What it verifies
test_result_has_all_required_fields Response contains resume_id, jd_id, overall_match, skill_gaps, missing_skills
test_skill_gaps_sorted_by_priority_descending skill_gaps list is ordered highest → lowest priority
test_top_gap_has_highest_priority skill_gaps[0] is always the most critical skill to learn
test_overall_match_bounded overall_match is always in [0.0, 1.0]

test_gap_analysis.py — Negative (4)

Test What it verifies
test_resume_not_found_raises_value_error Missing resume ID → ValueError (not 500)
test_jd_not_found_raises_value_error Missing JD ID → ValueError (not 500)
test_empty_jd_skills_returns_zero_match JD with no skills → overall_match=0.0, skill_gaps=[]
test_empty_resume_all_skills_flagged_missing Resume with no skills → all JD skills in missing_skills

test_edge_cases.py — Scraper (3)

Test What it verifies
test_scraper_malformed_url_returns_none Unreachable URL → returns None, no crash
test_scraper_connection_error_returns_none Network-level failure → graceful None
test_scraper_http_error_returns_none HTTP 404 → graceful None (route falls back to paste mode)

test_edge_cases.py — LLM / Reliability (5)

Test What it verifies
test_groq_timeout_returns_fallback_roadmap Groq timeout → mode="fallback" roadmap returned
test_groq_api_error_returns_fallback Any Groq exception → fallback + LLM_FALLBACK_TOTAL incremented
test_chat_response_groq_timeout_returns_string Chat timeout → non-empty string, no exception propagated
test_fallback_engine_no_templates_returns_generic Missing templates file → generic 4-week roadmap
test_fallback_roadmap_has_4_weeks Every fallback roadmap has exactly week1 through week4

All tests use mocks — no running Kafka, Redis, PostgreSQL, or Groq API required.
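The timeout-plus-fallback behavior these tests exercise can be sketched like this. The 8 s budget comes from the feature list; the LLM call and the template contents are stand-ins, not the project's actual code:

```python
import asyncio

LLM_TIMEOUT_S = 8.0

def fallback_roadmap() -> dict:
    """Static template used whenever the LLM call fails or times out."""
    return {"mode": "fallback",
            "weeks": {f"week{i}": ["placeholder milestone"] for i in range(1, 5)}}

async def generate_roadmap(llm_call) -> dict:
    """Try the LLM within the timeout budget; degrade to the template on any failure."""
    try:
        result = await asyncio.wait_for(llm_call(), timeout=LLM_TIMEOUT_S)
        return {"mode": "llm", **result}
    except Exception:          # timeout, API error, malformed response, ...
        return fallback_roadmap()
```

Because every failure path converges on the same template, the route handler never has to distinguish "Groq is down" from "Groq is slow"; the response just carries mode="fallback".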


Model Training

The priority model is trained via a Jupyter notebook (no Docker needed):

cd backend/ml
jupyter lab train_gap_model.ipynb   # or open in VS Code
# Run all cells — takes ~2 min on CPU
# Saves: ml_artifacts/gap_model.pt + ml_artifacts/knn_model.pkl

Observability

Prometheus

Raw metrics are exposed at http://localhost:8000/metrics; the Prometheus UI is at http://localhost:9090.

Metric Type Description
events_processed_total Counter Kafka events processed per topic
analysis_completed_total Counter Gap analyses completed
gap_analysis_latency_seconds Histogram End-to-end analysis latency
llm_fallback_total Counter Groq failures → fallback activations
dlq_depth Gauge Current dead-letter queue backlog

Grafana Setup

Prometheus datasource is auto-provisioned — no manual config needed.

  1. Open http://localhost:3001 — login admin / admin
  2. Dashboards → New → Add visualization and use these queries:
Panel PromQL
Events/sec rate(events_processed_total[1m])
Analysis rate rate(analysis_completed_total[1m])
Latency p95 histogram_quantile(0.95, rate(gap_analysis_latency_seconds_bucket[5m]))
LLM fallbacks increase(llm_fallback_total[5m])
DLQ depth dlq_depth

FAQ

1. Why use Kafka for a prototype?

While a simple background task would work for a single-user demo, Kafka was chosen to satisfy the event-driven requirements of the case study. It provides:

  • Resilience: The DLQ (Dead Letter Queue) ensures no analysis is lost if the LLM or ML model crashes.
  • Async UX: The UI remains responsive while the "heavy" embedding and ML steps happen in the background.
  • Extensibility: New features (like a "Notification Service" or "Audit Log") can be added as new Kafka consumers without changing the core API logic.
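The resilience point above rests on a retry-then-DLQ pattern, which can be sketched as follows (attempt count and delays are illustrative; in the real pipeline the DLQ is a PostgreSQL table, not a list):

```python
import time

def process_with_retries(handler, event: dict, dlq: list,
                         max_attempts: int = 3, base_delay_s: float = 0.5) -> bool:
    """Retry a handler with exponential backoff; park the event in the DLQ
    on final failure instead of dropping it."""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as exc:
            if attempt == max_attempts - 1:
                dlq.append({"event": event, "error": str(exc)})
                return False
            time.sleep(base_delay_s * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
    return False
```

Events parked this way can be replayed later, which is what makes an LLM or model crash recoverable rather than lossy.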

2. Does the AI just add more skills?

No. The AI Resume Enhancer uses a "Semantic Reframing" prompt. It analyzes your existing bullet points and rewrites them to better align with the JD's terminology and priorities without inventing fake experience.


AI Disclosure

This project was built with AI assistance:

  • Gemini CLI — synthetic dataset generation (data/mlp_training_data.json, data/sample_jobs.json, etc.)
  • Claude (Anthropic) — architecture design, ML model implementation, test suite, code review

One suggestion I evaluated and rejected: using soft-KNN weighted averaging as the embedding signal instead of the raw 384-dim embedding stream. Soft-KNN would require storing all training embeddings at inference time (memory overhead) and is strictly less expressive than giving the model direct access to the full semantic space.

Tradeoffs & Prioritization:

● What did you cut to stay within the 4–6 hour limit?

  • User Authentication: Login and signup flows were deferred to focus on core AI logic.
  • Load/Performance Testing: Load handling hasn't been verified for high-concurrency scenarios.
  • Complex URL Scraping: Advanced support for JS-rendered job portals was simplified.
  • Larger / More Realistic Training Data: the MLP was trained on synthetic data; original O*NET and Kaggle data were used only to seed Gemini CLI for synthetic generation.

● What would you build next if you had more time?

  • Executable Prep Workspace: An integrated environment to track and complete the generated plans.
  • Gamified Preparation: Implementation of XP, badges, and skill levels.
  • Robust JD Extraction: Better support for complex URLs (LinkedIn, Greenhouse, etc.).
  • Professional PDF Output: High-quality exports for reports and roadmaps.
  • Security & Quality Scans: Snyk and SonarQube for vulnerability scanning and code-coverage/quality gates.

About

SkillBridge is a real-time career gap intelligence platform that analyzes resumes against job descriptions using an event-driven architecture. It combines vector similarity, a PyTorch re-ranker, and RAG-based LLMs to generate learning roadmaps, resume improvements, and an interview readiness score.
