SRM Career Catalyst is a data-driven backend system that analyzes student resumes against job descriptions. Unlike standard keyword matchers, it uses a Knowledge Genome built from the resumes of successfully placed alumni to provide context-aware, actionable feedback.
The platform integrates Traditional NLP, Unsupervised Machine Learning, and Generative AI to uncover real-world placement patterns and guide students toward improved placement outcomes.
"Compare a student not to a job description alone, but to the profiles of alumni who actually got placed."
graph TD
A[Raw Alumni Resume PDFs] --> B["PDF & Text Extraction (PyPDF + Regex)"]
B --> C["NLP Entity Recognition (spaCy - Skills, Orgs, Dates)"]
B --> D["Achievement Analysis (GenAI - Metrics Extraction)"]
B --> E["Text Vectorization (SentenceTransformers)"]
E --> F["Career Archetype Discovery (K-Means Clustering)"]
C --> G[("Metadata Store")]
D --> G
F --> H[("FAISS Vector Store + Metadata")]
I[Student Resume PDF] --> J[Resume Extraction Node]
K[Job Description Text] --> L[RAG Retrieval Node]
L -->|Semantic Query| H
H -->|Relevant Archetypes & Alumni Examples| M[GenAI Reasoning Node]
J -->|Structured Resume Data| M
M --> N["Actionable Feedback Report (Markdown Output)"]
style A fill:#e1f5ff
style I fill:#e1f5ff
style K fill:#e1f5ff
style N fill:#d4edda
style G fill:#fff3cd
style H fill:#fff3cd
This phase builds the intelligence layer of the system.
- Converts unstructured alumni resume PDFs into structured text.
- Tools: PyPDF, Regex
- Extracts skills, organizations, roles, and timelines.
- Model: spaCy (en_core_web_lg)
- Uses GenAI to identify quantifiable achievements.
- Example: "Reduced latency by 20ms" → Speed / Performance
- Model: gpt-4o-mini (via GitHub AI + Azure SDK)
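The shape of the achievement-tagging output can be illustrated without a model. The sketch below is a rule-based stand-in, not the project's GenAI step: it finds a numeric metric with a regular expression and maps the line to a category by keyword. Only the "Speed / Performance" category comes from the example above; the function name, the keyword lists, and the other category are hypothetical.

```python
import re

# Matches numbers followed by a unit such as %, ms, or x (e.g. "20ms", "35%").
METRIC = re.compile(r"\d+(?:\.\d+)?\s*(?:%|ms\b|x\b)", re.IGNORECASE)

# Hypothetical keyword rules; only "Speed / Performance" appears in the README.
CATEGORY_KEYWORDS = {
    "Speed / Performance": ("latency", "faster", "response time", "throughput"),
    "Cost / Efficiency": ("cost", "savings", "budget"),
}

def tag_achievement(line):
    """Return (metric, category) for a resume line, or None if no metric is found."""
    match = METRIC.search(line)
    if match is None:
        return None
    lowered = line.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return match.group(0), category
    return match.group(0), "Uncategorized"

print(tag_achievement("Reduced latency by 20ms"))
```

A language model handles phrasing that fixed rules miss, which is presumably why the pipeline uses one; the sketch only shows the input and output of the step.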
- Converts resumes into dense embeddings.
- Uses K-Means clustering to discover natural career archetypes.
- No manual labels required initially.
- Embeddings and metadata indexed using FAISS.
- Enables high-speed semantic retrieval during inference.
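To make the retrieval idea concrete, here is a minimal sketch of nearest-neighbor search over embeddings using brute-force cosine similarity in plain Python. It stands in for the FAISS index rather than reproducing it, and the three-dimensional vectors and archetype names are invented for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, index, k=2):
    """Return the names of the k vectors most similar to the query."""
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Toy index: hypothetical archetype names mapped to made-up 3-d embeddings.
toy_index = {
    "backend_engineer": [0.9, 0.1, 0.0],
    "data_analyst": [0.1, 0.9, 0.1],
    "ml_engineer": [0.5, 0.6, 0.6],
}

query_vec = [0.8, 0.2, 0.1]  # stands in for an embedded job description
nearest = top_k(query_vec, toy_index, k=2)
print(nearest)
```

A vector index such as FAISS answers the same question, but avoids scanning every stored vector in Python, which matters once the alumni corpus grows.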
This phase exposes the system as a REST API.
- Request Handling
  - Resume PDF + Job Description received via FastAPI.
- Orchestration
  - Workflow managed using LangGraph:
    - Extraction → Retrieval → Reasoning
- RAG (Retrieval-Augmented Generation)
  - Retrieves the most relevant alumni archetypes from FAISS.
- Synthesis
  - GenAI compares student profile against:
    - Successful alumni patterns
    - Required job skills
    - Quantified achievements
- Output
  - Markdown-formatted feedback
  - Skill gaps
  - Missing metrics
  - Resume improvement suggestions
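The three-stage flow (extraction, retrieval, reasoning) can be sketched as plain functions that pass a shared state dictionary along. This is a conceptual sketch only: it does not use the LangGraph API, and every node body is a stand-in (no PDF parsing, no FAISS, no model call). The node names, state keys, and the "Backend Engineer" archetype are assumptions made for the example.

```python
def extraction_node(state):
    # Stand-in for PDF parsing: treat comma-separated resume text as a skill list.
    skills = {s.strip().lower() for s in state["resume_text"].split(",") if s.strip()}
    return {**state, "resume_skills": skills}

def retrieval_node(state):
    # Stand-in for the FAISS lookup: a fixed, hypothetical archetype.
    archetype = {"name": "Backend Engineer", "skills": {"python", "sql", "docker"}}
    return {**state, "archetype": archetype}

def reasoning_node(state):
    # Stand-in for the GenAI call: a set difference rendered as Markdown.
    archetype = state["archetype"]
    missing = sorted(archetype["skills"] - state["resume_skills"])
    lines = [f"## Skill gaps vs. {archetype['name']}"]
    lines += [f"- {skill}" for skill in missing] or ["- none found"]
    return {**state, "report": "\n".join(lines)}

def run_pipeline(resume_text, job_description):
    """Run the nodes in order: Extraction, then Retrieval, then Reasoning."""
    state = {"resume_text": resume_text, "job_description": job_description}
    for node in (extraction_node, retrieval_node, reasoning_node):
        state = node(state)
    return state["report"]

report = run_pipeline("Python, Git, Linux", "Backend role")
print(report)
```

Each node returns a new state rather than mutating the old one, which keeps the stages easy to test in isolation.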
- Language: Python 3.9+
- API: FastAPI, Uvicorn
- Workflow: LangGraph, LangChain
- Generative AI: GitHub AI (openai/gpt-4o-mini)
- NLP: spaCy (NER)
- ML: Scikit-Learn (K-Means, TF-IDF)
- Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
- Vector DB: FAISS (CPU)
- Data: Pandas, NumPy
- PDF Parsing: PyPDF, Regex
Placement-Project/
├── processed_data/
│   ├── structured_resumes.csv
│   ├── resumes_with_metrics.csv
│   ├── clustered_resumes.csv
│   ├── embeddings.npy
│   └── archetype_insights.json
├── vector_store/
│   ├── srm_resumes.index
│   └── srm_resumes.pkl
├── data/
│   └── raw_resumes/
├── extract_resume_data.py
├── analyze_achievements.py
├── cluster_resumes.py
├── label_archetypes.py
├── build_vector_store.py
├── services.py
├── graph.py
├── models.py
├── main.py
├── requirements.txt
└── .env
- Python 3.9+
- GitHub Account (for Personal Access Token)
python -m venv .venv
# Windows
.\.venv\Scripts\Activate.ps1
# Mac/Linux
source .venv/bin/activate

pip install -r requirements.txt
python -m spacy download en_core_web_lg

Create a .env file in the project root:
GITHUB_TOKEN="your_github_pat_token_here"

python extract_resume_data.py

python analyze_achievements.py

Run 1: Generate elbow plot
python cluster_resumes.py
- Inspect elbow_plot.png
- Update OPTIMAL_K in the script
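
Reading the elbow by eye works, but the choice can also be cross-checked numerically. The sketch below picks the K where the inertia curve bends most sharply, using the largest second difference. This is a common heuristic and not necessarily what cluster_resumes.py does; the function name and the inertia values are made up for illustration.

```python
def pick_elbow(inertias):
    """Return the index where the inertia curve bends most sharply.

    Uses the largest second difference: a heuristic, not a guarantee.
    """
    if len(inertias) < 3:
        raise ValueError("need at least three inertia values")
    best_i, best_bend = 1, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_i, best_bend = i, bend
    return best_i

# Made-up inertia values for K = 2..7, for illustration only.
ks = [2, 3, 4, 5, 6, 7]
inertias = [950.0, 610.0, 400.0, 360.0, 335.0, 320.0]

optimal_k = ks[pick_elbow(inertias)]
print(optimal_k)
```

If the numeric pick and the plot disagree, the plot and domain judgment should probably win, since small resume corpora tend to produce noisy inertia curves.
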
Run 2: Final clustering
python cluster_resumes.py

Run 1: Inspect keywords
python label_archetypes.py
- Update ARCHETYPE_LABELS

Run 2: Save labeled insights
python label_archetypes.py

python build_vector_store.py

python -m uvicorn main:app --reload

Server runs at:
http://127.0.0.1:8000
POST /analyze
- job_description (Query Parameter): Full job description text
- resume_file (Body): Student resume PDF
- Streaming Markdown text
- Alumni archetype comparison
- Skill gap analysis
- Actionable resume recommendations
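
A client call to this endpoint might look like the sketch below. It assumes the server accepts the PDF as a multipart form upload under the field name `resume_file`, and it uses the third-party `requests` library, which is not listed in the project's stack. The helper names `build_analyze_url` and `analyze_resume` are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"

def build_analyze_url(job_description, base_url=BASE_URL):
    """Build the POST /analyze URL with job_description as a query parameter."""
    return f"{base_url}/analyze?{urlencode({'job_description': job_description})}"

def analyze_resume(pdf_path, job_description):
    """Upload a resume PDF and yield the streamed Markdown feedback in chunks."""
    import requests  # third-party; imported lazily so the URL helper has no dependencies

    with open(pdf_path, "rb") as fh:
        # Assumption: the API reads the upload from a form field named resume_file.
        files = {"resume_file": (pdf_path, fh, "application/pdf")}
        with requests.post(build_analyze_url(job_description), files=files, stream=True) as resp:
            resp.raise_for_status()
            resp.encoding = resp.encoding or "utf-8"
            for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
                yield chunk

print(build_analyze_url("Backend Engineer, Python"))
```

With the server running, iterating over `analyze_resume("resume.pdf", "Backend Engineer, Python")` and printing each chunk would display the report as it streams in.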
SRM Career Catalyst moves beyond resume screening and into career intelligence, enabling students to align their profiles with real-world placement success patterns.
For academic and research use. Extendable for institutional deployment.