Multilabel Skill Classifier & Job-Resume Skill Matcher

Multilabel skill classifier from tech job descriptions + Resume-Job matching system

This project features a multilabel text classification model to extract hard and soft skills from tech job descriptions. The model was trained using data scraped from Indeed. The data was collected in two steps:

Job URL collection: Due to scraping limitations, I could only collect 26k job links, taken from the first results page for various job locations. The URL scraper can be found in scraper\link_scraper.py.
Job Description Scraping: Using the job URLs, the job descriptions were scraped with scraper\job_description_scraper.py. All the scraped data can be found in the data folder.

Key Features

Detects 83 different skills (hard + soft skills)
- Hard skills: Python, SQL, JavaScript, React, Node.js, AWS, Docker, Git, Java, C++, etc.
- Soft skills: Teamwork, Collaboration, Communication, Critical Thinking, Problem Solving, Leadership, Adaptability, etc.
State-of-the-art transformer-based multilabel classification
Best performing model: ModernBERT (accuracy 0.9996, F1-Micro 0.9990 in the comparison table)
Fast inference using ONNX runtime
Live interactive demo on Hugging Face Spaces
Full-featured Flask web application including:
- Job Description → Skills Extraction
- Resume ↔ Job Description → Skill Matching (shows matching & missing skills + confidence scores)

Automated Labeling Pipeline

Manually labeling all 22k job descriptions was impractical, so I labeled them automatically with a rule-based, regex-driven multilabel system.

Here's how the labeling works:

Comprehensive Skills Dictionary
A total of 83 target skills (hard + soft) were defined covering the most frequently mentioned competencies in tech job postings.
Regex-Based Pattern Matching
Each skill is associated with a carefully crafted regular expression that captures:
- Common abbreviations used to define the skills
- Different spellings & variations
- Full names and short forms
- Case-insensitive matching
One-Hot Encoding
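Each job description is checked against every skill's pattern: a skill is marked 1 if its pattern matches the text and 0 otherwise, giving one binary label vector per description (strictly a multi-hot vector, since several skills can appear in the same posting).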
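
The labeling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual labeling code: the four skills, their patterns, and the names SKILL_PATTERNS and label_text are hypothetical stand-ins for the real 83-entry dictionary.

```python
import re

# Hypothetical excerpt of a skills dictionary; the real one has 83 entries
# and its actual regular expressions are not reproduced here.
SKILL_PATTERNS = {
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
    # Full name or the short form "js", but not the "js" inside "Node.js".
    "JavaScript": re.compile(r"\bjavascript\b|(?<![\w.])js\b", re.IGNORECASE),
    # "+" is not a word character, so \b cannot be used after "c++".
    "C++": re.compile(r"(?<!\w)(?:c\+\+|cpp)(?!\w)", re.IGNORECASE),
    "Communication": re.compile(r"\bcommunicat(?:ion|e|ing)\b", re.IGNORECASE),
}

SKILLS = list(SKILL_PATTERNS)  # fixed label order for the output vector


def label_text(text):
    """Return a binary vector with one entry per skill (1 = pattern found)."""
    return [1 if SKILL_PATTERNS[skill].search(text) else 0 for skill in SKILLS]


if __name__ == "__main__":
    sample = "Strong Python and JS experience; C++ a plus."
    print(dict(zip(SKILLS, label_text(sample))))
```

Keeping the skill order fixed matters because the classifier is trained on these vectors, so position i must always mean the same skill.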

Model Training and ONNX Inference

I initially started with distilroberta-base, then explored three more transformer-based models and found ModernBERT to be the best among them, with an accuracy above 0.99. Finally, I converted the trained model to ONNX.
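
This README does not spell out how the model's raw outputs become skill predictions. A common approach for multilabel classifiers, sketched below as an assumption rather than the project's confirmed method, is to apply a sigmoid to each logit independently and keep every label whose score passes a threshold. The function names and the 0.5 threshold are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_multilabel(logits, labels, threshold=0.5):
    """Map per-label logits to {label: confidence} for labels at or above the threshold.

    The threshold value is an assumption; it is typically tuned on validation data.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    scores = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return {label: p for label, p in scores.items() if p >= threshold}


if __name__ == "__main__":
    # Made-up logits for three skills, standing in for one row of model output.
    print(decode_multilabel([2.0, -1.0, 0.0], ["Python", "SQL", "Docker"]))
```

Unlike softmax, the per-label sigmoid lets any number of skills be predicted for the same text, which is what a multilabel task needs.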

Model Performance Comparison

Model	Accuracy	F1-Samples	F1-Macro	F1-Micro
distilroberta-base	0.9700	0.9446	0.9243	0.9498
modernbert (best)	0.9996	0.9928	0.9969	0.9990
all-MiniLM-L6-v2	0.9700	0.9028	0.8782	0.9081
bert-base-uncased	0.9800	0.9304	0.9110	0.9348
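
To make the three F1 columns easier to read, here is a small pure-Python sketch of the usual definitions of the micro, macro, and samples averages on binary label matrices. It is not the evaluation code used for the table, and the exact-match "subset accuracy" it also computes is only one possible meaning of the Accuracy column, which this README does not define.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def counts(true_bits, pred_bits):
    """Count true positives, false positives, and false negatives over paired 0/1 values."""
    pairs = list(zip(true_bits, pred_bits))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


def multilabel_scores(y_true, y_pred):
    """y_true and y_pred are lists of equal-length 0/1 rows (one row per sample)."""
    n_samples, n_labels = len(y_true), len(y_true[0])
    # One (tp, fp, fn) triple per label column and one per sample row.
    per_label = [
        counts([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    per_sample = [counts(t, p) for t, p in zip(y_true, y_pred)]
    total_tp = sum(c[0] for c in per_label)
    total_fp = sum(c[1] for c in per_label)
    total_fn = sum(c[2] for c in per_label)
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return {
        "f1_micro": f1_from_counts(total_tp, total_fp, total_fn),  # pool all decisions
        "f1_macro": sum(f1_from_counts(*c) for c in per_label) / n_labels,  # mean over labels
        "f1_samples": sum(f1_from_counts(*c) for c in per_sample) / n_samples,  # mean over rows
        "subset_accuracy": exact / n_samples,  # every label in the row must match
    }


if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
    print(multilabel_scores(y_true, y_pred))
```

Macro F1 weights rare and common skills equally, so it tends to drop first when a model struggles on infrequent labels, while micro F1 is dominated by the frequent ones.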

Model Deployment

The model is deployed as a Gradio app on Hugging Face Spaces. The implementation is in the deployment folder and in the Gradio app itself.

Fig: HuggingFace Spaces Gradio App Demo

Web Deployment

The Flask-based web app lets users extract skills from job descriptions and match their resume against the required skills. Try the Skill Extractor and Resume-Matcher on Render.
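
How the app compares a resume with a job description is not detailed here. One plausible way to do it, sketched below under that assumption, is to run the classifier on both texts and compare the detected skill sets. The function name match_skills and the {skill: confidence} input shape are hypothetical.

```python
def match_skills(job_skills, resume_skills):
    """Split the job's skills into those the resume also shows and those it lacks.

    Both arguments are {skill: confidence} dicts, e.g. the classifier's output
    for the job description and for the resume (an assumed input format).
    """
    matching = {
        skill: {"job": job_skills[skill], "resume": resume_skills[skill]}
        for skill in sorted(job_skills)
        if skill in resume_skills
    }
    missing = {
        skill: job_skills[skill]
        for skill in sorted(job_skills)
        if skill not in resume_skills
    }
    return matching, missing


if __name__ == "__main__":
    # Made-up confidence scores for illustration only.
    job = {"Python": 0.98, "Docker": 0.91, "Communication": 0.75}
    resume = {"Python": 0.95, "SQL": 0.88, "Communication": 0.60}
    matching, missing = match_skills(job, resume)
    print("Matching:", matching)
    print("Missing:", missing)
```

Skills that appear only in the resume (SQL in this example) are ignored, since the comparison is driven by what the job requires.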

Fig: Skill Analyzer and Resume-Matcher App Homepage

Fig: Skill Analyzer Demo

Run Locally

# Clone the repo
git clone https://github.com/Naawshin/Multilabel-Skill-Classifier.git

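# Enter the repo and switch to the flask branch
cd Multilabel-Skill-Classifier
git switch flask

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS/Linux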

# Install dependencies
pip install -r requirements.txt

# Start the app
python app.py

Feedback, feature requests, bug reports, and pull requests are very welcome!

Feel free to reach out:

✉️ Email: nawshintabassum88@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/nowshin-tabasum/

If this project helped you or you just found it interesting, please consider giving it a star ⭐ It really helps the project grow!
