Naawshin/Multilabel-Skill-Classifier

Multilabel Skill Classifier & Job-Resume Skill Matcher

Python Transformers ONNX Flask Hugging Face Spaces

Multilabel skill classifier from tech job descriptions + Resume-Job matching system

This project features a multilabel text classification model to extract hard and soft skills from tech job descriptions. The model was trained using data scraped from Indeed. The data was collected in two steps:

  1. Job URL collection: Due to scraping limitations, I could only collect 26k job links, all taken from the first results page for various job locations. The URL scraper is in scraper\link_scraper.py.
  2. Job description scraping: Using those URLs, the job descriptions were scraped with scraper\job_description_scraper.py. All the scraped data is in the data folder.

Key Features

  • Detects 83 different skills (hard + soft skills)
    • Hard skills: Python, SQL, JavaScript, React, Node.js, AWS, Docker, Git, Java, C++, etc.
    • Soft skills: Teamwork, Collaboration, Communication, Critical Thinking, Problem Solving, Leadership, Adaptability, etc.
  • Transformer-based multilabel classification
  • Best-performing model: ModernBERT (accuracy 0.9996, F1-micro 0.9990)
  • Fast inference using ONNX runtime
  • Live interactive demo on Hugging Face Spaces
  • Full-featured Flask web application including:
    • Job Description → Skills Extraction
    • Resume ↔ Job Description → Skill Matching (shows matching & missing skills + confidence scores)

Automated Labeling Pipeline

Manually labeling all 22k job descriptions was impractical, so I used a rule-based multilabel labeling system built on regular expressions.

Here's how the labeling works:

  1. Comprehensive Skills Dictionary
    A total of 83 target skills (hard + soft) were defined covering the most frequently mentioned competencies in tech job postings.

  2. Regex-Based Pattern Matching
    Each skill is associated with a carefully crafted regular expression that captures:

    • Common abbreviations used to define the skills
    • Different spellings & variations
    • Full names and short forms
    • Case-insensitive matching
  3. One-Hot Encoding
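    For every job description, each skill's patterns were checked against the text. A skill is labeled 1 if any of its patterns match and 0 otherwise, giving one 83-dimensional binary vector per description (strictly a multi-hot encoding, since several skills can be 1 at once).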
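
The three labeling steps above can be sketched as follows. This is a minimal illustration, not the project's actual labeling code: the five skills and their patterns in `SKILL_PATTERNS` are hypothetical stand-ins for the real 83-skill dictionary, and `label_description` is an assumed helper name.

```python
import re

# Hypothetical patterns for illustration only; the real dictionary covers 83 skills.
SKILL_PATTERNS = {
    "Python": r"\bpython\b",
    "JavaScript": r"\b(?:javascript|js)\b",
    # \b does not work around '+', so use explicit lookarounds instead.
    "C++": r"(?<!\w)c\+\+(?!\w)",
    "AWS": r"\b(?:aws|amazon web services)\b",
    "Communication": r"\bcommunicat(?:ion|e|ing)\b",
}

# Compile once, case-insensitively, and fix the label order.
COMPILED = {skill: re.compile(p, re.IGNORECASE) for skill, p in SKILL_PATTERNS.items()}
SKILLS = list(COMPILED)


def label_description(text):
    """Return a binary vector with one entry per skill (1 = pattern found)."""
    return [1 if COMPILED[skill].search(text) else 0 for skill in SKILLS]


example = "We need a Python developer with AWS experience and strong communication skills."
labels = label_description(example)
print(dict(zip(SKILLS, labels)))
```

Keeping the label order fixed in `SKILLS` matters because the classifier's output positions have to line up with the same skill order at training and inference time.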

Model Training and ONNX Inference

I initially started with distilroberta-base, then explored three more transformer-based models. ModernBERT performed best among them, with an accuracy of 0.9996. Finally, I converted the trained model to ONNX.
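
A multilabel classifier usually scores each label independently, so the raw outputs (logits) go through a sigmoid per skill rather than a softmax across skills. The sketch below shows that decoding step in plain Python. The skill names, the logit values, the 0.5 threshold, and the `decode_skills` helper are all assumptions for illustration; the project's actual post-processing may differ.

```python
import math


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_skills(logits, skill_names, threshold=0.5):
    """Return {skill: probability} for every skill at or above the threshold."""
    probs = [sigmoid(z) for z in logits]
    return {name: round(p, 3) for name, p in zip(skill_names, probs) if p >= threshold}


skill_names = ["Python", "SQL", "Docker", "Teamwork"]  # hypothetical subset of the 83 skills
logits = [3.2, -1.5, 0.4, -4.0]                        # made-up model outputs for one description
predicted = decode_skills(logits, skill_names)
print(predicted)
```

The same decoding applies whether the logits come from the original trained model or from the ONNX export, since the conversion only changes how the forward pass is executed.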

Model Performance Comparison

| Model | Accuracy | F1-Samples | F1-Macro | F1-Micro |
|---|---|---|---|---|
| distilroberta-base | 0.9700 | 0.9446 | 0.9243 | 0.9498 |
| modernbert (best) | 0.9996 | 0.9928 | 0.9969 | 0.9990 |
| all-MiniLM-L6-v2 | 0.9700 | 0.9028 | 0.8782 | 0.9081 |
| bert-base-uncased | 0.9800 | 0.9304 | 0.9110 | 0.9348 |
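
The three F1 columns differ only in how the per-label and per-sample counts are averaged. The sketch below computes all three on a tiny made-up example, assuming the column names follow the usual `samples`, `macro`, and `micro` averaging conventions (as in scikit-learn's `f1_score`). It is not the project's evaluation code, and the helper names are hypothetical.

```python
def _counts(true_vec, pred_vec):
    """True positives, false positives, false negatives for two binary vectors."""
    tp = sum(1 for t, p in zip(true_vec, pred_vec) if t and p)
    fp = sum(1 for t, p in zip(true_vec, pred_vec) if p and not t)
    fn = sum(1 for t, p in zip(true_vec, pred_vec) if t and not p)
    return tp, fp, fn


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumption: score 0 when undefined


def multilabel_f1(y_true, y_pred):
    # Per-label counts: compare column j of y_true with column j of y_pred.
    label_counts = [_counts(t, p) for t, p in zip(zip(*y_true), zip(*y_pred))]
    # Macro: average the per-label F1 scores, so rare skills weigh as much as common ones.
    macro = sum(_f1(*c) for c in label_counts) / len(label_counts)
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = _f1(*(sum(c[k] for c in label_counts) for k in range(3)))
    # Samples: compute F1 per description (row), then average over descriptions.
    samples = sum(_f1(*_counts(t, p)) for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"samples": samples, "macro": macro, "micro": micro}


# Made-up labels: 3 descriptions, 3 skills.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
scores = multilabel_f1(y_true, y_pred)
print(scores)
```

In this toy data the third skill is always predicted wrongly, which pulls the macro score down much more than the micro score. The same effect explains why F1-Macro is the lowest F1 column for three of the four models in the table.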

Model Deployment

The model was deployed as a Gradio app on Hugging Face Spaces. The implementation is available in the deployment folder or in the Gradio app itself.

Fig: HuggingFace Spaces Gradio App Demo

Web Deployment

The Flask-based web app lets users extract skills from job descriptions and match their resume against the required skills. Try the Skill Extractor and Resume-Matcher on Render.

Fig: Skill Analyzer and Resume-Matcher App Homepage

Fig: Skill Analyzer Demo
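
Once skills have been extracted from both texts, the resume-to-job comparison reduces to set operations over the two skill sets. The sketch below shows one way this could work. It assumes each extraction yields a `{skill: confidence}` dictionary; the `match_skills` helper, the example scores, and the coverage ratio are hypothetical and not taken from the Flask app's code.

```python
def match_skills(job_skills, resume_skills):
    """Compare two {skill: confidence} dicts.

    Returns (matching, missing, coverage):
      matching - skills required by the job and found in the resume,
                 mapped to (job confidence, resume confidence)
      missing  - skills required by the job but absent from the resume
      coverage - fraction of the job's skills that the resume covers
    """
    matching = {
        skill: (conf, resume_skills[skill])
        for skill, conf in job_skills.items()
        if skill in resume_skills
    }
    missing = {skill: conf for skill, conf in job_skills.items() if skill not in resume_skills}
    coverage = len(matching) / len(job_skills) if job_skills else 0.0
    return matching, missing, coverage


# Made-up extraction results for illustration.
job = {"Python": 0.98, "Docker": 0.91, "Communication": 0.87}
resume = {"Python": 0.99, "SQL": 0.95, "Communication": 0.80}

matching, missing, coverage = match_skills(job, resume)
print("Matching:", matching)
print("Missing:", missing)
print("Coverage:", round(coverage, 2))
```

Skills that appear only in the resume (here `SQL`) are ignored, since the comparison is driven by what the job description asks for.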

Run Locally

# Clone the repo and enter it
git clone https://github.com/Naawshin/Multilabel-Skill-Classifier.git
cd Multilabel-Skill-Classifier

# Switch to the flask branch
git switch flask

# Create and activate a virtual environment (Windows)
python -m venv venv
venv\Scripts\activate
# On macOS/Linux, activate with: source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start the app
python app.py

Feedback, feature requests, bug reports, and pull requests are very welcome!

Feel free to reach out:


If this project helped you or you just found it interesting, please consider giving it a star ⭐. It really helps the project grow!
