LinkedIn Resume Tailoring Pipeline

An end-to-end, transparent pipeline that automatically tailors LaTeX resumes to LinkedIn job postings while preserving resume integrity and keeping every change open to human review.

Why This Project Exists

Job applications often require tailoring resumes to specific roles, but manual customization is time-consuming and inconsistent. This pipeline automates the process while ensuring:

  • Transparency: Every change is explained and reviewable
  • Integrity: No fabricated skills or inflated experience
  • Consistency: Systematic keyword optimization across applications
  • Control: Human oversight at every step

Features

  • 🔍 Automated LinkedIn Job Scraping - Extract job descriptions from LinkedIn URLs
  • 📊 TF-IDF Keyword Extraction - Identify high-signal keywords without LLM bias
  • ✏️ AI-Powered Resume Tailoring - Intelligently modify resume content using OpenAI
  • 📝 Change Tracking - Detailed explanations for every modification
  • 🔒 Safety First - Strict rules prevent skill fabrication or experience inflation
  • 📄 LaTeX Output - Professional, ATS-friendly PDF generation

Quick Start

  1. Setup

    git clone <this-repository>
    cd linkedin-resume-pipeline
    pip install -r requirements.txt
  2. Configure

    cp .env.example .env
    # Add your OpenAI API key to .env
  3. Add Your Resume

    # Place your LaTeX resume as resume/resume.tex
  4. Add Jobs

    # Edit data/jobs.csv
    job_id,company,role,job_url
    google-swe-001,Google,Software Engineer,https://linkedin.com/jobs/view/123456
    
  5. Run Pipeline

    python run_pipeline.py
  6. Review Results

    • Check outputs/ for tailored resumes
    • Review changes_explained.txt for all modifications

Installation

Prerequisites

  • Python 3.8+
  • Chrome browser
  • OpenAI API key
  • LaTeX resume file

Dependencies

pip install -r requirements.txt

Environment Variables

Create a .env file:

OPENAI_API_KEY=your_openai_api_key_here
CHROME_PROFILE_PATH=/Users/yourname/chrome-selenium-profile  # Optional
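
The scripts can pick these values up at startup; below is a minimal sketch using python-dotenv (an assumption on my part — check the actual code for how configuration is really loaded):

# Illustrative only: load settings from .env into the process environment.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the project root

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required
CHROME_PROFILE_PATH = os.getenv(               # optional, with a sensible default
    "CHROME_PROFILE_PATH",
    os.path.expanduser("~/chrome-selenium-profile"),
)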

🔐 LinkedIn Scraping & Chrome Profile (Important)

LinkedIn aggressively blocks automated scraping. To work reliably and ethically, this project uses Selenium with a persistent Chrome user profile.

Why a Chrome Profile Is Required

  • Allows manual login to LinkedIn once
  • Reuses your real browser session (cookies, auth, JS execution)
  • Avoids brittle username/password automation
  • Dramatically reduces bot detection

How It Works

  • Selenium launches Chrome using a custom user-data directory
  • You log in to LinkedIn normally
  • The session is reused for all future runs

One-Time Setup

  1. Create a Chrome profile directory

    mkdir -p ~/chrome-selenium-profile
  2. Set environment variable

    CHROME_PROFILE_PATH=/Users/yourname/chrome-selenium-profile
  3. Run scraper for first time

    python scripts/scrape_jobs.py
  4. Log into LinkedIn when Chrome opens

    • This is required only once
    • Do NOT close Chrome while scraping is running

After this, future runs will reuse the authenticated session automatically.
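
In practice, reusing the session comes down to pointing Chrome at the same user-data directory on every run. A minimal Selenium sketch of that idea (the option names and example URL are illustrative; scrape_jobs.py may add waits, retries, and error handling):

# Illustrative sketch: launch Chrome against the persistent profile directory.
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

profile_dir = os.getenv(
    "CHROME_PROFILE_PATH",
    os.path.expanduser("~/chrome-selenium-profile"),
)

options = Options()
options.add_argument(f"--user-data-dir={profile_dir}")  # reuse cookies and login session
driver = webdriver.Chrome(options=options)

driver.get("https://www.linkedin.com/jobs/view/4165741696/")
print(driver.title)  # sanity check that the page loaded while logged in
driver.quit()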

Supported Platforms

  • ✅ macOS (fully tested)
  • ⚠️ Windows (works, path format differs)
  • ⚠️ Linux (works, Chrome must be installed)

Security Notes

  • Credentials are never stored in this repo
  • Login happens in a real Chrome browser
  • No password automation is used
  • You can delete the profile directory at any time to reset

Usage

1. Prepare Your Resume

Place your LaTeX resume file at resume/resume.tex. The pipeline will only modify content inside \resumeItem{...} commands.
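
Conceptually, that constraint means locating \resumeItem{...} commands and touching only their arguments. A rough sketch of such a scan (assuming simple, non-nested braces; the real rewriter may parse the LaTeX more carefully):

import re

# Matches \resumeItem{...} with a flat (non-nested) brace argument.
RESUME_ITEM = re.compile(r"\\resumeItem\{([^{}]*)\}")

def list_resume_items(tex_source):
    """Return only the editable bullet texts; all other LaTeX stays untouched."""
    return RESUME_ITEM.findall(tex_source)

sample = r"\resumeItem{Built a web application using modern frameworks}"
print(list_resume_items(sample))  # ['Built a web application using modern frameworks']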

2. Job Input (data/jobs.csv)

Add job postings to scrape and tailor for:

job_id,company,role,job_url
4165741696,Rogers Communications,Solution Architect - Managed Services,https://www.linkedin.com/jobs/view/4165741696/
4343808522,DataStealth.io,Cloud Architect,https://www.linkedin.com/jobs/view/4343808522/

Fields:

  • job_id: Unique identifier for this application
  • company: Company name
  • role: Job title
  • job_url: LinkedIn job posting URL
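
Each row maps onto those four fields; a minimal sketch of reading the file with the standard library (the actual scripts may use pandas or add validation):

import csv

with open("data/jobs.csv", newline="", encoding="utf-8") as f:
    jobs = list(csv.DictReader(f))  # one dict per row, keyed by the header

for job in jobs:
    print(job["job_id"], job["company"], job["role"], job["job_url"])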

3. Run Individual Steps

Scrape Jobs Only:

python scripts/scrape_jobs.py

Extract Keywords Only:

python scripts/extract_keywords.py

Rewrite Resume Only:

python scripts/rewrite_resume.py

Run Complete Pipeline:

python run_pipeline.py
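
Conceptually, run_pipeline.py is just these three steps in order. A hypothetical orchestration sketch (the real script may share state between steps or pass arguments differently):

# Hypothetical sketch of the orchestrator; see run_pipeline.py for the real logic.
import subprocess
import sys

STEPS = [
    "scripts/scrape_jobs.py",       # 1. scrape job descriptions from LinkedIn
    "scripts/extract_keywords.py",  # 2. TF-IDF keyword extraction
    "scripts/rewrite_resume.py",    # 3. AI-assisted resume rewriting
]

for step in STEPS:
    print(f"== Running {step} ==")
    subprocess.run([sys.executable, step], check=True)  # stop on the first failure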

4. Review Output

Each job creates a folder: outputs/<Company> - <Role> - <JobID>/

Contents:

  • job.json - Scraped job data
  • keywords.json - Extracted keywords
  • resume_tailored.tex - Tailored LaTeX resume
  • changes_explained.txt - Detailed change log

Pipeline Flow

Step 1: Job Scraping

  • Uses Selenium with persistent Chrome profile
  • One-time LinkedIn login required
  • Extracts full job description text
  • Handles dynamic content loading
  • Saves raw HTML on failures for debugging

Step 2: Keyword Extraction

  • Applies TF-IDF analysis to job descriptions (see the sketch after this list)
  • Identifies multi-word technical terms
  • Filters out generic business language
  • No LLM bias in keyword selection
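
As a rough illustration of this step, TF-IDF scoring over a set of job descriptions might look like the sketch below (it assumes scikit-learn; extract_keywords.py may use a different implementation and filtering rules):

from sklearn.feature_extraction.text import TfidfVectorizer

job_descriptions = [
    "Senior Software Engineer with expertise in Python, React, and AWS ...",
    "Cloud Architect designing scalable microservices on AWS ...",
]

# Uni- to tri-grams so multi-word technical terms can surface.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vectorizer.fit_transform(job_descriptions)

# Top-scoring terms for the first job description.
terms = vectorizer.get_feature_names_out()
scores = tfidf[0].toarray().ravel()
print(sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)[:10])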

Step 3: Resume Tailoring

  • Uses OpenAI API for intelligent rewriting (sketched after this list)
  • Strict Safety Rules:
    • Only modifies \resumeItem{...} content
    • Preserves all metrics and achievements
    • No skill fabrication
    • No experience inflation
    • No structural changes
  • Generates detailed change explanations
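
A minimal sketch of what such a constrained rewrite call could look like (the prompt wording, model name, and helper function are illustrative, not the exact ones used by rewrite_resume.py):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Rewrite the resume bullet to emphasize the given keywords. "
    "Do not invent skills, change metrics, or alter the structure."
)

def rewrite_bullet(bullet, keywords):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Keywords: {', '.join(keywords)}\nBullet: {bullet}"},
        ],
    )
    return response.choices[0].message.content.strip()

print(rewrite_bullet(
    "Built a web application using modern frameworks",
    ["Python", "React", "scalable"],
))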

Step 4: Quality Assurance

  • Every change is logged and explained
  • Original and modified versions side-by-side
  • Reasoning provided for each modification
  • Human review strongly recommended

Project Structure

├── data/
│   └── jobs.csv              # Job input file
├── resume/
│   └── resume.tex            # Your LaTeX resume (you provide this)
├── scripts/
│   ├── scrape_jobs.py        # LinkedIn scraping
│   ├── extract_keywords.py   # TF-IDF keyword extraction
│   └── rewrite_resume.py     # AI resume tailoring
├── outputs/                  # Generated results
├── run_pipeline.py           # Main orchestrator
├── requirements.txt          # Python dependencies
└── .env.example              # Environment template

Why These Design Choices?

LaTeX for Resumes

  • Professional Output: Superior typography and formatting
  • ATS Compatibility: Consistent, parseable structure
  • Version Control: Text-based format for easy diffing
  • Customization: Programmatic modifications possible

TF-IDF Before LLMs

  • Objectivity: Mathematical keyword extraction without AI bias
  • Transparency: Explainable algorithm vs. black-box selection
  • Cost Efficiency: No API calls for keyword identification
  • Reproducibility: Deterministic results

Detailed Change Tracking

  • Accountability: Every modification is justified
  • Learning: Understand what makes resumes effective
  • Safety: Catch inappropriate changes before submission
  • Compliance: Maintain truthfulness in applications

Example Output

Input Job Description

"We're looking for a Senior Software Engineer with expertise in 
Python, React, and AWS to build scalable web applications..."

Extracted Keywords

{
  "technical_keywords": ["Python", "React", "AWS", "scalable web applications"],
  "skill_keywords": ["software engineering", "full-stack development"],
  "domain_keywords": ["cloud infrastructure", "microservices"]
}

Resume Changes

ORIGINAL: Built a web application using modern frameworks
UPDATED:  Built a scalable web application using React and Python
REASON:   Added "scalable" and specific technologies (React, Python) 
          mentioned in job requirements

Safety & Ethics

What This Tool Does

  • Optimizes existing experience for relevance
  • Adds appropriate technical keywords
  • Improves action verb strength
  • Maintains factual accuracy

What This Tool Never Does

  • Fabricates skills or experience
  • Inflates job titles or responsibilities
  • Creates false achievements
  • Modifies dates or company names

Ethical Guidelines

  • All changes must be defensible in interviews
  • No misrepresentation of qualifications
  • Transparency in automated modifications
  • Human review is mandatory

Limitations

  • LinkedIn Dependency: Requires active LinkedIn access
  • LaTeX Requirement: Resume must be in LaTeX format
  • Manual Review Needed: Automated changes require human verification
  • API Costs: OpenAI usage incurs charges
  • Rate Limits: LinkedIn may throttle scraping requests
  • Private Use: Currently designed for individual use

Getting Started Tips

  1. Test with one job first to understand the output format
  2. Review all changes carefully before using tailored resumes
  3. Keep your base resume updated as the source of truth
  4. Use meaningful job_ids for easy organization
  5. Check LinkedIn rate limits if scraping many jobs

Note: This is a private tool for personal resume optimization. All resume content should remain truthful and defensible in interviews.
