LinkedIn Resume Tailoring Pipeline

An end-to-end, transparent pipeline that automatically tailors LaTeX resumes to LinkedIn job postings while preserving resume integrity and keeping every change open to human review.

Why This Project Exists

Job applications often require tailoring resumes to specific roles, but manual customization is time-consuming and inconsistent. This pipeline automates the process while ensuring:

  • Transparency: Every change is explained and reviewable
  • Integrity: No fabricated skills or inflated experience
  • Consistency: Systematic keyword optimization across applications
  • Control: Human oversight at every step

Features

  • 🔍 Automated LinkedIn Job Scraping - Extract job descriptions from LinkedIn URLs
  • 📊 TF-IDF Keyword Extraction - Identify high-signal keywords without LLM bias
  • ✏️ AI-Powered Resume Tailoring - Intelligently modify resume content using OpenAI
  • 📝 Change Tracking - Detailed explanations for every modification
  • 🔒 Safety First - Strict rules prevent skill fabrication or experience inflation
  • 📄 LaTeX Output - Professional, ATS-friendly PDF generation

Quick Start

  1. Setup

    git clone <this-repository>
    cd linkedin-resume-pipeline
    pip install -r requirements.txt
  2. Configure

    cp .env.example .env
    # Add your OpenAI API key to .env
  3. Add Your Resume

    # Place your LaTeX resume as resume/resume.tex
  4. Add Jobs

    # Edit data/jobs.csv
    job_id,company,role,job_url
    google-swe-001,Google,Software Engineer,https://linkedin.com/jobs/view/123456
    
  5. Run Pipeline

    python run_pipeline.py
  6. Review Results

    • Check outputs/ for tailored resumes
    • Review changes_explained.txt for all modifications

Installation

Prerequisites

  • Python 3.8+
  • Chrome browser
  • OpenAI API key
  • LaTeX resume file

Dependencies

pip install -r requirements.txt

Environment Variables

Create a .env file:

OPENAI_API_KEY=your_openai_api_key_here
CHROME_PROFILE_PATH=/Users/yourname/chrome-selenium-profile  # Optional
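
The scripts can pick these values up at startup; below is a minimal sketch using python-dotenv (an assumption on my part — check the actual code for how configuration is really loaded):

# Illustrative only: load settings from .env into the process environment.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the project root

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required
CHROME_PROFILE_PATH = os.getenv(               # optional, with a sensible default
    "CHROME_PROFILE_PATH",
    os.path.expanduser("~/chrome-selenium-profile"),
)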

🔐 LinkedIn Scraping & Chrome Profile (Important)

LinkedIn aggressively blocks automated scraping. To work reliably and ethically, this project uses Selenium with a persistent Chrome user profile.

Why a Chrome Profile Is Required

  • Allows manual login to LinkedIn once
  • Reuses your real browser session (cookies, auth, JS execution)
  • Avoids brittle username/password automation
  • Dramatically reduces bot detection

How It Works

  • Selenium launches Chrome using a custom user-data directory
  • You log in to LinkedIn normally
  • The session is reused for all future runs

One-Time Setup

  1. Create a Chrome profile directory

    mkdir -p ~/chrome-selenium-profile
  2. Set environment variable

    CHROME_PROFILE_PATH=/Users/yourname/chrome-selenium-profile
  3. Run scraper for first time

    python scripts/scrape_jobs.py
  4. Log into LinkedIn when Chrome opens

    • This is required only once
    • Do NOT close Chrome while scraping is running

After this, future runs will reuse the authenticated session automatically.
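
In practice, reusing the session comes down to pointing Chrome at the same user-data directory on every run. A minimal Selenium sketch of that idea (the option names and example URL are illustrative; scrape_jobs.py may add waits, retries, and error handling):

# Illustrative sketch: launch Chrome against the persistent profile directory.
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

profile_dir = os.getenv(
    "CHROME_PROFILE_PATH",
    os.path.expanduser("~/chrome-selenium-profile"),
)

options = Options()
options.add_argument(f"--user-data-dir={profile_dir}")  # reuse cookies and login session
driver = webdriver.Chrome(options=options)

driver.get("https://www.linkedin.com/jobs/view/4165741696/")
print(driver.title)  # sanity check that the page loaded while logged in
driver.quit()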

Supported Platforms

  • ✅ macOS (fully tested)
  • ⚠️ Windows (works, path format differs)
  • ⚠️ Linux (works, Chrome must be installed)

Security Notes

  • Credentials are never stored in this repo
  • Login happens in a real Chrome browser
  • No password automation is used
  • You can delete the profile directory at any time to reset

Usage

1. Prepare Your Resume

Place your LaTeX resume file at resume/resume.tex. The pipeline will only modify content inside \resumeItem{...} commands.
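
Conceptually, that constraint means locating \resumeItem{...} commands and touching only their arguments. A rough sketch of such a scan (assuming simple, non-nested braces; the real rewriter may parse the LaTeX more carefully):

import re

# Matches \resumeItem{...} with a flat (non-nested) brace argument.
RESUME_ITEM = re.compile(r"\\resumeItem\{([^{}]*)\}")

def list_resume_items(tex_source):
    """Return only the editable bullet texts; all other LaTeX stays untouched."""
    return RESUME_ITEM.findall(tex_source)

sample = r"\resumeItem{Built a web application using modern frameworks}"
print(list_resume_items(sample))  # ['Built a web application using modern frameworks']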

2. Job Input (data/jobs.csv)

Add job postings to scrape and tailor for:

job_id,company,role,job_url
4165741696,Rogers Communications,Solution Architect - Managed Services,https://www.linkedin.com/jobs/view/4165741696/
4343808522,DataStealth.io,Cloud Architect,https://www.linkedin.com/jobs/view/4343808522/

Fields:

  • job_id: Unique identifier for this application
  • company: Company name
  • role: Job title
  • job_url: LinkedIn job posting URL
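
Each row maps onto those four fields; a minimal sketch of reading the file with the standard library (the actual scripts may use pandas or add validation):

import csv

with open("data/jobs.csv", newline="", encoding="utf-8") as f:
    jobs = list(csv.DictReader(f))  # one dict per row, keyed by the header

for job in jobs:
    print(job["job_id"], job["company"], job["role"], job["job_url"])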

3. Run Individual Steps

Scrape Jobs Only:

python scripts/scrape_jobs.py

Extract Keywords Only:

python scripts/extract_keywords.py

Rewrite Resume Only:

python scripts/rewrite_resume.py

Run Complete Pipeline:

python run_pipeline.py
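
Conceptually, run_pipeline.py is just these three steps in order. A hypothetical orchestration sketch (the real script may share state between steps or pass arguments differently):

# Hypothetical sketch of the orchestrator; see run_pipeline.py for the real logic.
import subprocess
import sys

STEPS = [
    "scripts/scrape_jobs.py",       # 1. scrape job descriptions from LinkedIn
    "scripts/extract_keywords.py",  # 2. TF-IDF keyword extraction
    "scripts/rewrite_resume.py",    # 3. AI-assisted resume rewriting
]

for step in STEPS:
    print(f"== Running {step} ==")
    subprocess.run([sys.executable, step], check=True)  # stop on the first failure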

4. Review Output

Each job creates a folder: outputs/<Company> - <Role> - <JobID>/

Contents:

  • job.json - Scraped job data
  • keywords.json - Extracted keywords
  • resume_tailored.tex - Tailored LaTeX resume
  • changes_explained.txt - Detailed change log

Pipeline Flow

Step 1: Job Scraping

  • Uses Selenium with persistent Chrome profile
  • One-time LinkedIn login required
  • Extracts full job description text
  • Handles dynamic content loading
  • Saves raw HTML on failures for debugging

Step 2: Keyword Extraction

  • Applies TF-IDF analysis to job descriptions (see the sketch after this list)
  • Identifies multi-word technical terms
  • Filters out generic business language
  • No LLM bias in keyword selection
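
As a rough illustration of this step, TF-IDF scoring over a set of job descriptions might look like the sketch below (it assumes scikit-learn; extract_keywords.py may use a different implementation and filtering rules):

from sklearn.feature_extraction.text import TfidfVectorizer

job_descriptions = [
    "Senior Software Engineer with expertise in Python, React, and AWS ...",
    "Cloud Architect designing scalable microservices on AWS ...",
]

# Uni- to tri-grams so multi-word technical terms can surface.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vectorizer.fit_transform(job_descriptions)

# Top-scoring terms for the first job description.
terms = vectorizer.get_feature_names_out()
scores = tfidf[0].toarray().ravel()
print(sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)[:10])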

Step 3: Resume Tailoring

  • Uses OpenAI API for intelligent rewriting (sketched after this list)
  • Strict Safety Rules:
    • Only modifies \resumeItem{...} content
    • Preserves all metrics and achievements
    • No skill fabrication
    • No experience inflation
    • No structural changes
  • Generates detailed change explanations
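
A minimal sketch of what such a constrained rewrite call could look like (the prompt wording, model name, and helper function are illustrative, not the exact ones used by rewrite_resume.py):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Rewrite the resume bullet to emphasize the given keywords. "
    "Do not invent skills, change metrics, or alter the structure."
)

def rewrite_bullet(bullet, keywords):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Keywords: {', '.join(keywords)}\nBullet: {bullet}"},
        ],
    )
    return response.choices[0].message.content.strip()

print(rewrite_bullet(
    "Built a web application using modern frameworks",
    ["Python", "React", "scalable"],
))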

Step 4: Quality Assurance

  • Every change is logged and explained
  • Original and modified versions side-by-side
  • Reasoning provided for each modification
  • Human review strongly recommended

Project Structure

├── data/
│   └── jobs.csv              # Job input file
├── resume/
│   └── resume.tex            # Your LaTeX resume (you provide this)
├── scripts/
│   ├── scrape_jobs.py        # LinkedIn scraping
│   ├── extract_keywords.py   # TF-IDF keyword extraction
│   └── rewrite_resume.py     # AI resume tailoring
├── outputs/                  # Generated results
├── run_pipeline.py           # Main orchestrator
├── requirements.txt          # Python dependencies
└── .env.example              # Environment template

Why These Design Choices?

LaTeX for Resumes

  • Professional Output: Superior typography and formatting
  • ATS Compatibility: Consistent, parseable structure
  • Version Control: Text-based format for easy diffing
  • Customization: Programmatic modifications possible

TF-IDF Before LLMs

  • Objectivity: Mathematical keyword extraction without AI bias
  • Transparency: Explainable algorithm vs. black-box selection
  • Cost Efficiency: No API calls for keyword identification
  • Reproducibility: Deterministic results

Detailed Change Tracking

  • Accountability: Every modification is justified
  • Learning: Understand what makes resumes effective
  • Safety: Catch inappropriate changes before submission
  • Compliance: Maintain truthfulness in applications

Example Output

Input Job Description

"We're looking for a Senior Software Engineer with expertise in 
Python, React, and AWS to build scalable web applications..."

Extracted Keywords

{
  "technical_keywords": ["Python", "React", "AWS", "scalable web applications"],
  "skill_keywords": ["software engineering", "full-stack development"],
  "domain_keywords": ["cloud infrastructure", "microservices"]
}

Resume Changes

ORIGINAL: Built a web application using modern frameworks
UPDATED:  Built a scalable web application using React and Python
REASON:   Added "scalable" and specific technologies (React, Python) 
          mentioned in job requirements

Safety & Ethics

What This Tool Does

  • Optimizes existing experience for relevance
  • Adds appropriate technical keywords
  • Improves action verb strength
  • Maintains factual accuracy

What This Tool Never Does

  • Fabricates skills or experience
  • Inflates job titles or responsibilities
  • Creates false achievements
  • Modifies dates or company names

Ethical Guidelines

  • All changes must be defensible in interviews
  • No misrepresentation of qualifications
  • Transparency in automated modifications
  • Human review is mandatory

Limitations

  • LinkedIn Dependency: Requires active LinkedIn access
  • LaTeX Requirement: Resume must be in LaTeX format
  • Manual Review Needed: Automated changes require human verification
  • API Costs: OpenAI usage incurs charges
  • Rate Limits: LinkedIn may throttle scraping requests
  • Private Use: Currently designed for individual use

Getting Started Tips

  1. Test with one job first to understand the output format
  2. Review all changes carefully before using tailored resumes
  3. Keep your base resume updated as the source of truth
  4. Use meaningful job_ids for easy organization
  5. Check LinkedIn rate limits if scraping many jobs

Note: This is a private tool for personal resume optimization. All resume content should remain truthful and defensible in interviews.
