A comprehensive Python/Flask application for searching remote job positions across 9 Applicant Tracking System (ATS) platforms using the Google Serper API. Search by technology and seniority level to find relevant positions across multiple job boards.
- 11 Technology Presets: PHP, JavaScript, Python, Java, C#, Ruby, Go, Rust, DevOps, Data, Mobile
- 7 Seniority Levels: Any, Trainee, Junior, Mid-Level, Senior, Staff, Manager
- 9 ATS Platform Support: Greenhouse, Lever, Ashby, Workable, BreezyHR, JazzHR, SmartRecruiters, ICIMS, PinpointHQ
- Dual Interface: Command-line tool and web dashboard with real-time progress tracking
- Dynamic Query Builder: Intelligent search queries with platform-specific optimizations
- CSV Export: Results automatically saved to date-stamped CSV files
- REST API: Full-featured API for programmatic access
- Server-Sent Events (SSE): Real-time scraping progress updates via streaming
- Comprehensive Testing: 143+ unit tests with pytest
```
.
├── scraper.py          # Core scraping logic with query builder
├── server.py           # Flask backend with REST API
├── main.py             # CLI entry point with argparse
├── templates/
│   └── index.html      # Frontend dashboard (single-page HTML)
├── test_scraper.py     # Unit tests for scraper module
├── test_server.py      # Unit tests for Flask API
├── test_queries.py     # Unit tests for query building
├── pyproject.toml      # Project metadata and dependencies
├── .env.example        # Environment variable template
└── README.md           # This file
```
- Python 3.12 or higher
- Poetry (for dependency management)
- Serper API key (get free at https://serper.dev)
- Clone or download the project:
```bash
git clone <repository-url>
cd job-scraper
```
- Install dependencies using Poetry:
```bash
poetry install
```
- Create a `.env` file based on the template:
```bash
cp .env.example .env
```
- Add your Serper API key to the `.env` file:
```
SERPAPI_API_KEY_1=your_serper_api_key_here
```
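If you need the key in your own scripts, python-decouple (one of the project's dependencies) can read it; a minimal sketch, assuming the variable name from `.env.example`:

```python
from decouple import config

# Read the Serper key defined in .env; the variable name comes from .env.example.
SERPER_API_KEY = config("SERPAPI_API_KEY_1")
```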
Search for jobs using the command line:
```bash
# Search for Python jobs at any level
poetry run python main.py --technology python --level any

# Search for senior-level JavaScript positions
poetry run python main.py -t javascript -l senior

# Search for junior DevOps engineers
poetry run python main.py --technology devops --level junior

# View available options
poetry run python main.py --help
```
Available technologies: `php`, `javascript`, `python`, `java`, `csharp`, `ruby`, `go`, `rust`, `devops`, `data`, `mobile`
Available levels: `any`, `trainee`, `junior`, `mid`, `senior`, `staff`, `manager`
Start the Flask development server:
```bash
poetry run python server.py
```
The dashboard will be available at http://127.0.0.1:8000.
The web interface provides:
- Interactive job search and filtering
- Real-time scraping progress with live updates
- Job count by ATS platform
- Technology and seniority level selection
- CSV export functionality
GET /
Returns the frontend dashboard HTML page.
GET /api/jobs?origin=Greenhouse&search=Python
Retrieve jobs from the latest CSV file with optional filters.
Query Parameters:
- `origin` (optional): Filter by ATS platform name (e.g., "Greenhouse", "Lever")
- `search` (optional): Filter by job title substring (case-insensitive)
Response:
```json
{
  "jobs": [
    {
      "title": "Senior Python Developer",
      "snippet": "...",
      "link": "https://...",
      "origin": "Greenhouse"
    }
  ],
  "total": 150,
  "file": "jobs_python_senior_2026-02-07.csv"
}
```
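A minimal sketch of consuming this endpoint with `requests`, assuming the server is running at the default address (http://127.0.0.1:8000):

```python
import requests

# Fetch Greenhouse jobs whose title contains "Python".
resp = requests.get(
    "http://127.0.0.1:8000/api/jobs",
    params={"origin": "Greenhouse", "search": "Python"},
)
resp.raise_for_status()
data = resp.json()

print(data["total"], "jobs from", data["file"])
for job in data["jobs"]:
    print(job["title"], "->", job["link"])
```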
GET /api/origins
Get job counts grouped by ATS platform.
Response:
```json
{
  "origins": [
    {"name": "Greenhouse", "count": 45},
    {"name": "Lever", "count": 38},
    {"name": "Ashby", "count": 32}
  ]
}
```
GET /api/technologies
Get all available technology presets with their keywords.
Response:
```json
{
  "technologies": [
    {
      "key": "python",
      "label": "Python",
      "keywords": ["\"Python\"", "\"Django\"", "\"FastAPI\"", "\"Flask\""]
    }
  ]
}
```
GET /api/levels
Get all available seniority levels with their keywords.
Response:
```json
{
  "levels": [
    {
      "key": "senior",
      "label": "Senior",
      "keywords": ["\"Senior\"", "\"Sr\"", "\"Lead\""]
    }
  ]
}
```
POST /api/fetch-jobs
Content-Type: application/json
```json
{
  "technology": "python",
  "level": "senior"
}
```
Begin a full scrape across all 9 ATS platforms for the specified technology and level.
Response:
```json
{
  "status": "started",
  "stream_url": "/api/fetch-jobs/stream"
}
```
GET /api/fetch-jobs/status
Get current scrape status without events.
Response:
```json
{
  "running": true,
  "completed": false,
  "events_count": 5
}
```
GET /api/fetch-jobs/stream
Server-Sent Events (SSE) stream for real-time scraping progress. Connect to this endpoint after starting a scrape with POST /api/fetch-jobs.
Event Types:
- `ats_start`: ATS platform scrape started
- `page_fetched`: Page of results retrieved
- `ats_complete`: ATS platform scrape completed
- `complete`: Full scrape completed with results
- `error`: Error occurred during scrape
Example Event:
```json
{
  "event": "ats_complete",
  "origin": "Greenhouse",
  "count": 45,
  "index": 0,
  "total": 9
}
```
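A minimal client sketch that starts a scrape and follows the stream, assuming events arrive as standard SSE `data:` lines carrying the JSON payloads shown above:

```python
import json
import requests

BASE = "http://127.0.0.1:8000"

# Start a scrape for senior Python roles.
requests.post(f"{BASE}/api/fetch-jobs",
              json={"technology": "python", "level": "senior"}).raise_for_status()

# Follow the stream until the final "complete" (or "error") event.
with requests.get(f"{BASE}/api/fetch-jobs/stream", stream=True) as stream:
    for line in stream.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue  # skip blank separator lines between events
        event = json.loads(line[len("data:"):].strip())
        print(event)
        if event.get("event") in ("complete", "error"):
            break
```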
POST /api/fetch-jobs/reset
Reset the scraping state to allow a new scrape to begin.
Response:
```json
{
  "status": "reset"
}
```
POST /api/scrape
Content-Type: application/json
```json
{
  "url": "https://job-boards.greenhouse.io/company/jobs/123"
}
```
Scrape and extract content from a specific job URL using the Serper Scrape API.
Response:
```json
{
  "markdown": "# Job Title\n...",
  "text": "Job Title\n..."
}
```
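A minimal sketch of calling this endpoint with `requests` (the URL is illustrative):

```python
import requests

# Scrape one job posting through the proxy endpoint.
resp = requests.post(
    "http://127.0.0.1:8000/api/scrape",
    json={"url": "https://job-boards.greenhouse.io/company/jobs/123"},
)
resp.raise_for_status()
print(resp.json()["markdown"][:200])
```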
CSV files are automatically saved with the following naming pattern:
```
jobs_{technology}_{level}_{date}.csv
```
Example:
```
jobs_python_senior_2026-02-07.csv
jobs_javascript_junior_2026-02-08.csv
```
- Job Title: Position title from job listing
- Snippet: Summary/description from search results
- Link: Direct URL to the job posting
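A minimal sketch of reading an exported file with the standard `csv` module; the filename and column names are assumed from the pattern and list above, so check the actual header row:

```python
import csv

# Hypothetical filename following the documented naming pattern.
with open("jobs_python_senior_2026-02-07.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Column names assumed from the list above.
        print(row.get("Job Title"), "->", row.get("Link"))
```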
Run the complete test suite:
```bash
poetry run pytest
```
Run tests with coverage:
```bash
poetry run pytest --cov=. --cov-report=html
```
Run a specific test file:
```bash
poetry run pytest test_scraper.py -v
poetry run pytest test_server.py -v
```
The project includes 143+ tests covering:
- Query builder functionality
- ATS platform-specific rules
- Job link validation
- CSV export
- Flask API endpoints
- SSE streaming
- Filter operations
- Cache behavior
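For flavor, a hypothetical test in the spirit of `test_queries.py`; it assumes `build_ats_queries` returns a mapping of origin to query string, which may not match the real return shape:

```python
# Hypothetical test; the actual return type of build_ats_queries may differ.
from scraper import build_ats_queries

def test_pinpointhq_query_has_no_level_filter():
    queries = build_ats_queries("python", "senior")
    assert '"Senior"' not in queries["PinpointHQ"]
```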
The core scraping engine, with the following key functions:

`build_ats_queries(technology, level)`

Generates search queries for each ATS platform with intelligent, platform-specific filtering (a simplified sketch follows the list):
- PinpointHQ: Excludes level and location filters
- ICIMS: Excludes location filter
- Others: Full filtering with level and location
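A simplified, hypothetical sketch of how these rules might be applied; the real `build_ats_queries` in `scraper.py` may structure this differently:

```python
# Hypothetical sketch of the per-platform rules above; names are illustrative.
def build_query(origin, site, tech_keywords, level_keywords, location='"remote"'):
    parts = [f"site:{site}", "(" + " OR ".join(tech_keywords) + ")"]
    if origin != "PinpointHQ":                   # PinpointHQ: no level filter
        parts.append("(" + " OR ".join(level_keywords) + ")")
    if origin not in ("PinpointHQ", "ICIMS"):    # both skip the location filter
        parts.append(location)
    return " ".join(parts)

query = build_query("ICIMS", "icims.com",
                    ['"Python"', '"Django"'], ['"Senior"', '"Sr"'])
# -> site:icims.com ("Python" OR "Django") ("Senior" OR "Sr")
```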
`query_jobs(query, origin, api_key, target_min=50, on_progress=None)`

Executes searches across multiple pages and collects unique, validated job listings.
`run_full_scrape(api_key, technology, level, on_progress=None)`

Orchestrates the complete scrape across all 9 ATS platforms with progress callbacks.

`save_results_to_csv(results, output_dir, technology, level)`

Exports results to a date-stamped CSV file.
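A possible end-to-end usage of these functions, assuming the signatures above accept keyword arguments and that `run_full_scrape` returns the list `save_results_to_csv` expects:

```python
from decouple import config
from scraper import run_full_scrape, save_results_to_csv

def on_progress(event):
    # Payload shape is assumed to mirror the SSE events documented above.
    print(event)

api_key = config("SERPAPI_API_KEY_1")
results = run_full_scrape(api_key, technology="python", level="senior",
                          on_progress=on_progress)
save_results_to_csv(results, output_dir=".", technology="python", level="senior")
```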
Flask application providing:
- REST API endpoints for job data and filtering
- Real-time SSE streaming for scrape progress
- CSV caching with automatic invalidation
- Background thread management for long-running scrapes
- Origin detection from job URLs (a sketch follows this list)
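A minimal sketch of what hostname-based origin detection could look like; the actual mapping in `server.py` may differ:

```python
from urllib.parse import urlparse

# Illustrative hostname-to-origin table; the real mapping may differ.
ORIGIN_HOSTS = {
    "greenhouse.io": "Greenhouse",
    "lever.co": "Lever",
    "ashbyhq.com": "Ashby",
}

def detect_origin(url: str) -> str:
    host = urlparse(url).netloc.lower()
    for suffix, name in ORIGIN_HOSTS.items():
        if host == suffix or host.endswith("." + suffix):
            return name
    return "Unknown"

print(detect_origin("https://job-boards.greenhouse.io/company/jobs/123"))  # Greenhouse
```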
Each technology preset maps to related keywords for comprehensive searches:
| Technology | Keywords |
|---|---|
| PHP | PHP, Laravel |
| JavaScript | JavaScript, TypeScript, React, Node.js |
| Python | Python, Django, FastAPI, Flask |
| Java | Java, Spring, Spring Boot |
| C# | C#, .NET, ASP.NET |
| Ruby | Ruby, Ruby on Rails |
| Go | Golang, Go Developer |
| Rust | Rust, Rust Developer |
| DevOps | DevOps, SRE, Platform Engineer, Kubernetes |
| Data | Data Engineer, Data Scientist, Machine Learning, MLOps |
| Mobile | iOS, Android, React Native, Flutter |
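Internally, a preset entry presumably mirrors the `/api/technologies` payload shown earlier; an assumed shape:

```python
# Assumed internal shape of one preset, mirroring the /api/technologies
# response; the real structure in scraper.py may differ.
TECHNOLOGIES = {
    "python": {
        "label": "Python",
        "keywords": ['"Python"', '"Django"', '"FastAPI"', '"Flask"'],
    },
}
```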
Create a `.env` file in the project root:
```
SERPAPI_API_KEY_1=your_serper_api_key_here
```
- Sign up at https://serper.dev
- Free tier provides 2500 requests per month
- This project uses approximately 80 requests per run
- Supports roughly 30 runs per month on the free tier (2500 requests / ~80 per run)
Run weekly job searches:
- Open Task Scheduler
- Create Basic Task with:
- Action: Start a program
- Program: `C:\path\to\python.exe`
- Arguments: `C:\path\main.py --technology php --level senior`
- Trigger: Every Monday at 9:00 AM
Add to crontab for Monday 9:00 AM UTC:
```
0 9 * * 1 /path/to/python /path/to/main.py --technology php --level senior
```
- Searches limited to jobs posted in the past 7 days (Google Serper `qdr:w` filter)
- Maximum 5 pages per ATS platform per search
- Results filtered to exclude non-ATS job aggregators (LinkedIn, Indeed, Glassdoor, etc.)
- Platform-specific query adaptations based on ATS capabilities
- 1-2 second delays between requests to respect API rate limits
- Single technology/level scrape: ~90 seconds (9 ATS platforms)
- Average results: 300-500 jobs per scrape
- API calls per scrape: ~80 requests
- CSV file size: 50-150 KB
Core dependencies:
- `requests`: HTTP client for API calls
- `flask`: Web framework for dashboard and API
- `python-decouple`: Environment variable management
- `weasyprint`: PDF generation (optional)
See `pyproject.toml` for the complete dependency list.
This project is provided as-is for educational and development purposes.
- Original Project: Marcelo Andriolli (Gist: Job Scraper)
- Refactoring and Extension: Davi Oliveira
The original scraper has been refactored and significantly extended with:
- Python/Flask backend with REST API
- Multi-technology and multi-seniority support
- Real-time web dashboard with SSE streaming
- Comprehensive test coverage
- Production-ready architecture
For issues, feature requests, or contributions, please open an issue or submit a pull request.
For questions about the Serper API, visit their documentation.