Indeed Job Scraper

An automated job data extraction system that scrapes job listings from Indeed, processes the data, and exports it to Excel format for analysis.

Features

Dynamic Job Querying: Specify custom job titles and result limits
Real-time Data Fetching: Utilizes Apify's Indeed Job Scraper API
Intelligent Monitoring: Tracks scraping progress with partial results
Data Cleaning: Converts HTML job descriptions to clean, readable text
Excel Export: Generates structured spreadsheets with comprehensive job data
High Performance: Processes up to 1,000 job listings in 1-2 minutes

Prerequisites

Before running this project, ensure you have:

Python 3.x installed
An Apify account with API token
Required Python packages (see installation section)

Installation

Clone the repository:

git clone https://github.com/Murali-KrishnaM/JobSearch_WebScrapping.git
cd JobSearch_WebScrapping

Install required packages:

pip install requests pandas beautifulsoup4 openpyxl

Set up your Apify API token:
- Sign up at Apify
- Get your API token from the dashboard
- Replace [SECURE_TOKEN] in the code with your actual token
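
As an optional alternative to pasting the token into the source, the token could be read from an environment variable so that it never lands in version control. This is a minimal sketch, not the project's own code: the variable name `APIFY_TOKEN` and the helper `load_apify_token` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def load_apify_token(env_var: str = "APIFY_TOKEN",
                     environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the Apify token from the environment, or raise if it is missing.

    `APIFY_TOKEN` is a hypothetical variable name, not one defined by the project.
    """
    source = os.environ if environ is None else environ
    token = source.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Apify API token."
        )
    return token
```

With this in place, the value returned by `load_apify_token()` would be used wherever the code currently contains `[SECURE_TOKEN]`.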

💻 Usage

Run the script:

python indeed_scrapper.py

Enter the required information when prompted:
- Job title (e.g., "Software Engineer", "Data Scientist")
- Maximum number of results to fetch (up to 1,000)
Wait for the scraping process to complete
Find your results in the generated Excel file: {job_title}_cleaned_jobs.xlsx
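
The prompts and the output file name described above can be sketched as follows. The helper names are hypothetical, and clamping the requested count to 1,000 is an assumption; the actual script may reject larger values instead.

```python
MAX_RESULTS = 1000  # upper bound stated in this README


def parse_max_results(raw: str, limit: int = MAX_RESULTS) -> int:
    """Parse the user's result count and clamp it to the supported limit."""
    value = int(raw.strip())
    if value < 1:
        raise ValueError("Maximum number of results must be at least 1.")
    return min(value, limit)


def output_filename(job_title: str) -> str:
    """Build the Excel file name following the {job_title}_cleaned_jobs.xlsx pattern."""
    return f"{job_title.strip()}_cleaned_jobs.xlsx"


if __name__ == "__main__":
    title = input("Job title: ")
    count = parse_max_results(input("Maximum number of results: "))
    print(f"Fetching up to {count} jobs; results go to {output_filename(title)}")
```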

📊 Output Format

The generated Excel file contains the following columns:

Column	Description
Job Title	Position name
Company	Employer name
Location	Job location
Salary	Salary information (if available)
Job Type	Employment type (full-time, part-time, etc.)
Rating	Company rating
Reviews	Number of company reviews
Posted	Job posting date
Apply Link	Direct application link
Description	Clean job description text
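
To illustrate how scraped records might be shaped into these columns, here is a minimal sketch. The raw field names in `FIELD_MAP` (such as `positionName` and `reviewsCount`) are assumptions about the scraper's output, not taken from the project, and should be checked against the actual dataset items.

```python
COLUMNS = [
    "Job Title", "Company", "Location", "Salary", "Job Type",
    "Rating", "Reviews", "Posted", "Apply Link", "Description",
]

# Hypothetical mapping from output column to raw record key.
FIELD_MAP = {
    "Job Title": "positionName",
    "Company": "company",
    "Location": "location",
    "Salary": "salary",
    "Job Type": "jobType",
    "Rating": "rating",
    "Reviews": "reviewsCount",
    "Posted": "postedAt",
    "Apply Link": "url",
    "Description": "description",
}


def to_row(record: dict) -> dict:
    """Convert one raw job record into a row keyed by the output columns."""
    row = {}
    for column in COLUMNS:
        value = record.get(FIELD_MAP[column])
        if isinstance(value, list):
            # Flatten multi-valued fields (e.g. several job types) into one cell.
            value = ", ".join(str(item) for item in value)
        row[column] = "" if value is None else value
    return row
```

A list of such rows can then be handed to `pandas.DataFrame(rows, columns=COLUMNS)` and written with `to_excel(path, index=False)`.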

Architecture

The system follows a linear processing pipeline:

User Input → API Trigger → Status Monitoring → Data Retrieval → Data Processing → Excel Export
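
The Status Monitoring step can be sketched as a polling loop. This is an illustration under stated assumptions rather than the project's implementation: `get_status` is a hypothetical callable that would wrap the Apify run-status request, and the set of terminal statuses should be confirmed against the Apify API documentation.

```python
import time
from typing import Callable

# Assumed terminal run states; verify against the Apify API documentation.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(get_status: Callable[[], str],
                 poll_seconds: float = 5.0,
                 max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the run reaches a terminal status or the poll budget runs out.

    Returns the last status seen, so the caller can decide whether to
    fetch full results, fetch partial results, or report an error.
    """
    status = "UNKNOWN"
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    return status
```

Passing the status check in as a callable keeps the loop independent of the HTTP layer, so it can be exercised without network access.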

Technology Stack

Python 3.x - Core programming language
Requests - HTTP API communication
Apify API - Web scraping service
BeautifulSoup4 - HTML parsing and text extraction
Pandas - Data manipulation and Excel export
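
As an example of the HTML parsing and text extraction role that BeautifulSoup4 plays, the following sketch converts an HTML job description to plain text. The function name `clean_description` is hypothetical, and the project's own cleaning logic may differ (for instance, it may preserve line breaks).

```python
from bs4 import BeautifulSoup


def clean_description(html: str) -> str:
    """Strip tags from an HTML job description and collapse whitespace."""
    soup = BeautifulSoup(html or "", "html.parser")
    # Join text nodes with single spaces; strip=True trims each node
    # and drops nodes that contain only whitespace.
    text = soup.get_text(separator=" ", strip=True)
    return " ".join(text.split())
```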

Performance Metrics

Processing Capacity: Up to 1,000 job listings per run
Execution Time: 1-2 minutes for typical queries
Reliability: Monitors run status and handles errors during scraping
Time Savings: An estimated 90% reduction compared to manual job searching

Current Limitations

Single Platform: Limited to Indeed.com only
API Dependency: Requires Apify service availability
Rate Limits: Subject to Apify's usage policies
Local Storage: Files saved locally (no cloud integration)

Future Enhancements

Multi-platform support (LinkedIn, Monster, etc.)
Real-time dashboard integration
Machine learning for job recommendations
Cloud deployment and storage
Advanced filtering and search options
Automated scheduling and notifications

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the project
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

⚠️ Disclaimer

This tool is for educational and research purposes. Please ensure compliance with Indeed's Terms of Service and robots.txt when using this scraper. Be respectful of the website's resources and implement appropriate rate limiting.

👨‍💻 Author

Murali Krishna M.

Project Type: Individual Project
Focus: Web Scraping & Data Automation

🙏 Acknowledgments

Apify for providing the web scraping infrastructure
Indeed for being the data source
Python community for excellent libraries and documentation

📞 Support

If you encounter any issues or have questions, please open an issue on GitHub or contact murali.krishna1591@gmail.com

⭐ Star this repository if you found it helpful!
