Steosumit/daiict-scrapper

Table Data Scraper

A web scraping tool for extracting and analyzing placement data from the college placement portal.

Overview

This project automates the extraction of company information from placement.******.ac.in, transforming it into a structured format suitable for data analysis. It handles authentication, data extraction, and standardization of inconsistent information.

Requirements

  • Python 3.7+
  • Edge WebDriver
  • Python packages:
    • requests>=2.28.0
    • beautifulsoup4>=4.11.0
    • selenium>=4.0.0
    • pandas>=1.4.0

Setup

  1. Install required packages
  2. Configure credentials in extractor.ipynb (username/password):

export USER="whatever"
export PASS="whatever"
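
If the credentials are exported as shown above, they could be read in Python roughly as follows. This is a minimal sketch, not the notebook's actual code: the README does not say how extractor.ipynb consumes the credentials, and `load_credentials` is a hypothetical helper. Note that on most Unix-like shells `USER` is already set to the login name, so the export overrides it for that session.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from the USER and PASS variables.

    Hypothetical helper: assumes the credentials are supplied as
    environment variables, as in the export lines above.
    """
    username = env.get("USER")
    password = env.get("PASS")
    if not username or not password:
        raise RuntimeError("Set USER and PASS before running the notebook")
    return username, password
```

Failing early with a clear message avoids sending an empty login to the portal.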

Usage

The extraction process involves two steps:

  1. Run the link extraction script. This generates companies.csv with all company URLs.

  2. Run the extractor.ipynb notebook (requires manual intervention) to:
    • Process the extracted links
    • Scrape detailed company information
    • Standardize data structure
    • Create a pandas DataFrame
    • Export data to companies_processed.csv
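
The first step above, collecting company links into companies.csv, might look roughly like the sketch below. It assumes the links sit inside an HTML table and that the page HTML has already been fetched after login; the `table a[href]` selector, the function names, and the `name`/`url` column names are illustrative assumptions, not taken from the repository.

```python
import csv
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_company_links(html, base_url):
    """Collect link text and absolute URL for every anchor inside a table."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.select("table a[href]"):  # assumed page layout
        links.append({
            "name": anchor.get_text(strip=True),
            # Resolve relative hrefs against the page URL.
            "url": urljoin(base_url, anchor["href"]),
        })
    return links


def write_links(rows, path="companies.csv"):
    """Write the collected links to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "url"])
        writer.writeheader()
        writer.writerows(rows)
```

Storing absolute URLs keeps the second step independent of the page the links were found on.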

Known Issues

Companies #136 and #137 have inconsistent data structures that require manual handling.
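
One way to standardize records with differing fields, and to flag the ones that need manual handling, is sketched below. The column names, the alias map, and the `needs_review` flag are hypothetical; the README does not describe the portal's actual fields or how the notebook deals with these two companies.

```python
import pandas as pd

# Hypothetical target schema and key aliases, for illustration only.
EXPECTED_COLUMNS = ["company", "role", "ctc", "location"]
ALIASES = {"company name": "company", "package": "ctc"}


def normalize_keys(record):
    """Lower-case and trim keys, then map known aliases to canonical names."""
    cleaned = {}
    for key, value in record.items():
        name = key.strip().lower()
        cleaned[ALIASES.get(name, name)] = value
    return cleaned


def standardize(records):
    """Build a DataFrame with a fixed column set and flag incomplete rows."""
    frame = pd.DataFrame([normalize_keys(r) for r in records])
    # reindex adds missing columns as NaN and drops unexpected ones.
    frame = frame.reindex(columns=EXPECTED_COLUMNS)
    frame["needs_review"] = frame.isna().any(axis=1)
    return frame


def export(frame, path="companies_processed.csv"):
    """Write the standardized table without the index column."""
    frame.to_csv(path, index=False)
```

Rows where `needs_review` is true can then be inspected and corrected by hand before exporting.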

Contact

For new issues: sgsonu132@gmail.com (Sumit)

Made for personal use. The code can be adapted to other purposes with some tweaks.

About

Simple data scraper that collects links and preprocesses them into pandas-friendly DataFrames
