AI Web Scraper

Welcome to the AI Web Scraper project! It scrapes web content, cleans it, and parses it with AI. The project uses Streamlit for the user interface, Selenium for web scraping, and other Python libraries for content processing.

Introduction

This project was created by Ansh Jain. It provides a simple and efficient way to scrape web content and process it using AI techniques. The project was inspired by a tutorial from Tech With Tim, which made the implementation process easy to understand.

Features

Web scraping using Selenium
Content extraction and cleaning
Parsing content with AI
User-friendly interface with Streamlit

Installation

Follow these steps to set up the project on your local machine:

Clone the Repository:

git clone https://github.com/jansh7784/AI-Web-Scrapper.git
cd AI-Web-Scrapper

Create a Virtual Environment:
```
python -m venv venv
```
Activate the Virtual Environment:
- On Windows:
```
.\venv\Scripts\activate
```
- On macOS/Linux:
```
source venv/bin/activate
```
Install Dependencies:
```
pip install -r requirements.txt
```

Usage

Follow these steps to run the project:

Run the Streamlit Application:
```
streamlit run main.py
```
Open the Application in Your Browser:
- The application opens automatically in your default web browser. If it doesn't, navigate to http://localhost:8501.
Enter the Website URL:
- Enter the URL of the website you want to scrape in the input field and click the "Scrape Website" button.
View the Results:
- The scraped content, cleaned content, and parsed content will be displayed on the web interface.
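
The cleaning and parsing stages mentioned above can be pictured with a minimal sketch. It is not the project's actual code: it uses only the standard library's `html.parser` (the project may rely on a different library), and the names `clean_html` and `split_into_chunks`, as well as the 6000-character default, are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Return the visible text of `html` with blank lines removed.

    Each text fragment ends up on its own line, so inline elements
    such as <b> also introduce line breaks.
    """
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    lines = (line.strip() for line in "\n".join(extractor.parts).splitlines())
    return "\n".join(line for line in lines if line)


def split_into_chunks(text, max_length=6000):
    """Split text into consecutive pieces of at most max_length characters.

    The default of 6000 is an assumed value, not one taken from the project.
    """
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]
```

Splitting the cleaned text into bounded chunks is one common way to keep each request to a language model within its input limit; whether the project does this, and with what size, is an assumption here.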

Using ChromeDriver Locally

For local development and testing, you can use ChromeDriver instead of Bright Data. ChromeDriver is a standalone server that implements the WebDriver protocol for Chrome. Here are the steps to set it up:

Download ChromeDriver:
- Download the ChromeDriver executable from the official site and place it in a directory of your choice.

Update scrape.py:

Modify the scrape_website function to use ChromeDriver locally:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options

def scrape_website(url):
    # Point Selenium at the local ChromeDriver executable.
    chrome_service = ChromeService(executable_path='path/to/chromedriver')
    chrome_options = Options()
    driver = webdriver.Chrome(service=chrome_service, options=chrome_options)
    try:
        # Load the page and return its rendered HTML.
        driver.get(url)
        return driver.page_source
    finally:
        # Close the browser even if loading the page fails.
        driver.quit()

Replace 'path/to/chromedriver' with the actual path to your ChromeDriver executable.
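
If you would rather not hard-code the path, one option is to look it up at runtime. The sketch below is an illustration only: the helper name `resolve_chromedriver_path` and the `CHROMEDRIVER_PATH` environment variable are hypothetical and are not defined by this project.

```python
import os
import shutil


def resolve_chromedriver_path(env_var="CHROMEDRIVER_PATH"):
    """Return a path to the ChromeDriver executable, or raise FileNotFoundError.

    Lookup order: the environment variable named by env_var (a hypothetical
    name, not one the project defines), then the system PATH.
    """
    configured = os.environ.get(env_var)
    if configured:
        if os.path.isfile(configured):
            return configured
        raise FileNotFoundError(
            f"{env_var} points to a missing file: {configured}"
        )

    # Fall back to whatever "chromedriver" resolves to on PATH, if anything.
    found = shutil.which("chromedriver")
    if found:
        return found
    raise FileNotFoundError(
        f"ChromeDriver not found; set {env_var} or add chromedriver to PATH"
    )
```

The result could then be passed to `ChromeService(executable_path=resolve_chromedriver_path())` inside `scrape_website`, so a missing driver fails with a clear message before a browser is started.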

Credits

This project was created by Ansh Jain. Special thanks to Tech With Tim for the tutorial that made this implementation easy to understand.

Connect with me on LinkedIn.

License

This project is licensed under the MIT License. See the LICENSE file for details.
