Web-Scout

About

WebScout is a powerful web scraping and data visualization tool built using Scrapy and Django. It allows users to extract data from website source and Images from Getty Imges portal, store it in a MongoDB database, and generate detailed reports with visualizations.

Web scraping is a common technique used for various purposes, such as market research, competitive analysis, or data collection for research projects. WebScout simplifies the process by providing an easy-to-use web interface for scraping data from websites. Users can simply input the URL of the target website, and WebScout will handle the rest.

Key features of WebScout include:

Web Scraping: WebScout utilizes Scrapy, a popular Python web crawling framework, to scrape data from websites efficiently.
Data Storage: Scraped data is stored in a MongoDB database, making it easy to retrieve and analyze.
Data Visualization: WebScout offers various data visualization options using Plotly, allowing users to gain insights from the collected data.
Report Generation: Users can generate comprehensive reports that include charts, graphs, and word clouds based on the scraped data.

Screenshots

Homepage:

Message after scrapping:

List of mongoDB Collections created after each scrapping:

Detailed Report:

Technologies Used

WebScout is built using the following technologies and libraries:

Django: A Python web framework for building web applications.
Scrapy: A Python web crawling and scraping framework.
MongoDB: A NoSQL database for storing scraped data.
Plotly: A data visualization library for creating interactive charts and graphs.
Pandas: A data manipulation library for Python.
ReportLab: A library for generating PDF reports.
Other Python libraries for web development and data processing.

WebScout combines the power of these technologies to provide a seamless web scraping and data analysis experience.

Contributing

We welcome contributions from the open-source community to help improve and expand WebScout's functionality. If you'd like to contribute, please refer to the Contribution Guidelines in the project repository for more information on how to get started.

Installation

To get started with WebScout, follow these installation steps:

Connect to a mongoDB cluster and note the uri, it will be neede to connect to the cluster.
Clone the Repository:

Clone the WebScout repository to your local machine using Git:
```
git clone https://github.com/mohdUwaish59/Web-Scout.git
```
Get into the directory:
```
cd Web-Scout
```
Configure Virtual environment:
```
virtualenv WebScout_Env
```

Activate the virtual environment:

For Windows
WebScout_Env/Scripts/Activate

Install Dependencies and run server:

pip install -r requirements.txt
python manage.py runserver

Set up mongoDB uri and paste it in book spider and GettyImages spider at thier respetive places
Run the web server
```
python manage.py runserver
```

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
ImageScrapper/ImageScrapper		ImageScrapper/ImageScrapper
Images		Images
Star_Scrapper		Star_Scrapper
WebScout		WebScout
WebScoutApp		WebScoutApp
templates		templates
LICENSE		LICENSE
README.md		README.md
db.sqlite3		db.sqlite3
manage.py		manage.py
requirements.txt		requirements.txt
run_image_spider.bat		run_image_spider.bat
run_spider.bat		run_spider.bat

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Web-Scout

Table of Contents

About

Screenshots

Technologies Used

Contributing

Installation

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

mohdUwaish59/Web-Scout

Folders and files

Latest commit

History

Repository files navigation

Web-Scout

Table of Contents

About

Screenshots

Technologies Used

Contributing

Installation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages