Crawly, created by Patryk 'UltiPro' Wójtowicz, is a Python web crawler and scraper that implements both BFS and DFS search strategies. It can be configured through options such as the search method, a time limit, a maximum search depth, whether to generate a full graph, and optional proxy server settings. Out of the box the application collects only URLs and the contents of "a" tags, but the code can easily be adapted to specific needs in the `_process_page` function. During execution the program launches a browser via the Playwright package; the browser navigates through web pages and, when necessary, pauses so the user can solve captchas and similar challenges. The output consists of a CSV file containing the URLs and "a" tag contents, as well as an HTML page with a graph of the connections between the visited websites.
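Since the scraping logic is confined to `_process_page`, customization usually means editing only that one spot. The sketch below is a hypothetical illustration of how such a crawl loop might fit together with Playwright and BeautifulSoup; apart from `_process_page`, every name in it (`crawl`, `use_bfs`, and so on) is illustrative rather than taken from the project:

```python
from collections import deque
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright


def _process_page(html, url):
    """Stand-in for Crawly's _process_page: collect the page URL and the
    text of every <a> tag; adapt this function to scrape other elements."""
    soup = BeautifulSoup(html, "html.parser")
    rows, links = [], []
    for a in soup.find_all("a", href=True):
        rows.append((url, a.get_text(strip=True)))
        links.append(urljoin(url, a["href"]))
    return rows, links


def crawl(start_url, max_depth=10, use_bfs=True):
    results = []
    frontier = deque([(start_url, 0)])
    seen = {start_url}
    with sync_playwright() as p:
        # A visible (non-headless) browser lets the user solve captchas;
        # a proxy could be supplied here via launch(proxy={"server": ...}).
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        while frontier:
            # BFS consumes the frontier as a queue, DFS as a stack.
            url, depth = frontier.popleft() if use_bfs else frontier.pop()
            if depth > max_depth:
                continue
            page.goto(url)
            rows, links = _process_page(page.content(), url)
            results.extend(rows)
            for link in links:
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, depth + 1))
        browser.close()
    return results
```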
Dependencies:
- beautifulsoup4 4.13.3
- bs4 0.0.2
- fake-useragent 2.0.3
- greenlet 3.1.1
- narwhals 1.28.0
- networkx 3.4.2
- numpy 2.2.3
- packaging 24.2
- playwright 1.50.0
- plotly 6.0.0
- pyee 12.1.1
- soupsieve 2.6
- typing_extensions 4.12.2
Installation:

    cd "/Crawly"
    pip install -r requirements.txt
    playwright install

Usage:

    python main.py [url-address] [options]
| Option | Short | Description | Default Value |
|---|---|---|---|
| --method | -m | Search method (bfs or dfs) | bfs |
| --time | -t | Execution time (s) | 60 |
| --depth | -d | Maximum search depth | 10 |
| --full_graph | -fg | Generate a full graph | False |
| --proxy_server | -ps | Proxy server IP/address | — |
| --proxy_username | -pu | Proxy username | — |
| --proxy_password | -pp | Proxy password | — |
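Taken together, a DFS crawl capped at 120 seconds and depth 5 might be started with `python main.py https://example.com -m dfs -t 120 -d 5`. As a rough guide to how these options fit together, the table translates into an argparse parser along the following lines; this is a hypothetical reconstruction, and the actual parser in main.py may differ:

```python
import argparse

# Hypothetical reconstruction of the CLI described in the options table.
parser = argparse.ArgumentParser(prog="Crawly")
parser.add_argument("url", help="start URL to crawl")
parser.add_argument("-m", "--method", choices=["bfs", "dfs"], default="bfs",
                    help="search method")
parser.add_argument("-t", "--time", type=int, default=60,
                    help="execution time in seconds")
parser.add_argument("-d", "--depth", type=int, default=10,
                    help="maximum search depth")
parser.add_argument("-fg", "--full_graph", action="store_true",
                    help="generate a full graph")
parser.add_argument("-ps", "--proxy_server", help="proxy server IP/address")
parser.add_argument("-pu", "--proxy_username", help="proxy username")
parser.add_argument("-pp", "--proxy_password", help="proxy password")
args = parser.parse_args()
```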


