This repository was archived by the owner on Mar 3, 2025. It is now read-only.

UltiPro/Crawly

Crawly

Crawly was created by Patryk 'UltiPro' Wójtowicz using Python.

Crawly is a web crawler and scraper that supports both breadth-first (BFS) and depth-first (DFS) search. It is configured through command-line options: search method, time limit, maximum search depth, whether to generate a full graph, and optional proxy server settings. The application collects only URLs and the contents of "a" tags, but the code can easily be adapted to specific needs in the "_process_page" function. During execution, the program launches a browser using the Playwright package. The browser navigates through web pages and, when necessary, pauses so the user can solve captchas and similar challenges. The output consists of a CSV file containing the URLs and "a" tag contents, and an HTML page with a graph representing the connections between websites.
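The difference between the two search methods, and the role of the depth and time limits, can be illustrated with a minimal sketch. This is not Crawly's actual code: it uses only the standard library (html.parser instead of BeautifulSoup, and an in-memory dictionary of pages instead of a Playwright browser), and the names AnchorCollector, crawl, fetch, and PAGES are hypothetical.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl(start_url, fetch, method="bfs", max_depth=10, time_limit=60.0):
    """Return (visited URLs in order, list of (url, anchor text) rows).

    fetch is any callable mapping a URL to its HTML, or None on failure.
    """
    deadline = time.monotonic() + time_limit
    frontier = deque([(start_url, 0)])
    seen = {start_url}
    visited, rows = [], []
    while frontier and time.monotonic() < deadline:
        # BFS takes the oldest entry (queue); DFS takes the newest (stack).
        url, depth = frontier.popleft() if method == "bfs" else frontier.pop()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = AnchorCollector()
        parser.feed(html)
        for href, text in parser.anchors:
            link = urljoin(url, href)
            rows.append((link, text))
            # Only schedule unseen links that stay within the depth limit.
            if link not in seen and depth + 1 <= max_depth:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited, rows


# Hypothetical in-memory "website" standing in for real pages.
PAGES = {
    "http://site.test/": '<a href="/a">A</a><a href="/b">B</a>',
    "http://site.test/a": '<a href="/c">C</a>',
    "http://site.test/b": '<a href="/d">D</a>',
    "http://site.test/c": "",
    "http://site.test/d": "",
}

bfs_order, _ = crawl("http://site.test/", PAGES.get, method="bfs")
dfs_order, _ = crawl("http://site.test/", PAGES.get, method="dfs")
print(bfs_order)  # /, /a, /b, /c, /d
print(dfs_order)  # /, /b, /d, /a, /c
```

The only difference between the two methods in this sketch is which end of the frontier the next URL is taken from. In a real crawl, fetch would be replaced by a function that loads the page in the browser and returns its HTML.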

Dependencies and Usage

Dependencies:

  • beautifulsoup4 4.13.3
  • bs4 0.0.2
  • fake-useragent 2.0.3
  • greenlet 3.1.1
  • narwhals 1.28.0
  • networkx 3.4.2
  • numpy 2.2.3
  • packaging 24.2
  • playwright 1.50.0
  • plotly 6.0.0
  • pyee 12.1.1
  • soupsieve 2.6
  • typing_extensions 4.12.2

Installation:

cd "/Crawly"

pip install -r requirements.txt

playwright install

Using the app

python main.py [url-address] [options]

Option            Short  Description              Default Value
--method          -m     Search method            bfs
--time            -t     Execution time (s)       60
--depth           -d     Maximum search depth     10
--full_graph      -fg    Generate a full graph    False
--proxy_server    -ps    Proxy server IP/address
--proxy_username  -pu    Proxy username
--proxy_password  -pp    Proxy password
Preview

Terminal Preview

CSV Preview

HTML Preview