Onixx241/GuineaWebCrawler

🐹 GuineaCS

GuineaCS is a lightweight, single-page web crawler written in C#.

This project was created as a personal learning journey to better understand how web crawling works, from string parsing to link resolution and, eventually, recursive crawling.



🚀 Features

  • ✅ Crawl the web starting from a seed URL
  • ✅ Respect robots.txt rules
  • ✅ Command-line flags for configuration
  • ✅ Filter links (e.g. mailto:, fragment (#) links, domains); extensible through ILinkFilter.cs
  • ✅ Export results to plain text or JSON
  • ✅ Database export
  • ✅ Save crawled HTML pages locally
  • (Coming Soon) MongoDB export
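
To give a feel for how link filtering could be extended, here is a minimal sketch. The README only names ILinkFilter.cs, so the interface shape below (a single `ShouldKeep` method), the three filter classes, and the `LinkFilters.Apply` helper are all assumptions made for illustration. The real interface in the repository may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for illustration only; the actual ILinkFilter.cs may differ.
public interface ILinkFilter
{
    // Returns true when the raw link should be kept for crawling.
    bool ShouldKeep(Uri seed, string rawLink);
}

// Drops mailto: links.
public sealed class MailtoFilter : ILinkFilter
{
    public bool ShouldKeep(Uri seed, string rawLink) =>
        !rawLink.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase);
}

// Drops in-page fragment links such as "#top".
public sealed class FragmentFilter : ILinkFilter
{
    public bool ShouldKeep(Uri seed, string rawLink) => !rawLink.StartsWith('#');
}

// Keeps only links that resolve to the same host as the seed URL.
public sealed class SameDomainFilter : ILinkFilter
{
    public bool ShouldKeep(Uri seed, string rawLink) =>
        Uri.TryCreate(seed, rawLink, out var resolved) &&
        string.Equals(resolved.Host, seed.Host, StringComparison.OrdinalIgnoreCase);
}

public static class LinkFilters
{
    // A link survives only if every filter agrees to keep it.
    public static IEnumerable<string> Apply(
        Uri seed, IEnumerable<string> links, IEnumerable<ILinkFilter> filters)
    {
        var filterList = filters.ToList();
        return links.Where(link => filterList.All(f => f.ShouldKeep(seed, link)));
    }
}
```

Under these assumptions, adding a new rule means writing one small class and adding it to the filter list, without touching the crawl loop.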

🧾 Usage

dotnet run -url "https://example.com" -limit 25 -dmode true

CLI Flags

| Flag | Description | Example |
|------|-------------|---------|
| -url | Seed URL to begin crawling | -url "https://example.com" |
| -limit | Number of pages to crawl | -limit 25 |
| -dmode | Enable same-domain crawling only (true/false) | -dmode true |
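
As a rough illustration of how these three flags might be read from `args`, here is a small sketch. The `CrawlOptions` record, the `FlagParser` class, the default values, and the rule that `-url` is required are assumptions for this example, not a description of the project's actual parsing code.

```csharp
using System;

// Hypothetical options holder; names are illustrative, not taken from the repo.
public sealed record CrawlOptions(Uri SeedUrl, int Limit, bool SameDomainOnly);

public static class FlagParser
{
    public static CrawlOptions Parse(string[] args)
    {
        if (args.Length % 2 != 0)
            throw new ArgumentException("Every flag needs a value.");

        Uri? seed = null;
        int limit = 10;           // assumed default, not documented in the README
        bool sameDomain = false;  // assumed default, not documented in the README

        // Flags are read as "-name value" pairs.
        for (int i = 0; i < args.Length; i += 2)
        {
            string value = args[i + 1];
            switch (args[i])
            {
                case "-url":
                    if (!Uri.TryCreate(value, UriKind.Absolute, out seed))
                        throw new ArgumentException($"Invalid -url value: {value}");
                    break;
                case "-limit":
                    if (!int.TryParse(value, out limit) || limit < 1)
                        throw new ArgumentException($"Invalid -limit value: {value}");
                    break;
                case "-dmode":
                    if (!bool.TryParse(value, out sameDomain))
                        throw new ArgumentException($"Invalid -dmode value: {value}");
                    break;
                default:
                    throw new ArgumentException($"Unknown flag: {args[i]}");
            }
        }

        if (seed is null)
            throw new ArgumentException("-url is required.");

        return new CrawlOptions(seed, limit, sameDomain);
    }
}
```

Calling `FlagParser.Parse(args)` at the start of `Main` would turn the usage line above into a typed options object, and a mistyped value such as `truetrue` for `-dmode` would fail early with a clear message.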

📚 Why "GuineaCS"?

Because guinea pigs are curious explorers, just like this crawler.
And it's written in C#, so I named it GuineaCS.


💡 Goals

  • Practice C# fundamentals
  • Explore real-world software design
  • Build a tool worth sharing

🛠 Specs

  • Language: C#
  • Runtime: .NET 7+
  • Style: Minimal, modular, and educational

🙌 Contributing

GuineaCS is a personal project and a learning sandbox, but suggestions and ideas are always welcome.


👀 Author

Myself and my beautiful laptop.


📝 License

MIT License: free to use, share, and learn from.