GuineaCS is a lightweight, single-page web crawler written in C#.
This project was created as a personal learning journey to better understand how web crawling works, from string parsing to link resolution, and eventually recursive crawling.
- Crawl the web starting from a seed URL
- Respect robots.txt
- Command-line flags for configuration
- Filter links (e.g. mailto:, hashtags, domains), extensible through ILinkFilter.cs (see the sketch after this list)
- Export results to plain text or JSON
- Database export
- Save crawled HTML pages locally
- (Coming Soon) MongoDB export
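As a rough illustration of the filtering extension point, here is a minimal sketch of a custom filter. The interface shape (a single ShouldCrawl(Uri) method) and the class and method names are assumptions made for this example; the actual contract lives in ILinkFilter.cs in the repository and may differ.

```csharp
// Hypothetical sketch only: assumes ILinkFilter exposes a single
// ShouldCrawl(Uri) method. The real interface in ILinkFilter.cs may differ.
using System;

public interface ILinkFilter
{
    bool ShouldCrawl(Uri link);
}

// Example filter: skip mailto: links and in-page anchors (#fragments).
// Assumes it is handed absolute URIs.
public sealed class MailtoAndFragmentFilter : ILinkFilter
{
    public bool ShouldCrawl(Uri link)
    {
        if (link.Scheme == Uri.UriSchemeMailto) return false;   // drop mailto: links
        if (!string.IsNullOrEmpty(link.Fragment)) return false; // drop #fragment anchors
        return true;
    }
}
```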
```bash
dotnet run -url "https://example.com" -limit 25 -dmode true
```

| Flag | Description | Example |
|---|---|---|
| `-url` | Seed URL to begin crawling | `-url "https://example.com"` |
| `-limit` | Number of pages to crawl | `-limit 25` |
| `-dmode` | Enable same-domain crawling only (true/false) | `-dmode true` |
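For a sense of how these flags could map onto a configuration object, here is a minimal parsing sketch. The CrawlOptions record, the ArgParser class, and the default values are hypothetical; only the flag names come from the table above, and the project's actual argument handling may look quite different.

```csharp
// Illustrative sketch: maps the -url / -limit / -dmode flags onto a simple
// options record. Names and defaults here are assumptions, not project code.
public sealed record CrawlOptions(string SeedUrl, int Limit, bool SameDomainOnly);

public static class ArgParser
{
    public static CrawlOptions Parse(string[] args)
    {
        string url = "";
        int limit = 10;           // assumed default page limit
        bool sameDomain = false;  // assumed default: follow links to any domain

        // Flags are expected in "-flag value" pairs.
        for (int i = 0; i + 1 < args.Length; i += 2)
        {
            switch (args[i])
            {
                case "-url":   url = args[i + 1]; break;
                case "-limit": limit = int.Parse(args[i + 1]); break;
                case "-dmode": sameDomain = bool.Parse(args[i + 1]); break;
            }
        }
        return new CrawlOptions(url, limit, sameDomain);
    }
}
```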
Because guinea pigs are curious explorers, just like this crawler.
And it's written in C#, so I named it GuineaCS.
- Practice C# fundamentals
- Explore real-world software design
- Build a tool worth sharing
- Language: C#
- Runtime: .NET 7+
- Style: Minimal, modular, and educational
GuineaCS is a personal project and a learning sandbox, but suggestions and ideas are always welcome.
Myself and my beautiful laptop.
MIT License: free to use, share, and learn from.