I decided to build a web scraper entirely from scratch, without using any external libraries. The request sender, HTTP parser, and HTML parser are all implemented by hand.
The request sender is built using only the ssl and socket modules. It supports HTTPS connections and chunked transfer encoding.
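To make that concrete, here is a minimal sketch of what a socket-plus-ssl request sender with chunked decoding can look like. It is not the project's actual code: the function names (build_request, send_request, decode_chunked) are made up for illustration, and the decoder assumes the whole body is already in memory and ignores trailers.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close when it is done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_request(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send the request over TLS and return the raw response bytes."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_request(host, path))
            received = []
            while True:
                data = tls_sock.recv(4096)
                if not data:  # empty read means the peer closed
                    break
                received.append(data)
    return b"".join(received)


def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked message body into its payload."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk; trailers (if any) are ignored in this sketch
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(decoded)
```

For example, decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") returns b"Wikipedia". A real sender would also need to split headers from the body and check for a Transfer-Encoding: chunked header before decoding.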
The project includes:
• a request builder
• a request sender
• a file for the actual parsing logic
• a file for defining and managing parser attributes
• two tokenizer versions (tokenizer_v1_0_0 is the stable one)
• a token class
• a tree constructor for building the DOM
• a debug mode (prints the current state of all internal buffers)
• a pretty printer to visualize the generated DOM tree
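For readers wondering how the tokenizer, token class, tree constructor, and pretty printer fit together, here is a rough sketch of the general idea. Everything in it (Token, Node, tokenize, build_tree, pretty) is a hypothetical stand-in rather than the project's real code, and it is heavily simplified: attributes are dropped, and comments, void elements like br, and a ">" inside an attribute value are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str   # "start", "end", or "text"
    value: str  # tag name, or the text content


@dataclass
class Node:
    name: str   # tag name, "#document", or "#text"
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(html: str) -> list:
    """Tiny two-state tokenizer: 'data' (outside tags) and 'tag'."""
    tokens = []
    buffer = ""
    state = "data"
    for ch in html:
        if state == "data":
            if ch == "<":
                if buffer.strip():
                    tokens.append(Token("text", buffer.strip()))
                buffer = ""
                state = "tag"
            else:
                buffer += ch
        else:  # state == "tag"
            if ch == ">":
                is_end = buffer.startswith("/")
                parts = buffer.lstrip("/").split()
                # Keep only the tag name; attributes are ignored here.
                name = parts[0].lower() if parts else ""
                tokens.append(Token("end" if is_end else "start", name))
                buffer = ""
                state = "data"
            else:
                buffer += ch
    if state == "data" and buffer.strip():
        tokens.append(Token("text", buffer.strip()))
    return tokens


def build_tree(tokens: list) -> Node:
    """Build a DOM-like tree using a stack of open elements."""
    root = Node("#document")
    stack = [root]
    for token in tokens:
        if token.kind == "start":
            node = Node(token.value)
            stack[-1].children.append(node)
            stack.append(node)
        elif token.kind == "end":
            # Close only when the tag matches the innermost open element.
            if len(stack) > 1 and stack[-1].name == token.value:
                stack.pop()
        else:
            stack[-1].children.append(Node("#text", token.value))
    return root


def pretty(node: Node, depth: int = 0) -> str:
    """Render the tree as an indented outline, one node per line."""
    label = repr(node.text) if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.append(pretty(child, depth + 1))
    return "\n".join(lines)
```

Running print(pretty(build_tree(tokenize("<html><body><p>Hello</p></body></html>")))) gives:

```
#document
  html
    body
      p
        'Hello'
```

The stack of open elements is the key design choice: each start tag pushes a node and each matching end tag pops it, so nesting in the markup becomes nesting in the tree.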
Hope you enjoy it.