Web Scraper Built Completely From Scratch

I decided to build a web scraper entirely from scratch, without any third-party libraries. The request sender, the HTTP parser, and the HTML parser are all implemented by hand.

Request Sender

The request sender is built using only the ssl and socket modules. It supports:

•	HTTPS connections
•	chunked transfer encoding
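Chunked transfer encoding is the part of HTTP/1.1 that hand-written clients most often get wrong. Each chunk starts with its size in hexadecimal, and a zero-size chunk ends the body. The following is a minimal sketch, not the project's code. It assumes the body has already been separated from the headers, and `decode_chunked` is a hypothetical name.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a body sent with Transfer-Encoding: chunked (hypothetical helper).

    Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a zero-size chunk
    marks the end. Trailer headers after the last chunk are ignored here.
    """
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)                 # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:
            break  # last chunk reached
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(decoded)
```

For example, `decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns `b"Wikipedia"`.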

The request sender is split into:

•	a request builder
•	a request sender
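The builder/sender split can be illustrated with two small functions. This is a sketch under assumptions: it uses Python with the standard-library `ssl` and `socket` modules, and `build_request` and `send_request` are hypothetical names rather than the repository's API. Sending `Connection: close` lets the sender read until the server closes the socket, which avoids having to track `Content-Length` while receiving.

```python
import socket
import ssl
from typing import Optional


def build_request(host: str, path: str = "/", method: str = "GET",
                  headers: Optional[dict] = None) -> bytes:
    """Build a raw HTTP/1.1 request (hypothetical request-builder helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The header block ends with an empty line, i.e. CRLF CRLF.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_request(host: str, request: bytes, port: int = 443) -> bytes:
    """Send a request over TLS and read until the server closes the connection."""
    context = ssl.create_default_context()  # verifies certificates and hostname
    with socket.create_connection((host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # empty read: the server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)
```

For example, `send_request("example.com", build_request("example.com"))` returns the raw response bytes, with headers and body still together for the HTTP parser to split.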

HTTP Parser

The HTTP parser consists of:

•	a file for the actual parsing logic
•	a file for defining and managing parser attributes
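The split between parsing logic and parser attributes can be sketched as a dataclass that holds the attributes and a function that fills it in. All names and fields here are hypothetical. In this sketch repeated headers simply overwrite each other, and a chunked body would still need to be decoded separately.

```python
from dataclasses import dataclass, field


@dataclass
class HTTPResponse:
    """Parsed response attributes (hypothetical field names)."""
    version: str = ""
    status: int = 0
    reason: str = ""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def parse_response(raw: bytes) -> HTTPResponse:
    """Split a raw response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # headers end at the first blank line
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The reason phrase may be empty or contain spaces, so split at most twice.
    version, status, reason = (status_line.split(" ", 2) + [""])[:3]
    response = HTTPResponse(version=version, status=int(status),
                            reason=reason, body=body)
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        response.headers[name.strip().lower()] = value.strip()
    return response
```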

HTML Parser

The HTML parser is the core of the project. It includes:

•	two tokenizer versions (tokenizer_v1_0_0 is the stable one)
•	a token class
•	a tree constructor for building the DOM
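The following sketch shows how a tokenizer, a token class, and a tree constructor fit together. It is heavily simplified and is not the WHATWG HTML tokenizer, which has dozens of states. It uses only two states and skips attributes, comments, and doctypes. `Token`, `Node`, `tokenize`, and `build_tree` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str   # "start_tag", "end_tag" or "text"
    value: str  # tag name for tags, raw text for text tokens


@dataclass
class Node:
    name: str                                     # tag name, or "#text"
    text: str = ""                                # only used by text nodes
    children: list = field(default_factory=list)


VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}  # never have children


def tokenize(html: str) -> list:
    """Two-state tokenizer (DATA and TAG); attributes, comments and doctypes are dropped."""
    tokens, buffer, state = [], "", "DATA"
    for ch in html:
        if state == "DATA":
            if ch == "<":
                if buffer.strip():  # skip whitespace-only text
                    tokens.append(Token("text", buffer))
                buffer, state = "", "TAG"
            else:
                buffer += ch
        elif ch == ">":  # in the TAG state: the tag is complete
            if buffer.startswith("/"):
                tokens.append(Token("end_tag", buffer[1:].strip().lower()))
            elif buffer[:1].isalpha():
                name = buffer.split()[0].rstrip("/").lower()  # keep only the tag name
                tokens.append(Token("start_tag", name))
            buffer, state = "", "DATA"
        else:
            buffer += ch
    if state == "DATA" and buffer.strip():
        tokens.append(Token("text", buffer))
    return tokens


def build_tree(tokens: list) -> Node:
    """Stack-based tree construction: start tags push, matching end tags pop."""
    root = Node("#document")
    stack = [root]
    for tok in tokens:
        if tok.kind == "text":
            stack[-1].children.append(Node("#text", text=tok.value))
        elif tok.kind == "start_tag":
            node = Node(tok.value)
            stack[-1].children.append(node)
            if tok.value not in VOID_ELEMENTS:
                stack.append(node)
        else:  # end_tag: pop back to the matching open element, ignore strays
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == tok.value:
                    del stack[i:]
                    break
    return root
```

With `build_tree(tokenize("<p>Hi <b>there</b><br></p>"))`, the `<b>` and `<br>` nodes both become children of `<p>`. Because `br` is treated as a void element, it is never pushed onto the stack.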

The tokenizer features:

•	a debug mode (prints the current state of all internal buffers)
•	a pretty printer to visualize the generated DOM tree
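A DOM pretty printer can be a short recursive function. A debug mode can be as simple as printing the current state together with every buffer after each step. The sketch below is an illustration only. It assumes a minimal hypothetical `Node` class, and `pretty` and `debug_dump` are hypothetical names. The project's actual output format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                     # tag name, or "#text"
    text: str = ""                                # only used by text nodes
    children: list = field(default_factory=list)


def pretty(node: Node, depth: int = 0) -> str:
    """Render a DOM tree as an indented outline, one node per line."""
    indent = "  " * depth
    if node.name == "#text":
        lines = [f'{indent}"{node.text}"']
    else:
        lines = [f"{indent}<{node.name}>"]
    for child in node.children:
        lines.append(pretty(child, depth + 1))  # children are indented one level deeper
    return "\n".join(lines)


def debug_dump(state: str, buffers: dict) -> str:
    """Debug-mode helper: print the current state and every internal buffer."""
    line = f"[{state}] " + " | ".join(f"{name}={value!r}" for name, value in buffers.items())
    print(line)
    return line
```

Each nesting level adds two spaces, and text nodes are quoted so that leading and trailing whitespace stays visible.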

Hope you enjoy it!
