A small, work-in-progress (WIP) project for exploring the Common Crawl dataset.
Installation
python3 -m venv ./.venv
source ./.venv/bin/activate
python3 -m pip install -r requirements.txt
Directory structure
- src/data – data-collection code
- src/notebooks – exploratory notebooks
Refer to each directory’s README.md for detailed docs.