Kent Repertory ETL

This project converts the static, hierarchical, hyperlink-based content of the Kent repertory webpage into a normalized relational database schema.
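One common way to store a hierarchy in relational form is an adjacency list, where each row points at its parent. The sketch below illustrates that idea only; the `flatten` helper, the `(id, parent_id, name)` row shape, and the sample entries are assumptions made for illustration and are not taken from this project's actual schema.

```python
from itertools import count


def flatten(tree, parent_id=None, ids=None):
    """Turn a nested dict into (id, parent_id, name) rows, depth first."""
    # A single shared counter hands out ids across all recursion levels.
    ids = ids if ids is not None else count(1)
    rows = []
    for name, children in tree.items():
        node_id = next(ids)
        rows.append((node_id, parent_id, name))
        rows.extend(flatten(children, node_id, ids))
    return rows


# Hypothetical sample hierarchy, used only to show the row shape.
sample = {"Mind": {"Anxiety": {"night": {}}, "Fear": {}}, "Vertigo": {}}

rows = flatten(sample)
for row in rows:
    print(row)
```

Top-level entries get a `parent_id` of `None`, so the whole tree fits in one table and can be rebuilt by following the parent references.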

Overview

Scraping: Fetch and parse HTML content using Python (Requests + BeautifulSoup).
Transformation: Clean and organize data into structured formats.
Database Loading: Insert structured data into PostgreSQL.
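The three stages above can be sketched end to end as follows. To keep the example self-contained, it swaps in the standard library's `html.parser` for BeautifulSoup and an in-memory `sqlite3` database for PostgreSQL, and it parses an inline HTML string instead of fetching a page with Requests. The sample HTML, the `chapter` table, and the function names are all assumptions for illustration; the code in `src/` may be organized quite differently.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample markup; the real page structure is not shown here.
SAMPLE_HTML = """
<ul>
  <li><a href="mind.htm"> Mind </a></li>
  <li><a href="vertigo.htm">Vertigo</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def scrape(html):
    """Scraping stage: parse HTML and return raw (href, text) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def transform(links):
    """Transformation stage: trim whitespace and drop links with no text."""
    return [
        {"name": text.strip(), "url": href}
        for href, text in links
        if text.strip()
    ]


def load(conn, records):
    """Loading stage: create the (hypothetical) table and insert records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chapter ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, url TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO chapter (name, url) VALUES (:name, :url)", records
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(scrape(SAMPLE_HTML)))
stored = conn.execute("SELECT name, url FROM chapter ORDER BY id").fetchall()
print(stored)
```

Keeping each stage as a separate function means the parsing and cleaning logic can be unit tested against saved HTML without a network connection or a running database.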

Directory Structure

src/ - Source code for scraping, transformation, and loading.
data/ - Stores raw and processed data.
docs/ - Documentation for the project.
tests/ - Unit tests for the modules.

Setup

From the project root, run:

make

Then activate the virtual environment:

# On macOS/Linux:
source env/bin/activate
# On Windows:
.\env\Scripts\activate

Run the scraper to fetch and parse the HTML content:
```
python src/scraper.py
```
