Timetable Scrapers

Python package for scraping exam timetables.

Overview

This provides a plugin-based architecture for extracting timetable data from excel files. Each institution has its own implementation, ensuring maintanability and extensibility.

Key Features

Plugin Architecture: Easy to add new institution scrapers.
Standardized Output: All scrapers return the samde data structure
Type Safe: Uses dataclassases and type hints
Extensible: Base class provides common functionality
Independent: Can be used in multiple projects

Installation

cd timetable-scrapers pip install -e . pip install -e ".[dev]"

Production (soon)

pip install timetable-scrapers

Usage

Basic Usage

from timetable_scrapers import ScraperRegistry

# Get a scraper instance
scraper = ScraperRegistry.get_scraper("nursing_exams")

# Extract data from file
with open("timetable.xlsx", "rb") as f:
    courses = scraper.extract(f)

# courses is a list of CourseEntry objects
for course in courses:
    print(f"{course.course_code} at {course.venue} on {course.day}")

Available Scrapers

nursing_exams: Nursing exam timetable
strath: Strathmore University timetable
kca: KCA University exam timetable
Daystar: Daystar exam timetable

List Available Scrapers

from timetable_scrapers import ScraperRegistry

available = ScraperRegistry.list_scrapers()
print(available)  # ['nursing_exams', 'strath', 'kca', 'school_exams']

Converting to Dictionary

from timetable_scrapers import ScraperRegistry, CourseEntry

scraper = ScraperRegistry.get_scraper("kca")
courses = scraper.extract(file)

# Convert to dictionaries for JSON serialization
course_dicts = [course.to_dict() for course in courses]

Professor API Integration

To send scraped data to the Professor API, use build_ingest_payload() which creates contract-compliant payloads:

from timetable_scrapers import ScraperRegistry, build_ingest_payload, get_institution_id

# Extract data
scraper = ScraperRegistry.get_scraper("nursing_exams")
with open("timetable.xlsx", "rb") as f:
    entries = scraper.extract(f)

# Get stable institution ID
institution_id = get_institution_id("nursing_exams")

# Build contract-compliant payload
payloads = build_ingest_payload(
    institution_id=institution_id,
    semester_id=12,
    entries=entries,
)

# Send to Professor API
import requests
for payload in payloads:
    response = requests.post(
        "https://professor.example.com/api/exams/ingest/",
        json=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <token>"
        }
    )
    print(f"Created: {response.json()['created']}, Updated: {response.json()['updated']}")

Key features:

Automatically parses structured exam_date, start_time, end_time from free-form day/time strings
Removes datetime_str from items (moves to raw_data if present)
Deduplicates entries by (institution_id, semester_id, course_code) with last-wins policy
Optionally chunks large batches: build_ingest_payload(..., chunk_size=5000)

Architecture

Package Structure

timetable_scrapers/
├── base/              # Base classes and interfaces
├── utils/             # Shared utilities
├── scrapers/          # Institution-specific scrapers
│   ├── nursing/
│   ├── strath/
│   ├── kca/
│   └── school/
├── registry.py        # Scraper registry/factory
└── schemas.py         # Data models

Design Patterns

Strategy Pattern: Each institution is a strategy (scraper class)
Factory Pattern: Registry creates scraper instances
Plugin Pattern: Scrapers register themselves automatically

Adding a New Scraper

Create a new directory under scrapers/
Create scraper class inheriting from BaseTimetableScraper
Implement institution_name property and extract() method
Register with @ScraperRegistry.register("name")
Import in scrapers/__init__.py

Example:

from ...base.scraper import BaseTimetableScraper
from ...registry import ScraperRegistry
from ...schemas import CourseEntry

@ScraperRegistry.register("new_institution")
class NewInstitutionScraper(BaseTimetableScraper):
    @property
    def institution_name(self) -> str:
        return "new_institution"

    def extract(self, file) -> List[CourseEntry]:
        # Your extraction logic here
        return []

Data Structure

All scrapers return CourseEntry objects with the following fields:

course_code (str): Course code (required)
day (str): Day/date string
time (str): Time string (e.g., "8:00AM-10:00AM")
venue (str): Venue/room name
campus (str): Campus name
coordinator (str): Coordinator name
hrs (str): Duration in hours
invigilator (str): Invigilator name
datetime_str (str, optional): ISO format datetime
course_name (str): Full course name
raw_data (dict): Institution-specific data

Development

Running Tests

pytest tests/

Code Quality

# Format code
black src/

# Type checking
mypy src/

# Linting
ruff check src/

License

MIT

Contributing

When adding a new institution scraper:

Follow the existing scraper structure
Implement all abstract methods
Add tests for your scraper
Update this README with the new scraper name

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src/timetable_scrapers		src/timetable_scrapers
tests		tests
.gitignore		.gitignore
README.MD		README.MD
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Timetable Scrapers

Overview

Key Features

Installation

Production (soon)

Usage

Basic Usage

Available Scrapers

List Available Scrapers

Converting to Dictionary

Professor API Integration

Architecture

Package Structure

Design Patterns

Adding a New Scraper

Data Structure

Development

Running Tests

Code Quality

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Timetable Scrapers

Overview

Key Features

Installation

Production (soon)

Usage

Basic Usage

Available Scrapers

List Available Scrapers

Converting to Dictionary

Professor API Integration

Architecture

Package Structure

Design Patterns

Adding a New Scraper

Data Structure

Development

Running Tests

Code Quality

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages