Python package for scraping exam timetables.
This package provides a plugin-based architecture for extracting timetable data from Excel files. Each institution has its own implementation, which keeps the codebase maintainable and extensible.
- **Plugin Architecture**: Easy to add new institution scrapers
- **Standardized Output**: All scrapers return the same data structure
- **Type Safe**: Uses dataclasses and type hints
- **Extensible**: Base class provides common functionality
- **Independent**: Can be used in multiple projects
Install from source:

```shell
cd timetable-scrapers
pip install -e .          # editable install
pip install -e ".[dev]"   # with development dependencies
```

Or from PyPI:

```shell
pip install timetable-scrapers
```
```python
from timetable_scrapers import ScraperRegistry

# Get a scraper instance
scraper = ScraperRegistry.get_scraper("nursing_exams")

# Extract data from file
with open("timetable.xlsx", "rb") as f:
    courses = scraper.extract(f)

# courses is a list of CourseEntry objects
for course in courses:
    print(f"{course.course_code} at {course.venue} on {course.day}")
```

Available scrapers:

- `nursing_exams`: Nursing exam timetable
- `strath`: Strathmore University timetable
- `kca`: KCA University exam timetable
- `Daystar`: Daystar exam timetable
```python
from timetable_scrapers import ScraperRegistry

available = ScraperRegistry.list_scrapers()
print(available)  # ['nursing_exams', 'strath', 'kca', 'school_exams']
```

```python
from timetable_scrapers import ScraperRegistry, CourseEntry

scraper = ScraperRegistry.get_scraper("kca")
courses = scraper.extract(file)

# Convert to dictionaries for JSON serialization
course_dicts = [course.to_dict() for course in courses]
```

To send scraped data to the Professor API, use `build_ingest_payload()`, which creates contract-compliant payloads:
```python
import requests

from timetable_scrapers import ScraperRegistry, build_ingest_payload, get_institution_id

# Extract data
scraper = ScraperRegistry.get_scraper("nursing_exams")
with open("timetable.xlsx", "rb") as f:
    entries = scraper.extract(f)

# Get stable institution ID
institution_id = get_institution_id("nursing_exams")

# Build contract-compliant payload
payloads = build_ingest_payload(
    institution_id=institution_id,
    semester_id=12,
    entries=entries,
)

# Send to Professor API
for payload in payloads:
    response = requests.post(
        "https://professor.example.com/api/exams/ingest/",
        json=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <token>",
        },
    )
    print(f"Created: {response.json()['created']}, Updated: {response.json()['updated']}")
```

Key features:
- Automatically parses structured `exam_date`, `start_time`, and `end_time` from free-form `day`/`time` strings
- Removes `datetime_str` from items (moves it to `raw_data` if present)
- Deduplicates entries by `(institution_id, semester_id, course_code)` with a last-wins policy
- Optionally chunks large batches: `build_ingest_payload(..., chunk_size=5000)`
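The last-wins deduplication and chunking described above can be sketched in plain Python. This is an illustration only, not the package's implementation: entries are shown as plain dicts, and `dedup_and_chunk` is a hypothetical helper standing in for the relevant part of `build_ingest_payload()`.

```python
# Illustrative sketch of last-wins dedup and optional chunking.
# The real build_ingest_payload() operates on CourseEntry objects and
# also parses dates/times and builds the full payload envelope.

def dedup_and_chunk(institution_id, semester_id, entries, chunk_size=None):
    # Last-wins: a later entry with the same key replaces an earlier one,
    # while the key keeps its original position (dicts preserve insertion order).
    seen = {}
    for entry in entries:
        key = (institution_id, semester_id, entry["course_code"])
        seen[key] = entry
    items = list(seen.values())
    if chunk_size is None:
        return [items]
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

entries = [
    {"course_code": "NUR101", "venue": "Hall A"},
    {"course_code": "NUR101", "venue": "Hall B"},  # same key, later: wins
    {"course_code": "NUR202", "venue": "Lab 1"},
]
batches = dedup_and_chunk(1, 12, entries, chunk_size=1)
# Two unique courses remain, chunked one per batch.
```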
```
timetable_scrapers/
├── base/         # Base classes and interfaces
├── utils/        # Shared utilities
├── scrapers/     # Institution-specific scrapers
│   ├── nursing/
│   ├── strath/
│   ├── kca/
│   └── school/
├── registry.py   # Scraper registry/factory
└── schemas.py    # Data models
```
- Strategy Pattern: Each institution is a strategy (scraper class)
- Factory Pattern: Registry creates scraper instances
- Plugin Pattern: Scrapers register themselves automatically
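The interplay of these three patterns can be shown with a minimal standalone sketch. This is a simplification for illustration; the actual `ScraperRegistry` in `registry.py` may differ in details such as validation and error handling.

```python
# Minimal sketch of the decorator-based registry used here:
# - Strategy: each scraper class encapsulates one institution's logic
# - Factory: get_scraper() instantiates the right class by name
# - Plugin: the decorator registers the class as a side effect of import

class ScraperRegistry:
    _scrapers = {}

    @classmethod
    def register(cls, name):
        def decorator(scraper_cls):
            cls._scrapers[name] = scraper_cls  # plugin self-registration
            return scraper_cls
        return decorator

    @classmethod
    def get_scraper(cls, name):
        return cls._scrapers[name]()  # factory: new instance per call

    @classmethod
    def list_scrapers(cls):
        return sorted(cls._scrapers)

@ScraperRegistry.register("demo")
class DemoScraper:
    def extract(self, file):
        return []

scraper = ScraperRegistry.get_scraper("demo")
```

Because registration happens at class-definition time, importing a scraper module in `scrapers/__init__.py` is all it takes to make it available through the registry.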
- Create a new directory under `scrapers/`
- Create a scraper class inheriting from `BaseTimetableScraper`
- Implement the `institution_name` property and the `extract()` method
- Register with `@ScraperRegistry.register("name")`
- Import it in `scrapers/__init__.py`
Example:

```python
from typing import List

from ...base.scraper import BaseTimetableScraper
from ...registry import ScraperRegistry
from ...schemas import CourseEntry


@ScraperRegistry.register("new_institution")
class NewInstitutionScraper(BaseTimetableScraper):
    @property
    def institution_name(self) -> str:
        return "new_institution"

    def extract(self, file) -> List[CourseEntry]:
        # Your extraction logic here
        return []
```

All scrapers return `CourseEntry` objects with the following fields:
- `course_code` (str): Course code (required)
- `day` (str): Day/date string
- `time` (str): Time string (e.g., "8:00AM-10:00AM")
- `venue` (str): Venue/room name
- `campus` (str): Campus name
- `coordinator` (str): Coordinator name
- `hrs` (str): Duration in hours
- `invigilator` (str): Invigilator name
- `datetime_str` (str, optional): ISO-format datetime
- `course_name` (str): Full course name
- `raw_data` (dict): Institution-specific data
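The field list above maps naturally onto a dataclass. The sketch below is illustrative only; the actual model lives in `schemas.py` and its defaults and `to_dict()` implementation may differ.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

# Illustrative CourseEntry-like dataclass mirroring the field list above.
@dataclass
class CourseEntry:
    course_code: str                 # required
    day: str = ""
    time: str = ""                   # e.g., "8:00AM-10:00AM"
    venue: str = ""
    campus: str = ""
    coordinator: str = ""
    hrs: str = ""
    invigilator: str = ""
    datetime_str: Optional[str] = None
    course_name: str = ""
    raw_data: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict() converts the dataclass (recursively) into plain dicts,
        # which is what the JSON-serialization examples above rely on.
        return asdict(self)

entry = CourseEntry(course_code="NUR101", time="8:00AM-10:00AM", venue="Hall A")
```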
```shell
# Run tests
pytest tests/

# Format code
black src/

# Type checking
mypy src/

# Linting
ruff check src/
```

MIT
When adding a new institution scraper:
- Follow the existing scraper structure
- Implement all abstract methods
- Add tests for your scraper
- Update this README with the new scraper name