SurvStat Loader

A Python tool for automated data collection and processing from the SurvStat@RKI platform, Germany's interactive infectious disease surveillance system.

Overview

This project automates the download and processing of disease surveillance data from the Robert Koch Institute (RKI). Using Selenium, it downloads data for specified diseases and years, organized by week and German administrative districts (Kreise). The data is then processed and harmonized using regional identification codes.
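The harmonization step can be pictured as a lookup from district name to regional code. The sketch below is not the repository's implementation: the function name `harmonize_districts`, the record layout, and the three-entry lookup table are assumptions for illustration, and the case counts are made-up placeholders. District names as exported by SurvStat may be spelled differently from the keys used here.

```python
from typing import Dict, List, Optional

# Illustrative lookup only: a tiny hand-written subset of district names
# mapped to five-digit Kreiskennziffern. Codes are kept as strings so that
# leading zeros (e.g. "02000") are preserved.
DISTRICT_CODES: Dict[str, str] = {
    "Hamburg": "02000",
    "Köln": "05315",
    "München": "09162",
}


def harmonize_districts(records: List[dict], lookup: Dict[str, str]) -> List[dict]:
    """Return copies of the records with a 'district_code' field added.

    Names that are missing from the lookup get None, so they can be
    reviewed instead of being silently dropped.
    """
    harmonized = []
    for record in records:
        name = record["district"].strip()  # tolerate stray whitespace
        code: Optional[str] = lookup.get(name)
        harmonized.append({**record, "district_code": code})
    return harmonized


# Placeholder rows (hypothetical layout, not real surveillance figures).
rows = [
    {"district": "Hamburg", "week": "2025-W01", "cases": 3},
    {"district": " München ", "week": "2025-W01", "cases": 1},
    {"district": "Unknown Kreis", "week": "2025-W01", "cases": 2},
]

harmonized_rows = harmonize_districts(rows, DISTRICT_CODES)
unmatched = [r["district"] for r in harmonized_rows if r["district_code"] is None]
```

Collecting the unmatched names separately makes it easy to spot districts whose spelling needs an extra entry in the mapping.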

For a walkthrough of the pipeline in this repository, see Extracting_survstat_data.ipynb in src.

Features

Automated Data Collection: Downloads disease data from SurvStat@RKI for multiple diseases and years
Data Processing: Merges yearly files into standardized datasets
Geographic Harmonization: Translates district names to official regional codes (Kreiskennziffern)
Flexible Configuration: Easy configuration through YAML files
DataProcessingOrchestrator: A (large and generic) class for chaining data processing steps
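To illustrate what chaining data processing steps can look like, here is a minimal sketch. It is a hypothetical stand-in, not the repository's DataProcessingOrchestrator: the class name `StepChain`, its methods `add_step` and `run`, and the two example steps are all assumptions made for this example.

```python
from typing import Any, Callable, List, Tuple


class StepChain:
    """Hypothetical helper that runs named processing steps in order."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def add_step(self, name: str, func: Callable[[Any], Any]) -> "StepChain":
        """Register a step and return self so calls can be chained."""
        self._steps.append((name, func))
        return self

    def run(self, data: Any) -> Any:
        """Feed the output of each step into the next one."""
        for _name, func in self._steps:
            data = func(data)
        return data


def drop_missing(rows: List[dict]) -> List[dict]:
    """Remove rows without a case count."""
    return [row for row in rows if row["cases"] is not None]


def total_cases(rows: List[dict]) -> int:
    """Sum the case counts of all remaining rows."""
    return sum(row["cases"] for row in rows)


chain = (
    StepChain()
    .add_step("drop_missing", drop_missing)
    .add_step("total_cases", total_cases)
)

# Placeholder input, not real surveillance figures.
result = chain.run([{"cases": 2}, {"cases": None}, {"cases": 5}])
```

Keeping each step a plain function makes it possible to test steps individually and to reorder them without touching the chaining logic.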

Installation

Clone the repository:

git clone https://github.com/stends2001/survstat_data.git
cd survstat_data

Install dependencies:

pip install -r requirements.txt

Configure the project by editing config.yaml if needed.

Usage

Quick Start

Run the main script to download and process data for the current year:

python src/update_survstatdata.py

Manual Usage

from src.survstat_collecting.survstat_scraper import scrape_survstat_data
from src.survstat_collecting.casedata_processing import preprocess_survstat_data

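# Note: directories_dict is not defined in this snippet; it is assumed to map
# the keys used below to directory paths.

# Download the raw data files for the requested year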
scrape_survstat_data(
    disease_names={'campylobacter': 'Campylobacter'},
    years='2025',
    output_directory=directories_dict['dir_data_raw'], 
    downloads_directory=directories_dict['dir_downloads']
)

# Process the downloaded data
preprocess_survstat_data(
    diseases=['Campylobacter'],
    years='2025',
    raw_data_dir=directories_dict['dir_data_raw'], 
    processed_data_dir=directories_dict['dir_data_preprocessed'],
    how='update'
)
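The snippet above relies on a `directories_dict` that it does not define. A minimal sketch of such a mapping follows, assuming the layout shown under Project Structure. The keys are the ones used in the calls above; the helper name `build_directories` and the `downloads` location are assumptions, and whether the functions expect `Path` objects or plain strings is not stated here.

```python
from pathlib import Path
from typing import Dict


def build_directories(project_root: Path) -> Dict[str, Path]:
    """Map the directory keys used in the usage example to paths.

    Hypothetical helper: the project may build this mapping from
    config.yaml instead.
    """
    data_dir = project_root / "data"
    return {
        "dir_data_raw": data_dir / "raw",
        "dir_data_preprocessed": data_dir / "preprocessed",
        # Assumed location; point this at the browser's download folder.
        "dir_downloads": project_root / "downloads",
    }


directories_dict = build_directories(Path("."))
```

If the functions turn out to require strings, wrapping each value in `str()` is enough.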

Demo with Sample Data

To see the tool in action with sample data:

Generate the sample data (requires previously downloaded real data):

python src/create_github_sample.py

View the demo notebook:
- Open src/Demo_measles_visualization.ipynb to see a visualization of national measles data
- The sample data contains real national measles data aggregated from the SurvStat system

Sample data structure:
- data/sample/measles_national.csv: National weekly measles cases
- Columns: timestamp, cases
- Safe to share on GitHub (national-level data only)
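
As a quick way to inspect a file with this structure, the sketch below sums weekly cases per year using only the standard library. It assumes that `timestamp` values start with a four-digit year (for example an ISO date), which is not confirmed here; the function name `yearly_totals` is made up for this example and the inline rows are placeholders, not real surveillance figures.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

# Placeholder content with the two documented columns.
SAMPLE_CSV = """timestamp,cases
2024-12-23,1
2024-12-30,0
2025-01-06,2
2025-01-13,4
"""


def yearly_totals(csv_file: TextIO) -> Dict[int, int]:
    """Sum the 'cases' column per calendar year of 'timestamp'."""
    totals: Dict[int, int] = defaultdict(int)
    for row in csv.DictReader(csv_file):
        year = int(row["timestamp"][:4])  # assumes a leading four-digit year
        totals[year] += int(row["cases"])
    return dict(totals)


totals = yearly_totals(io.StringIO(SAMPLE_CSV))
```

To run it on the actual sample, open the file with `open("data/sample/measles_national.csv", newline="")` and pass the file object to `yearly_totals`.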

Project Structure

survstat_loader/
├── src/
│   ├── dataprocessor/          # Data processing modules
│   ├── survstat_collecting/    # Web scraping and data collection
│   ├── utils/                  # Utility functions
│   ├── update_survstatdata.py  # Main execution script
│   ├── preview_epicurve.py     # Show weekly case numbers of downloaded data
│   ├── create_github_sample.py # Create sample data for GitHub
│   └── Demo_measles_visualization.ipynb # Demo notebook
├── data/                       # Data storage (not in git)
│   ├── raw/                    # Raw downloaded files
│   ├── preprocessed/           # Processed datasets
│   ├── harmonization/          # Geographic mapping files
│   └── sample/                 # Sample data for GitHub (committed)
├── config.yaml                 # Configuration file
└── requirements.txt            # Python dependencies

Data Privacy

Real data: Stored in data/ directories and excluded from Git
Sample data: Only national-level aggregated data is shared
Regional data: Contains district-level information and is kept private
Full datasets: Datasets covering multiple diseases and years are excluded from the repository

Configuration

Edit config.yaml to customize:

Data directory paths
Download locations
Processing options

License

This project is licensed under the MIT License - see the LICENSE file for details.
