
LD Pipeline

"LD Pipeline" (fka "LD Pipeline 2024") is a redevelopment of the existing pipeline, focusing on improving flexibility, efficiency and scalability. This project involves extracting data from the Harmonized Database (HDB), converting it into triples, and then transferring it into an RDF database.

The pipeline code is developed on GitLab and mirrored on GitHub.

Getting Started

These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Python 3.6 or higher; development is done with Python 3.10

Installation

Follow these steps to set up your development environment:

  1. Clone the repository:

    1. git clone https://cmp-sdlc.stzh.ch/OE-7035/ssz-lod/ld-pipeline.git
    2. cd ld-pipeline
  2. Create and activate a virtual environment (recommended):

    1. python -m venv .venv
    2. source .venv/bin/activate
  3. Install the required Python dependencies:

    • pip install -r requirements.txt
  4. Execute the pipeline:

    • Get help: python main.py --help
    • Run pipeline: python main.py run
    • Run pipeline in env:test: python main.py run --env test
    • List all supported step names: python main.py list-step-names
    • Run single step in env:test: python main.py step --env test --name copyStatic

Pipeline Steps

Copy Step

  • "copyStatic": Copies static.n3 files from /static to defined output folder.

Templating Step

  • "codeTemplating": Creates a .ttl RDF file out of the given data, requested by an sql statement.

    ...
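A templating step of this kind might look roughly like the following sketch, which renders SQL result rows as Turtle; the SKOS vocabulary, base URI, and function name are illustrative assumptions, not the pipeline's actual template:

```python
def rows_to_turtle(rows, base_uri="https://example.org/code/"):
    """Render (code, label) rows, e.g. from an SQL query, as Turtle.

    Each row becomes one skos:Concept with a notation and a German label.
    """
    prefix = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
    triples = []
    for code, label in rows:
        triples.append(
            f'<{base_uri}{code}> a skos:Concept ;\n'
            f'    skos:notation "{code}" ;\n'
            f'    skos:prefLabel "{label}"@de .'
        )
    return prefix + "\n\n".join(triples) + "\n"
```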

Compressing Step

  • "compressing": Compresses all .ttl files with the gzip algorithm to a new directory.

UploadToFuseki Step

  • "uploadToFuseki": Uploads all compressed .gz files to a configured fuseki server.

Run with docker

  1. Build docker image: docker build -t ssz/ld-pipeline .
  2. Run dockerized application (e.g. help command): docker run ssz/ld-pipeline --help
  3. Mount an output directory, e.g.: docker run --mount type=bind,source="$(pwd)"/tmp,target=/out ssz/ld-pipeline step --env local --name copyStatic

HINT: Be sure to use only local, int, or prod as the environment within Docker.

Development

  1. Unit Tests
    • Unit tests can be found under tests/unit
    • Run: SSZ_DB_TYPE=mock python -m pytest tests/unit
  2. Integration Tests
    • Integration tests can be found under tests/integration
    • The tests create the environment using docker (tests/integration/compose.yaml)
    • Be sure docker is up and running
    • Run: python -m pytest --container-scope=session tests/integration
  3. Local Pipeline execution
  4. Linter
    • Run ruff check to lint all files in the current directory
    • Run ruff format to format all files in the current directory
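For orientation, the integration-test environment described in step 2 might be declared roughly as follows; the image name, port, and password are hypothetical, and the real tests/integration/compose.yaml may differ:

```yaml
# Hypothetical sketch of an integration-test compose file.
services:
  fuseki:
    image: stain/jena-fuseki   # assumption: any Fuseki image would do
    ports:
      - "3030:3030"            # default Fuseki HTTP port
    environment:
      - ADMIN_PASSWORD=admin   # test-only credential
```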

License

This project is licensed under the Apache License 2.0 (Apache-2.0). See the LICENSE file for details.
