"LD Pipeline" (fka "LD Pipeline 2024") is a redevelopment of the existing pipeline, focusing on improving flexibility, efficiency and scalability. This project involves extracting data from the Harmonized Database (HDB), converting it into triples, and then transferring it into an RDF database.
The pipeline code is developed on GitLab and mirrored on GitHub.
These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.
Before you begin, ensure you have the following installed and configured:
- Python 3.6 or higher (development is done with Python 3.10)
Follow these steps to set up your development environment:
- Clone the repository:

  ```sh
  git clone https://cmp-sdlc.stzh.ch/OE-7035/ssz-lod/ld-pipeline.git
  cd ld-pipeline
  ```
- Create and activate a virtual environment (recommended):

  ```sh
  python -m venv .venv
  source .venv/bin/activate
  ```
- Install the required Python dependencies:

  ```sh
  pip install -r requirements.txt
  ```
- Execute the pipeline:
  - Get help:

    ```sh
    python main.py --help
    ```

  - Run pipeline:

    ```sh
    python main.py run
    ```

  - Run pipeline in env:test:

    ```sh
    python main.py run --env test
    ```

  - List all supported step names:

    ```sh
    python main.py list-step-names
    ```

  - Run single step in env:test:

    ```sh
    python main.py step --env test --name copyStatic
    ```
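The README does not say which CLI library `main.py` is built with, so the command surface above can only be sketched. The following is a minimal, hypothetical reconstruction using `argparse`; the subcommand and option names match the commands shown, but the step registry and all internals are assumptions, not the pipeline's actual code.

```python
# Hypothetical sketch of the CLI surface shown above, using argparse.
# The real main.py may use a different CLI framework; the STEPS set and
# default env value here are illustrative assumptions only.
import argparse

STEPS = {"copyStatic", "codeTemplating", "compressing", "uploadToFuseki"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # main.py run [--env ENV]
    run = sub.add_parser("run", help="run the whole pipeline")
    run.add_argument("--env", default="local")

    # main.py list-step-names
    sub.add_parser("list-step-names", help="list all supported step names")

    # main.py step --env ENV --name STEP
    step = sub.add_parser("step", help="run a single pipeline step")
    step.add_argument("--env", default="local")
    step.add_argument("--name", required=True, choices=sorted(STEPS))

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Parsing `step --env test --name copyStatic` with this sketch yields a namespace with `command="step"`, `env="test"`, `name="copyStatic"`, mirroring the single-step invocation above.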
- "copyStatic": Copies static `.n3` files from `/static` to the defined output folder.
- "codeTemplating": Creates a `.ttl` RDF file out of the given data, requested by an SQL statement. ...
- "compressing": Compresses all `.ttl` files with the gzip algorithm to a new directory.
- "uploadToFuseki": Uploads all compressed `.gz` files to a configured Fuseki server.
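To make the "compressing" step concrete, here is a minimal sketch of what it describes — gzip-compressing every `.ttl` file from a source directory into a new directory — written with Python's standard library. This is not the pipeline's actual implementation; the function name, signature, and directory layout are assumptions.

```python
# Illustrative sketch of the "compressing" step: gzip every .ttl file
# from src_dir into a new out_dir. NOT the pipeline's real code; the
# function name and directory handling are assumed for illustration.
import gzip
import shutil
from pathlib import Path


def compress_ttl_files(src_dir: Path, out_dir: Path) -> list[Path]:
    """Compress all .ttl files in src_dir into out_dir as .ttl.gz files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    compressed = []
    for ttl in sorted(src_dir.glob("*.ttl")):
        target = out_dir / (ttl.name + ".gz")
        # Stream the file through gzip instead of reading it into memory,
        # since serialized RDF files can be large.
        with ttl.open("rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        compressed.append(target)
    return compressed
```

The resulting `.ttl.gz` files are what a step like "uploadToFuseki" would then send to the configured Fuseki server.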
- Build the Docker image:

  ```sh
  docker build -t ssz/ld-pipeline .
  ```

- Run the dockerized application (e.g. the help command):

  ```sh
  docker run ssz/ld-pipeline --help
  ```

- Mount an output directory, e.g.:

  ```sh
  docker run --mount type=bind,source="$(pwd)"/tmp,target=/out ssz/ld-pipeline step --env local --name copyStatic
  ```

  HINT: be sure to use only `local`, `int` or `prod` as the environment within Docker.
- Unit Tests
  - Unit tests can be found under `tests/unit`
  - Run:

    ```sh
    SSZ_DB_TYPE=mock python -m pytest tests/unit
    ```
- Integration Tests
  - Integration tests can be found under `tests/integration`
  - The tests create the environment using Docker (`tests/integration/compose.yaml`)
  - Be sure Docker is up and running
  - Run:

    ```sh
    python -m pytest --container-scope=session tests/integration
    ```
- Local Pipeline execution
  - Install act from https://nektosact.com/
  - Execute the pipeline:

    ```sh
    act --container-architecture linux/amd64
    ```
- Linter
  - Run `ruff check` to lint all files in the current directory
  - Run `ruff format` to format all files in the current directory
This project is licensed under the Apache License 2.0 (Apache-2.0).
See the LICENSE file for details.