Knowledge Graph Analysis Platform
A microservices-based platform for building, managing, and analyzing knowledge graphs using SPARQL and linked data technologies.
Comprehensive documentation is available in the docs/ directory:
- GraphDB Component - RDF triple store and SPARQL endpoint
- Jupyter Component - Interactive notebooks for data analysis
- Sembench Component - Automated semantic processing
- LDES Consumer Component - Multi-feed LDES harvesting
K-GAP provides a complete, containerized environment for working with knowledge graphs. It combines specialized microservices that work together to:
- Store and query RDF data using GraphDB
- Harvest and ingest data from LDES (Linked Data Event Streams) feeds
- Analyze and process knowledge graphs using Python tools (Sembench)
- Explore data interactively through Jupyter notebooks
- Microservices Architecture: Each component runs as an independent Docker container
- LDES Integration: Automated harvesting from multiple Linked Data Event Streams
- Interactive Analysis: Jupyter notebooks for data exploration and visualization
- Scalable Storage: GraphDB repository with configurable resources
- Automated Processing: Scheduled data processing pipelines via Sembench
- Docker (version 20.10 or higher)
- Docker Compose (version 2.0 or higher)
- At least 16GB RAM recommended
```bash
# 1. Clone the repository
git clone https://github.com/vliz-be-opsci/k-gap.git
cd k-gap

# 2. Configure environment
cp dotenv-example .env
# Edit .env to customize settings if needed

# 3. Create data directories
mkdir -p ./data ./notebooks

# 4. Start all services
docker compose up -d
```
Once started, access the following services:
- GraphDB Workbench: http://localhost:7200
- Jupyter Notebooks: http://localhost:8889
- YASGUI (SPARQL UI): http://localhost:8080
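As a quick connectivity check, a SPARQL SELECT can also be posted straight to the GraphDB HTTP endpoint. The sketch below is a minimal example using only the Python standard library; the repository id `kgap` in the commented call is hypothetical and depends on your `.env` settings. The result parser follows the standard SPARQL 1.1 JSON results format.

```python
import json
import urllib.parse
import urllib.request


def sparql_select(endpoint: str, query: str) -> dict:
    """POST a SPARQL SELECT query and return the parsed JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bindings_to_rows(results: dict) -> list:
    """Flatten SPARQL JSON result bindings into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


# Hypothetical repository id 'kgap'; adjust to your configuration:
# rows = bindings_to_rows(sparql_select(
#     "http://localhost:7200/repositories/kgap",
#     "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
# ))
```

GraphDB exposes each repository's SPARQL endpoint at `/repositories/{id}`, so the same helper works for any repository you create in the Workbench.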
K-GAP consists of four main Docker images:
GraphDB is the core RDF triple store providing SPARQL query capabilities and persistent storage.
- Base: ontotext/graphdb:10.4.4
- Port: 7200
- Docs: GraphDB Component
Interactive notebook environment for data analysis with pre-installed RDF/SPARQL tools.
- Base: jupyter/base-notebook
- Port: 8889
- Docs: Jupyter Component
Python-based semantic processing engine for scheduled data processing tasks.
- Base: python:3.10
- Docs: Sembench Component
Multi-feed LDES harvesting service that wraps ldes2sparql.
- Base: python:3.10-slim
- Docs: LDES Consumer Component | README
Build all Docker images locally:
```bash
make docker-build
```
Build with a specific tag:
```bash
make BUILD_TAG=0.2.0 docker-build
```
Build and push images to a container registry:
```bash
make REG_NS=ghcr.io/vliz-be-opsci/kgap docker-push
```
Images are automatically built and published to GitHub Container Registry on release:
```
ghcr.io/vliz-be-opsci/kgap/kgap_graphdb:latest
ghcr.io/vliz-be-opsci/kgap/kgap_jupyter:latest
ghcr.io/vliz-be-opsci/kgap/kgap_sembench:latest
ghcr.io/vliz-be-opsci/kgap/kgap_ldes-consumer:latest
```
K-GAP is configured through environment variables in a .env file. See Configuration Guide for details.
Key configuration areas:
- GraphDB repository settings
- LDES feed configuration
- Sembench processing schedules
- Resource allocation
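For illustration only, a `.env` file for this kind of setup typically groups those settings as plain key-value pairs. The variable names below are hypothetical — check `dotenv-example` for the keys this project actually uses:

```
# Illustrative only — variable names are hypothetical;
# see dotenv-example for the actual keys.
GRAPHDB_HEAP_SIZE=8g                     # memory for the GraphDB JVM
LDES_FEEDS_FILE=./data/ldes-feeds.yaml   # feed definitions for the consumer
SEMBENCH_SCHEDULE=0 2 * * *              # cron-style processing schedule
```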
Using YASGUI web interface:
Navigate to http://localhost:8080 and run SPARQL queries.
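As a first smoke test in YASGUI, a plain triple count confirms the store is reachable and populated:

```sparql
# Count all triples in the repository
SELECT (COUNT(*) AS ?count)
WHERE {
  ?s ?p ?o .
}
```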
Using Jupyter notebooks:
```python
from kgap_tools import execute_to_df

df = execute_to_df('my_query', param1='value1')
display(df)
```
Edit data/ldes-feeds.yaml to add or remove feeds, then restart the consumer:
```bash
docker compose restart ldes-consumer
```
Follow the logs of any service:
```bash
docker compose logs -f graphdb
docker compose logs -f jupyter
docker compose logs -f sembench
docker compose logs -f ldes-consumer
```
See the Development section in the documentation for:
- Project structure
- Adding new components
- Contributing guidelines
- py-sema - Python semantic processing library
- ldes2sparql - LDES harvesting tool
- GraphDB - RDF database
- Jupyter - Interactive computing
K-GAP is licensed under the MIT License. See LICENSE for details.
For issues and questions:
- GitHub Issues: https://github.com/vliz-be-opsci/k-gap/issues
- Organization: https://github.com/vliz-be-opsci