[FEATURE] Incremental / Delta Processing


Process only what changed instead of reprocessing everything. Use versioned graphs and deltas to drive fast, cost‑efficient pipelines.

---

### Relevant Modules and Test Folders

- Implementation is split across:
  - `semantica/change_management/` (snapshot/version handling).
  - `semantica/triplet_store/` (delta computation between graphs).
  - `semantica/pipeline/` (delta‑aware steps).
- Tests belong under existing folders:
  - `tests/change_management/`
  - `tests/triplet_store/`
  - `tests/pipeline/`
- No new packages; everything reuses the current `semantica/` and `tests/` layout.

---

### High‑Level Behaviour

- Captures snapshots of graphs over time (e.g. nightly loads, ingestion runs).
- Computes deltas between snapshots (added/removed triples).
- Allows validation and transformation jobs to operate on deltas instead of full datasets.
- Integrates with the pipeline engine to support delta‑aware workflows.
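
As a rough illustration of what a delta between two snapshots means, the sketch below treats each snapshot as a plain Python set of triples and uses set difference. The name `diff_triples` and the in-memory representation are assumptions made for this example only; the real helper is expected to work on graph URIs inside the triplet store.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory representation: (subject, predicate, object).
Triple = Tuple[str, str, str]


def diff_triples(
    old_triples: Iterable[Triple], new_triples: Iterable[Triple]
) -> Dict[str, Set[Triple]]:
    """Return the triples added to and removed from the old snapshot."""
    old_set, new_set = set(old_triples), set(new_triples)
    return {
        "added_triples": new_set - old_set,      # present only in the new snapshot
        "removed_triples": old_set - new_set,    # present only in the old snapshot
    }


# Two tiny snapshots of the same graph, e.g. from consecutive nightly loads.
monday = {("ex:a", "ex:name", "Alice"), ("ex:b", "ex:name", "Bob")}
tuesday = {("ex:a", "ex:name", "Alice"), ("ex:c", "ex:name", "Carol")}

delta = diff_triples(monday, tuesday)
```

Downstream jobs would then look only at `delta["added_triples"]` and `delta["removed_triples"]` rather than the full Tuesday graph.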

---

### Operational Benefits

- Large datasets become practical to manage and update frequently.
- Reduces compute cost and processing time, since work scales with the size of the change rather than the size of the dataset.
- Enables near real‑time pipelines and more frequent deployments.
- Essential for production‑grade, large‑scale semantic infrastructure.

---

### Current Implementation

- Triplet store and SPARQL
  - `semantica/triplet_store/triplet_store.py`
  - `semantica/triplet_store/query_engine.py`
- Change management and versioning
  - `semantica/change_management/version_storage.py`
  - `semantica/change_management/managers.py`
- Pipeline
  - `semantica/pipeline/pipeline_builder.py`
  - `semantica/pipeline/execution_engine.py`
- Docs and tests
  - `docs/reference/change_management.md`
  - `docs/reference/triplet_store.md`
  - `docs/reference/pipeline.md`
  - `tests/change_management/test_temporal_versioning.py`
  - `tests/change_management/test_integration_realworld.py`
  - `tests/triplet_store/test_triplet_store.py`
  - `tests/pipeline/test_pipeline_comprehensive.py`

---

### Required Additions and Modifications

- Snapshot APIs
  - In `semantica/change_management/version_storage.py`:
    - Provide explicit functions to create and reference graph snapshots by version ID.
- Delta computation
  - In `semantica/triplet_store/triplet_store.py` or `query_engine.py`:
    - Implement `compute_delta(old_graph_uri, new_graph_uri)` returning added/removed triples or delta graphs.
- Delta‑aware pipeline configuration
  - In `semantica/pipeline/pipeline_builder.py`:
    - Add `delta_mode` and version references to step configuration.
  - In `semantica/pipeline/execution_engine.py`:
    - Use `version_storage` and `compute_delta` to run steps on deltas when enabled.
- Retention and cleanup
  - In `semantica/change_management/managers.py`:
    - Provide policies and utilities for pruning old snapshots and deltas.
- Documentation and tests
  - Extend the docs and tests listed above with:
    - Examples and assertions for full‑graph vs delta‑mode processing.
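
To make the snapshot and retention items above more concrete, here is a minimal in-memory sketch. `Snapshot`, `InMemorySnapshotStore`, `create_snapshot`, `get_snapshot`, and `prune` are hypothetical names chosen for illustration; they are not existing functions in `version_storage.py` or `managers.py`, and a real implementation would likely sit on the existing SQLite or in-memory backends.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, FrozenSet, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record: an immutable copy of a graph at one point in time."""
    version_id: str
    graph_uri: str
    triples: FrozenSet[Triple]
    created_at: datetime
    metadata: Dict[str, Any] = field(default_factory=dict)


class InMemorySnapshotStore:
    """Illustrative store; not the Semantica API."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Snapshot] = {}
        self._order: List[str] = []   # version IDs, oldest first
        self._next_id = 1             # never reused, even after pruning

    def create_snapshot(
        self,
        graph_uri: str,
        triples: Iterable[Triple],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> str:
        version_id = f"v{self._next_id}"
        self._next_id += 1
        self._snapshots[version_id] = Snapshot(
            version_id=version_id,
            graph_uri=graph_uri,
            triples=frozenset(triples),
            created_at=datetime.now(timezone.utc),
            metadata=dict(metadata or {}),
        )
        self._order.append(version_id)
        return version_id

    def get_snapshot(self, version_id: str) -> Snapshot:
        return self._snapshots[version_id]  # raises KeyError if unknown or pruned

    def prune(self, keep_last: int) -> List[str]:
        """Keep the newest `keep_last` snapshots; return the version IDs removed."""
        if keep_last < 0:
            raise ValueError("keep_last must be non-negative")
        cut = max(len(self._order) - keep_last, 0)
        obsolete, self._order = self._order[:cut], self._order[cut:]
        for version_id in obsolete:
            del self._snapshots[version_id]
        return obsolete
```

Keeping version IDs monotonic after pruning avoids a new snapshot silently reusing the ID of a deleted one, which matters once pipeline steps reference snapshots by ID.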

---

### Step‑by‑Step Implementation Guide

1. Graph snapshot capture
   - In `change_management/version_storage.py`:
     - Ensure there is an API to:
       - Persist a snapshot of a given graph (by URI or logical name).
       - Tag snapshots with version IDs, timestamps, and metadata.
     - Backing storage can reuse existing SQLite or in‑memory backends.

2. Delta computation helpers
   - In `triplet_store/triplet_store.py` or `query_engine.py`:
     - Implement a function such as:
       - `compute_delta(old_graph_uri: str, new_graph_uri: str) -> Dict[str, Any]`
     - Use SPARQL or native backend operations to compute:
       - `added_triples`
       - `removed_triples`
     - Represent deltas as:
       - lists of triples OR
       - named graphs like `old_graph_uri + "-delta-added"`.

3. Delta‑aware jobs in pipeline
   - In `pipeline_builder.py`:
     - Extend pipeline step configuration with:
       - `delta_mode: bool`
       - `base_version_id`, `target_version_id` (or equivalent).
   - In `execution_engine.py`:
     - When `delta_mode` is true:
       - Resolve the relevant snapshot URIs via `version_storage`.
       - Compute delta using the helper.
       - Pass only the delta graph or triple set to downstream steps (validation, enrichment, export).

4. Configuration and cleanup
   - In `change_management/managers.py`:
     - Provide options for:
       - snapshot retention policies (e.g. keep last N versions).
       - automatic cleanup of obsolete snapshot graphs and delta graphs.

5. Documentation
   - In `docs/reference/change_management.md`:
     - Add “Incremental / Delta Processing” section with:
       - A simple example: two snapshots, computed delta, and a delta‑aware validation job.
   - Update pipeline documentation to show how to enable `delta_mode` on steps.
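
The delta‑aware step from step 3 could look roughly like the sketch below. It reuses the field names proposed above (`delta_mode`, `base_version_id`, `target_version_id`), but `StepConfig`, `run_step`, the `snapshots` dictionary, and the `find_empty_names` validator are hypothetical stand-ins for the real pipeline builder, execution engine, and version storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Handler = Callable[[Set[Triple]], List[Triple]]


@dataclass
class StepConfig:
    """Hypothetical step configuration carrying the proposed delta fields."""
    name: str
    delta_mode: bool = False
    base_version_id: Optional[str] = None
    target_version_id: Optional[str] = None


def run_step(
    config: StepConfig, snapshots: Dict[str, Set[Triple]], handler: Handler
) -> List[Triple]:
    """Run `handler` on the full target snapshot, or only on added triples in delta mode."""
    if config.target_version_id is None:
        raise ValueError("target_version_id is required")
    target = snapshots[config.target_version_id]
    if not config.delta_mode:
        return handler(target)
    if config.base_version_id is None:
        raise ValueError("delta_mode requires base_version_id")
    base = snapshots[config.base_version_id]
    return handler(target - base)  # only triples added since the base version


def find_empty_names(triples: Set[Triple]) -> List[Triple]:
    """Example validation: flag ex:name triples whose value is blank."""
    return sorted(t for t in triples if t[1] == "ex:name" and not t[2].strip())


# Stand-in for version storage: version ID -> snapshot contents.
snapshots = {
    "v1": {("ex:a", "ex:name", "Alice")},
    "v2": {("ex:a", "ex:name", "Alice"), ("ex:b", "ex:name", " ")},
}

step = StepConfig(
    name="validate-names",
    delta_mode=True,
    base_version_id="v1",
    target_version_id="v2",
)
violations = run_step(step, snapshots, find_empty_names)
```

This sketch passes only added triples to the handler. Steps whose output depends on removed triples (for example, retracting derived data) would also need the removed set, which is one reason the issue leaves the delta representation open.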

---

### Done When

- Versioned snapshots of graphs can be created and referenced through change_management.
- A public helper computes deltas between two snapshots using the existing triplet store.
- At least one pipeline test demonstrates:
  - Full‑graph processing vs delta‑based processing, with equivalent logical results.
- Documentation clearly describes how to configure and use incremental / delta processing in Semantica.
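
One possible shape for the equivalence test mentioned above is sketched here, assuming a step whose result can be maintained incrementally (per-predicate triple counts). `full_counts`, `apply_delta`, and the test name are hypothetical; the actual test would go through the pipeline engine and the triplet store rather than plain sets.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def full_counts(triples: Iterable[Triple]) -> Counter:
    """Full-graph processing: count triples per predicate from scratch."""
    return Counter(predicate for _, predicate, _ in triples)


def apply_delta(counts: Counter, added: Set[Triple], removed: Set[Triple]) -> Counter:
    """Delta-based processing: update a previous result using only the changes."""
    updated = Counter(counts)
    updated.update(predicate for _, predicate, _ in added)
    updated.subtract(predicate for _, predicate, _ in removed)
    return +updated  # unary plus drops predicates whose count fell to zero


def test_delta_mode_matches_full_graph() -> None:
    old = {
        ("ex:a", "rdf:type", "ex:Person"),
        ("ex:a", "ex:name", "Alice"),
        ("ex:b", "rdf:type", "ex:Person"),
    }
    new = {
        ("ex:a", "rdf:type", "ex:Person"),
        ("ex:a", "ex:name", "Alice"),
        ("ex:c", "ex:knows", "ex:a"),
    }
    added, removed = new - old, old - new

    incremental = apply_delta(full_counts(old), added, removed)

    # Both routes must agree on the logical result for the new snapshot.
    assert incremental == full_counts(new)
```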

