
Add correctness, rule-consistency, and currency metrics #4

Merged
jb3rndt merged 34 commits into main from feat/correctness-metric
Mar 23, 2026

Conversation

@jb3rndt
Collaborator

@jb3rndt jb3rndt commented Oct 19, 2025

This PR is based on #3, but I'm opening this already to get your feedback :)

Adds three new metrics:

  • correctness: compares each data point with its ground truth using a distance function (simple absolute difference for numbers, Levenshtein distance for strings)
  • currency: given a decline rate per column, the name of the column that holds each value's assessment date, and optionally a simulated assessment date (to avoid relying on "now"), computes currency via curr(w, A) = exp(-decline(A) * age(w, A)), where w is the attribute value and A is the column
    • the decline rate is currently interpreted per year. It might be useful to make that unit configurable too?
  • rule-based consistency: checks whether the rules given per column hold on an attribute. Weighting a rule happens inside the rule definition itself; the return values of all rules are summed when computing the consistency value.
    • since rules are currently defined as Python functions, I allowed metrics to be initialized by passing a config object directly (keeping JSON as an option too, of course)

Copilot AI review requested due to automatic review settings October 19, 2025 19:14
@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from 3b1f382 to 9f9bc44 on October 19, 2025 19:16
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces three new data quality metrics (Correctness, Currency, and Rule-based Consistency), refactors writer implementations to use a shared SQLAlchemy-based DatabaseWriter, and adds a SQLAlchemy ORM model for persisting results.

  • New metrics: Correctness (distance vs. ground truth), Currency (exponential decay by age), Rule-based Consistency (rule aggregation with certainty).
  • Refactor: Unify SQLite/Postgres writers via DatabaseWriter and SQLAlchemy ORM models; add DQDimension enum and update DQResult to use it.
  • Config handling: Add MetricConfig base and load_config utility; allow passing config objects (notably for rule-based consistency).

Reviewed Changes

Copilot reviewed 18 out of 21 changed files in this pull request and generated 9 comments.

Summary per file:

  • metis/writer/sqlite_writer.py: Switch SQLiteWriter to SQLAlchemy via DatabaseWriter; provide engine factory.
  • metis/writer/postgres_writer.py: Switch PostgresWriter to SQLAlchemy via DatabaseWriter; provide engine factory.
  • metis/writer/database_writer.py: New base writer using SQLAlchemy ORM models; centralizes table creation and writes.
  • metis/writer/console_writer.py: Minor typing tweak for optional config.
  • metis/utils/result.py: DQdimension type changed to DQDimension enum.
  • metis/utils/dq_dimension.py: New DQDimension StrEnum with dimensions.
  • metis/models.py: New SQLAlchemy declarative model and dynamic table registration.
  • metis/metric/metric.py: Add MetricConfig support and config loader.
  • metis/metric/currency/currency.py: Implement Currency metric with exponential decay by age.
  • metis/metric/currency/config.py: Config dataclass for Currency.
  • metis/metric/correctness/correctness.py: Implement Correctness metric (distance-based).
  • metis/metric/consistency/rule_consistency.py: Implement rule-based consistency with certainty annotation.
  • metis/metric/consistency/consistency.py: Make Consistency accept JSON config path; switch to DQDimension.
  • metis/metric/consistency/config.py: Config dataclasses for consistency metrics.
  • metis/metric/config.py: Base config dataclass helper.
  • metis/metric/completeness/completeness.py: Switch to DQDimension and updated typing.
  • metis/metric/__init__.py: Export new metrics.
  • metis/dq_orchestrator.py: Allow MetricConfig object in orchestrator assess API.


@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from 9d0d1ad to 7319535 on December 1, 2025 22:05
Collaborator

@lisehr lisehr left a comment


The three metric implementations look good! Could you please still rename the metrics according to the new naming convention?

@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from 7319535 to df48399 on December 8, 2025 12:37
@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch 2 times, most recently from 5b741b4 to fa03a80 on December 8, 2025 13:37
@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch 2 times, most recently from e815ab8 to 78a52bb on December 12, 2025 14:35
@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from 78a52bb to 1c3d0a5 on December 12, 2025 14:47
@jb3rndt
Collaborator Author

jb3rndt commented Dec 12, 2025

Metric names and their file names fit the new naming scheme now :)

@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from fa08c91 to e1362ea on December 15, 2025 19:21
@jb3rndt
Collaborator Author

jb3rndt commented Feb 2, 2026

@lisehr I've updated the branch :) Here is a short summary of the parts that have changed since your last review:

Metrics

  • New: completeness_nullRate: Measures the ratio of null values in a dataset (it differs from the existing completeness_nullRatio in that the granularity can be configured). Does it make sense to merge the two? (If yes, which name should be kept?)
  • New: completeness_nullAndDMVRate: Extends null-rate assessment by detecting Disguised Missing Values (DMVs) with the FAHES algorithm
  • Changed: consistency_ruleBasedPipino: Added certainty calculation
  • Changed: timeliness_heinrich: Added certainty calculation
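To illustrate the count-plus-ratio idea at different granularities, here is a minimal sketch (the function name, signature, and row representation are hypothetical, not the actual metric API):

```python
def null_stats(rows, columns, granularity="dataset"):
    """Null count and ratio, per dataset or per column (illustrative sketch).

    rows: list of dicts mapping column name -> value; None counts as null.
    """
    if granularity == "column":
        return {
            c: {
                "count": sum(1 for r in rows if r.get(c) is None),
                "ratio": sum(1 for r in rows if r.get(c) is None) / len(rows),
            }
            for c in columns
        }
    # dataset-level: nulls over all cells
    total = sum(1 for r in rows for c in columns if r.get(c) is None)
    return {"count": total, "ratio": total / (len(rows) * len(columns))}

rows = [{"a": 1, "b": None}, {"a": None, "b": None}, {"a": 3, "b": "x"}]
null_stats(rows, ["a", "b"])             # {'count': 3, 'ratio': 0.5}
null_stats(rows, ["a", "b"], "column")   # per-column counts and ratios
```

Storing both the count and the ratio per granularity level is what the merge of nullRate and nullRatio discussed below would amount to.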

Framework Changes

  • New: Add metric-specific loggers
  • New: CSVWriter
  • New: Datetime Utilities: Precision calculation utilities for timeliness metrics

@jb3rndt jb3rndt requested a review from lisehr February 2, 2026 15:47
Collaborator

@lisehr lisehr left a comment


Generally, all additions look good. A few minor / structural comments:

  • nullRate vs. nullRatio: do we really need both files? I suggest combining them into one file that calculates both the rate (count) and the ratio of nulls. Currently it seems that nullRate calculates the ratio anyway.
  • Please move all configs to Metis/configs/metric
  • Why is correctness_heinrich in the writer folder?

Collaborator

@lisehr lisehr left a comment


@jb3rndt: sorry, I messed up answering your last questions.

  • completeness_nullRate vs. nullRatio: the new file indeed looks much better. I suggest calling it completeness_nullRatio and making clear in the file header that both aspects (count + ratio) are calculated and stored to the DB, at the different granularity levels as well.
  • for completeness_nullAndDMVRate: here too, please make sure to declare FAHES with a link to the paper + GitHub in the header of the file. In the future we could also use other DMV detectors such as DisMis; for now, you can leave the filename as it is. Thank you!

@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from b23f55b to 68e9684 on March 4, 2026 12:52
@Tomic-Riedel
Collaborator

Hey, I just took a look at what conflicts with the data-profiling branch that got merged in the meantime:

mesTime -> timestamp
The data-profiling branch renamed the first DQResult parameter from mesTime to timestamp. All the DQResult(mesTime=..., ...) calls throughout your metrics need to be updated to match.

database_models.py - Base and DataProfile
I see what you were going for with moving Base inside register_models(), that fixes the SQLAlchemy registry issue. The problem is that the data-profiling branch added a DataProfile model at module level that inherits from the old module-level Base, which your change removes. Even if we patch that reference, the Base returned from register_models() won't have DataProfile in its metadata, so create_all() in database_writer.py won't create the data_profiles table. Probably the cleanest fix is to move DataProfile inside register_models() as well so both models share the same fresh Base per call.

super().__init__() in existing metrics
The pre-existing metrics don't call super().__init__(), so they won't get self.logger. It's nothing critical at this point, but I think it's worth adding for consistency.
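A rough sketch of the suggested fix for the registry issue, assuming a register_models() factory as discussed (the model fields here are invented for illustration and do not reflect the real schema):

```python
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

def register_models():
    """Create a fresh declarative Base per call, so repeated registration
    does not clash with SQLAlchemy's class registry, and attach every
    model (including DataProfile) to that same fresh Base."""
    Base = declarative_base()

    class DQResultModel(Base):  # illustrative model, not the real schema
        __tablename__ = "dq_results"
        id = Column(Integer, primary_key=True)
        metric = Column(String)
        timestamp = Column(DateTime)

    class DataProfile(Base):  # moved inside so it shares the fresh Base
        __tablename__ = "data_profiles"
        id = Column(Integer, primary_key=True)
        column_name = Column(String)

    return Base, {"DQResultModel": DQResultModel, "DataProfile": DataProfile}

# Base.metadata now knows both tables, so create_all() would create both:
Base, models = register_models()
sorted(Base.metadata.tables)  # ['data_profiles', 'dq_results']
```

Because each call returns a new Base with its own registry, create_all() on that Base's metadata sees both tables, which is exactly the gap described above.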

@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from 226eb95 to e503d5b on March 6, 2026 16:00
@jb3rndt
Collaborator Author

jb3rndt commented Mar 6, 2026

Thank you, @Tomic-Riedel. I have created a separate Database class that contains the central database models and an engine for database access, if required. This initialization was previously hidden inside the DatabaseWriter; in favor of modularity, I think it is good to have a singleton (inspired by your DataProfileManager) that provides a stable reference to all models and the SQLAlchemy Base, all bound to one engine. Right now, this Database singleton is initialized only when a DatabaseWriter is required, but it is now easy to change that in the future. Also, type inference for the individual database models works by simply accessing the properties of Database.

Regarding the logger in the metrics: If we don't override the __init__ in a metric, the __init__ of the base class is always executed, so every metric should have access to its logger. I caught one leftover case where I initialized the logger in a subclass metric, which I just removed :)
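The base-class logger behavior described here can be sketched as follows (the class names and logger naming scheme are illustrative, not the actual Metric API):

```python
import logging

class Metric:
    """Base class: every subclass gets a metric-specific logger via __init__."""
    def __init__(self):
        # Logger name is derived from the concrete subclass,
        # e.g. "metric.Correctness" for the subclass below.
        self.logger = logging.getLogger(f"metric.{type(self).__name__}")

class Correctness(Metric):
    # No __init__ override: Python still runs the inherited base-class
    # __init__, so self.logger is available without any extra code here.
    def assess(self, value, truth):
        self.logger.debug("comparing %r with %r", value, truth)
        return abs(value - truth)

m = Correctness()
m.logger.name   # 'metric.Correctness'
m.assess(3, 5)  # 2
```

Only a subclass that defines its own __init__ without calling super().__init__() would miss the logger, which is the leftover case mentioned above.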

@jb3rndt jb3rndt force-pushed the feat/correctness-metric branch from e503d5b to 15cbb0b on March 6, 2026 16:27
@Tomic-Riedel
Collaborator

Thanks! From an overall framework perspective this looks good to me. The Database singleton and the logger setup in the base class are very neat solutions imo :)

@jb3rndt jb3rndt requested a review from lisehr March 22, 2026 11:17
lisehr previously approved these changes Mar 22, 2026
Collaborator

@lisehr lisehr left a comment


As discussed, ready for merge, optimizations come later!

Collaborator

@lisehr lisehr left a comment


Ready to merge

@jb3rndt jb3rndt merged commit 9b2f240 into main Mar 23, 2026