feat: enrich run record with execution metrics by matteocavo · Pull Request #62 · dataciviclab/toolkit

matteocavo · 2026-03-21T10:21:02Z

Closes #50

What changes

toolkit/core/run_context.py

duration_seconds added to every layer and to the overall run: derived from the timestamps already present in to_dict(), at zero additional cost
metrics added to every layer with fields output_rows, output_bytes, tables_count (default null)
new set_layer_metrics() method on RunContext
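
A minimal sketch of the two RunContext additions, under stated assumptions: the class layout below (one hypothetical `LayerState` per layer, with ISO-8601 `started_at`/`finished_at` strings) is not taken from toolkit/core/run_context.py. It shows `duration_seconds` computed inside `to_dict()` from timestamps that are already stored, so no extra clock is read, and `set_layer_metrics()` overwriting the null defaults.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# Metric fields every layer carries; all start as None (null in the JSON record).
METRIC_KEYS = ("output_rows", "output_bytes", "tables_count")


def _duration(started_at: str | None, finished_at: str | None) -> float | None:
    """Elapsed seconds between two ISO-8601 timestamps; None if either is missing."""
    if not started_at or not finished_at:
        return None
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(finished_at)
    return (end - start).total_seconds()


@dataclass
class LayerState:  # hypothetical per-layer state
    started_at: str | None = None
    finished_at: str | None = None
    metrics: dict[str, Any] = field(
        default_factory=lambda: {key: None for key in METRIC_KEYS}
    )


@dataclass
class RunContext:  # hypothetical shape, for illustration only
    started_at: str | None = None
    finished_at: str | None = None
    layers: dict[str, LayerState] = field(default_factory=dict)

    def set_layer_metrics(self, layer: str, metrics: dict[str, Any]) -> None:
        """Copy known metric keys into the layer; keys not reported stay None.

        Ignoring unknown keys is a design choice of this sketch.
        """
        state = self.layers.setdefault(layer, LayerState())
        for key in METRIC_KEYS:
            if key in metrics:
                state.metrics[key] = metrics[key]

    def to_dict(self) -> dict[str, Any]:
        # duration_seconds is derived at serialization time from stored timestamps.
        return {
            "started_at": self.started_at,
            "finished_at": self.finished_at,
            "duration_seconds": _duration(self.started_at, self.finished_at),
            "layers": {
                name: {
                    "started_at": s.started_at,
                    "finished_at": s.finished_at,
                    "duration_seconds": _duration(s.started_at, s.finished_at),
                    "metrics": dict(s.metrics),
                }
                for name, s in self.layers.items()
            },
        }
```

Deriving the duration at serialization time means a layer that never finished simply reports `duration_seconds: null` rather than a stale value.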

toolkit/cli/cmd_run.py

_execute_layer captures the runners' return value and calls context.set_layer_metrics() if the runner returns a dict

toolkit/clean/run.py

_run_sql also returns the row count (SELECT count(*) FROM clean_out, run before the DuckDB connection is closed)
run_clean returns {"output_rows": N, "output_bytes": M}
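
A sketch of counting rows before the connection is closed. Assumptions: the stdlib `sqlite3` module stands in for DuckDB (the `SELECT count(*)` pattern is the same), the output is the database file itself rather than the real output format, and `output_bytes` is taken as that file's size on disk.

```python
import os
import sqlite3
import tempfile


def _run_sql(sql: str, out_path: str) -> int:
    """Materialize `sql` as table clean_out and return its row count.

    `sql` is assumed to be trusted (it is interpolated directly).
    """
    con = sqlite3.connect(out_path)
    try:
        con.execute("DROP TABLE IF EXISTS clean_out")
        con.execute(f"CREATE TABLE clean_out AS {sql}")
        con.commit()
        # Count while the connection is still open, before close().
        (rows,) = con.execute("SELECT count(*) FROM clean_out").fetchone()
    finally:
        con.close()
    return rows


def run_clean(sql: str, out_path: str) -> dict:
    rows = _run_sql(sql, out_path)
    # Byte size is read from disk after the connection has been closed.
    return {"output_rows": rows, "output_bytes": os.path.getsize(out_path)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_clean("SELECT 1 AS x UNION ALL SELECT 2", os.path.join(tmp, "clean.db")))
```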

toolkit/mart/run.py

accumulates total_rows per table inside the loop (SELECT count(*) FROM {name}, run while the connection is open)
run_mart returns {"output_rows": total_rows, "output_bytes": total_bytes, "tables_count": len(written)}
output_rows is the sum of the rows of all written tables, not a per-table value
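
A sketch of the per-table accumulation in the mart loop, again with `sqlite3` standing in for DuckDB and CSV files standing in for the real output format. The `tables` mapping (table name to SQL) and the output layout are hypothetical; table names are interpolated into SQL, so they are assumed to come from trusted configuration.

```python
import csv
import os
import sqlite3
import tempfile


def run_mart(tables: dict, out_dir: str) -> dict:
    """Build and export each table, returning metrics aggregated over all of them."""
    con = sqlite3.connect(":memory:")
    written = []
    total_rows = 0
    total_bytes = 0
    try:
        for name, sql in tables.items():
            con.execute(f'CREATE TABLE "{name}" AS {sql}')
            # Per-table count while the connection is open; output_rows is the sum.
            (rows,) = con.execute(f'SELECT count(*) FROM "{name}"').fetchone()
            total_rows += rows
            path = os.path.join(out_dir, f"{name}.csv")
            cur = con.execute(f'SELECT * FROM "{name}"')
            with open(path, "w", newline="") as fh:
                writer = csv.writer(fh)
                writer.writerow([col[0] for col in cur.description])  # header
                writer.writerows(cur)
            total_bytes += os.path.getsize(path)
            written.append(path)
    finally:
        con.close()
    return {
        "output_rows": total_rows,
        "output_bytes": total_bytes,
        "tables_count": len(written),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_mart({"a": "SELECT 1 AS x UNION ALL SELECT 2", "b": "SELECT 'k' AS y"}, tmp))
```

With tables of 2 and 1 rows, `output_rows` is 3: a single aggregate, matching the "sum, not per-table" definition above.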

toolkit/raw/run.py

run_raw returns {"output_bytes": sum(files_written[].bytes)}
output_rows is not populated for raw: raw layers produce files, not structured rows
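
For raw, only the byte total is reported. This sketch assumes `files_written` entries are dicts with a `bytes` key, as `files_written[].bytes` suggests; the `path` key and the helper name `raw_metrics` are hypothetical.

```python
from __future__ import annotations

from typing import Any, TypedDict


class WrittenFile(TypedDict):
    path: str  # hypothetical field
    bytes: int


def raw_metrics(files_written: list[WrittenFile]) -> dict[str, Any]:
    """Sum bytes over written files; output_rows is deliberately not reported."""
    return {"output_bytes": sum(f["bytes"] for f in files_written)}
```

Because the returned dict has no `output_rows` key, a merging `set_layer_metrics()` would leave that field at its null default.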

Explicit tradeoffs

Metric	Decision
`duration_seconds`	derived from existing timestamps, no additional clock
`output_rows` on raw	excluded: raw does not produce structured rows
`input_rows`	excluded: requires an additional parquet read, out of scope
`sql_size_bytes`	excluded: of little diagnostic value, out of scope

Backward compatibility

existing run records without metrics still load correctly
cmd_status and cmd_resume are untouched: they read fields with .get() and are already defensive
all new fields are additive-only
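
A hypothetical reader (not the actual cmd_status code) illustrating the defensive `.get()` pattern: a record written before this change, with no `metrics` or `duration_seconds`, formats as `n/a` instead of raising, and JSON `null` metrics round-trip to `None`. The record shape (`layers` keyed by layer name) is an assumption.

```python
import json
from typing import Any


def layer_summary(record: dict[str, Any], layer: str) -> str:
    """Format one layer, tolerating records that predate the metrics fields."""
    layer_data = record.get("layers", {}).get(layer, {})
    # `or {}` also covers an explicit "metrics": null.
    metrics = layer_data.get("metrics") or {}
    rows = metrics.get("output_rows")
    duration = layer_data.get("duration_seconds")
    rows_txt = "n/a" if rows is None else str(rows)
    dur_txt = "n/a" if duration is None else f"{duration:.1f}s"
    return f"{layer}: rows={rows_txt} duration={dur_txt}"


if __name__ == "__main__":
    legacy = json.loads('{"layers": {"raw": {"status": "ok"}}}')
    print(layer_summary(legacy, "raw"))  # raw: rows=n/a duration=n/a
```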

Verification

pytest: 190 passed
ruff: clean

- adds duration_seconds per layer and for the overall run, derived from the existing timestamps in to_dict() at no additional cost
- adds per-layer metrics: output_rows (clean/mart), output_bytes (all layers), tables_count (mart)
- defines output_rows on mart as the sum of the rows of each table
- metrics are additive-only: no contract change for existing consumers
- adds set_layer_metrics() on RunContext and hooks the runners in via cmd_run.py
- targeted tests: default null, persist, round-trip, duration, mart sum
Closes #50

Gabrymi93

Perfect! Thanks!

matteocavo added the enhancement New feature or request label Mar 21, 2026

matteocavo self-assigned this Mar 21, 2026

Gabrymi93 enabled auto-merge (squash) March 21, 2026 10:27

Gabrymi93 approved these changes Mar 21, 2026


Gabrymi93 merged commit 449bc64 into main Mar 21, 2026
5 checks passed

Gabrymi93 deleted the feat/run-record-metrics branch March 21, 2026 10:28

Gabrymi93 merged 1 commit into main from feat/run-record-metrics