All notable changes to this project will be documented in this file.
- `run cross_year` now writes dedicated metadata, manifest and validation artifacts for multi-year outputs.
- `run` and `validate` now support `--years` for scoped multi-year execution.
- MART validation now reports `table_rules` entries that do not match declared tables.
- `pyarrow` is no longer a direct dependency. All Parquet I/O is handled natively by DuckDB. Users who need `pyarrow` directly can install the `parquet` optional extra (`pip install dataciviclab-toolkit[parquet]`).
- Documentation now classifies `run cross_year` and `inspect schema-diff` as supported advanced tooling in the feature stability matrix.
- Changelog/docs references to config warning codes now reflect the current implemented range through `DCL013`.
- Legacy config forms below no longer emit deprecation warnings and now fail with explicit config errors:
  - `bq`
  - `raw.source`
  - `raw.sources[].plugin`
  - `raw.sources[].id`
  - scalar `clean.read`
  - `clean.read.csv.*`
  - `clean.sql_path`
  - `mart.sql_dir`
- Runtime boundaries documentation clarifying core, advanced and compatibility-only toolkit surfaces.
- RAW profile hints in metadata for lightweight diagnostics during normal RAW runs.
- Pytest markers and a more explicit split between fast tests and heavier smoke-like checks.
- Reduced the runtime surface area by removing peripheral experimental helpers and non-core shims.
- Refined CLEAN input selection, DuckDB read flow and orchestration to make the RAW -> CLEAN bridge more predictable.
- Refreshed smoke and profiling documentation around the supported operational workflow.
- Clarified manifest and metadata writing so runtime artifacts better reflect actual layer outputs.
- Deprecated core import shims that no longer belonged to the stable runtime contract.
- Frozen helper surfaces such as `gen-sql` and peripheral experimental plugins.
- Obsolete validator/helper modules that duplicated the current runtime path.
- Typed configuration models with Pydantic v2 for `dataset.yml`.
- End-to-end smoke tests for tiny CSV and local ZIP extraction flows.
- Install and CLI smoke script for clean-environment verification.
- Configuration schema documentation with minimal and full examples.
- Centralized config deprecation policy with `DCL001` to `DCL013` warning codes.
- `--strict-config` CLI option and `config.strict` config switch.
- Explicit built-in plugin registry with strict/non-strict handling for optional plugins.
- Coverage reporting in CI with XML artifact upload and fail-under threshold.
- Release changelog.
- `load_config()` now parses through typed config models while preserving the current consumer API.
- Validation specs for CLEAN and MART now rely on typed rule structures instead of ad hoc runtime coercion.
- CI now runs as an OS and Python matrix for Ubuntu and Windows on Python 3.10 and 3.11.
- CI now publishes `coverage.xml` artifacts and enforces minimum package coverage.
- Packaging version is now sourced from `toolkit/version.py`.
- Boolean-like config values such as `"false"` and `"0"` no longer evaluate incorrectly as truthy.
- List-like validation fields no longer degrade into character-by-character lists when given as strings.
- CLEAN and MART validation runners no longer attempt to validate unrelated config keys against strict validation specs.
- CLI strict-config handling no longer misinterprets Typer option metadata as enabled strict mode.
- DuckDB connections in CLEAN and MART are always closed, avoiding Windows file-lock issues on produced Parquet files.
- `resume` now verifies previous-layer artifacts before resuming and supports explicit restart from a chosen layer.
- Documentation and canonical examples no longer rely on deprecated `raw.source`.
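
The boolean-like and list-like fixes listed above address two common Python pitfalls: `bool("false")` is `True`, and `list("id")` is `["i", "d"]`. The following is a minimal sketch of the kind of coercion that avoids both, not the toolkit's actual implementation. The helper names `parse_bool` and `ensure_list` and the accepted string sets are assumptions made for illustration.

```python
from typing import Any

# Assumed sets of accepted spellings; the toolkit may accept a different set.
TRUE_STRINGS = {"1", "true", "yes", "on"}
FALSE_STRINGS = {"0", "false", "no", "off", ""}


def parse_bool(value: Any) -> bool:
    """Interpret a boolean-like config value without relying on truthiness."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        text = value.strip().lower()
        if text in TRUE_STRINGS:
            return True
        if text in FALSE_STRINGS:
            return False
    raise ValueError(f"not a boolean-like value: {value!r}")


def ensure_list(value: Any) -> list:
    """Wrap a lone string in a list instead of iterating over its characters."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)
```

With these helpers, `parse_bool("false")` and `parse_bool("0")` both return `False`, and `ensure_list("id")` returns `["id"]` rather than a list of characters.
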
- `raw.source` in favor of `raw.sources`
- `raw.sources[].plugin` in favor of `raw.sources[].type`
- `raw.sources[].id` in favor of `raw.sources[].name`
- scalar `clean.read` in favor of `clean.read.source`
- `clean.read.csv.*` in favor of `clean.read.*`
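
For configs that still use the deprecated forms listed above, the renames can be applied mechanically once the YAML has been loaded into a dictionary. The sketch below is a hypothetical helper, not part of the toolkit. It assumes that a legacy `raw.source` mapping becomes a one-element `raw.sources` list, and the plugin name `example_plugin` in the usage note is a placeholder.

```python
from copy import deepcopy
from typing import Any


def migrate_legacy_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with the deprecated forms rewritten.

    Hypothetical helper: only the renames named in this changelog are handled.
    """
    cfg = deepcopy(config)  # leave the caller's dictionary untouched

    raw = cfg.get("raw")
    if isinstance(raw, dict):
        # raw.source -> raw.sources (assumed to become a one-element list)
        if "source" in raw:
            raw.setdefault("sources", []).append(raw.pop("source"))
        for source in raw.get("sources", []):
            if not isinstance(source, dict):
                continue
            # raw.sources[].plugin -> type, raw.sources[].id -> name
            for old, new in (("plugin", "type"), ("id", "name")):
                if old in source:
                    source.setdefault(new, source.pop(old))

    clean = cfg.get("clean")
    if isinstance(clean, dict) and "read" in clean:
        read = clean["read"]
        if not isinstance(read, dict):
            # scalar clean.read -> clean.read.source
            read = {"source": read}
        elif isinstance(read.get("csv"), dict):
            # clean.read.csv.* -> clean.read.*
            for key, value in read.pop("csv").items():
                read.setdefault(key, value)
        clean["read"] = read

    return cfg
```

For example, `{"raw": {"source": {"plugin": "example_plugin", "id": "main"}}}` is rewritten to `{"raw": {"sources": [{"type": "example_plugin", "name": "main"}]}}`. Where both the old and the new key are present, the new key wins and the old one is dropped.
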