- evals/openalex: add `run_evals.sh`; runs prompts via `claude -p`, saves outputs to `runs/YYYY-MM-DD/`, auto-checks 4 pitfall patterns (relevance_sort, title.search scope, multi-line curl, API key printed)
- openalex: never print API key to verify presence; use `[[ -n "${OPENALEX_API_KEY:-}" ]]`; added to Quick Start and Common Pitfalls in SKILL.md
- evals/openalex: add 20 synthetic test prompts (test-09–test-28) targeting 11 known pitfall zones (relevance_sort_no_search, title_search_standalone, entity_name_filter, sequential_batch, nested_select, missing_api_key, default_per_page, multi_line_curl, null_csv_field, pdf_fallback_chain, wrong_search_scope); add corresponding pitfall checks to `checks.md`
- open-data-quality: ZIP-wrapped CSV support; if a resource declared as CSV is actually a ZIP, extract the largest CSV to /tmp and proceed with full analysis; report MINOR `zip_wrapped_csv` instead of BLOCKER; uses Python `zipfile` (no DuckDB/Polars ZIP support); score 40→86 on Liguria air quality dataset
- open-data-quality: fix `_extras_value`; also check top-level package fields; some harvesters (dati.gov.it from regional portals) promote holder_name/identifier to top-level instead of extras; fixes false-positive MAJORs; score 83→97 on test dataset
- open-data-quality: add qualitative assessment section to SKILL.md; 9 LLM-only checks (title discoverability, title↔description, description↔content, content↔update frequency, dataset usefulness); runs after scripts, requires data content; Good/Acceptable/Poor rating; added to report template
- open-data-quality: add `outlier_values` check (phase3_content); IQR method on numeric columns (≥100 rows); severity MINOR, -2 pts; no fix suggestion (signal only); new fixture `outlier_values.csv`; 36/36 tests pass
- open-data-quality: add `duplicate_rows` check (phase3_content); detects exact duplicate rows via DuckDB `SELECT DISTINCT *`; severity MAJOR, -3 pts on data content quality; new fixture `duplicate_rows.csv`; 35/35 tests pass
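As an illustration of the `outlier_values` entry above, here is a minimal sketch of an IQR-based outlier check. It is not the project's code: the function name `iqr_outliers`, the fence multiplier `k=1.5` (the conventional Tukey value) and the use of Python's `statistics.quantiles` are assumptions. Only the IQR method and the ≥100-row threshold come from the log.

```python
import statistics

# Threshold taken from the log entry: the check only runs on columns with >= 100 rows.
MIN_ROWS = 100


def iqr_outliers(values, k=1.5, min_rows=MIN_ROWS):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence and is an assumption here,
    not a value confirmed by the project.
    """
    if len(values) < min_rows:
        return []  # too few rows for the check to be meaningful
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical numeric column: 99 regular values plus one extreme reading.
column = [float(i) for i in range(1, 100)] + [1000.0]
print(iqr_outliers(column))
```

Because the check is described as "signal only", a sketch like this returns the suspicious values without proposing a fix.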
- open-data-quality: fix IT holder label; `dcatapit:datasetHolder` → `dct:rightsHolder` in code and all reference docs (confirmed from real dati.gov.it data)
- open-data-quality: remove deprecated `dcatapit:datasetHolder` row from profiles table; `dct:rightsHolder` now marked M for IT
- open-data-quality: add `portal_field_aliases.json`, a JSON vocabulary mapping standard DCAT-AP field names to portal-specific CKAN extras keys; UK profile maps `issued`/`modified` → `dcat_issued`/`dcat_modified`
- open-data-quality: fix UK date detection; `metadata_validator.py` now uses `FIELD_ALIASES` fallback for date fields per profile
- open-data-quality: add 3 new tests (`test_it_holder_present_ok`, `test_it_holder_missing_flagged`, `test_uk_dcat_prefixed_dates_accepted`); 34/34 pass
- open-data-quality: added pytest test suite; 25 tests across phase0–3 + CLI integration; fixtures in `scripts/tests/fixtures/`; `pytest` added as dev dependency in `pyproject.toml`
- open-data-quality: fix fuzzy check; skip datetime/timestamp columns stored as VARCHAR (e.g. `2025-03-14T00:00:00` was triggering a false-positive near-duplicate alert)
- open-data-quality: file type detection in phase 0; detect ZIP, HTML/XML, JSON, PDF, OLE2/Excel, UTF-16 via magic bytes/content sniffing; report specific type (e.g. "File is a ZIP archive") instead of generic binary/separator error
- open-data-quality: fuzzy false positive fix; add `levenshtein/max_len < 10%` ratio filter + raise JW threshold to 0.95 + minimum length > 5; eliminates NORD-EST~NORD-OVEST, MINISTERO DELLA DIFESA~SALUTE type false positives while preserving real typos (D'INTERESSE~DI INTERESSE caught at 4% ratio)
- open-data-quality: fix encoding false positive; normalize `utf_8` → `utf-8` before comparison; was marking valid UTF-8 files as MAJOR issue
- open-data-quality: split fuzzy check; trailing/leading whitespace now reported separately; fuzzy comparison works on trimmed values to avoid spurious matches
- open-data-quality: fix #11; placeholder message now shows actual values found (e.g. `NA`) instead of full catalog `n/a, n.d., -…`; SQL uses `list_distinct(list(...))` to collect found values per column
- open-data-quality: non-UTF8 encoding no longer a BLOCKER; `charset_normalizer` detects encoding, file converted to UTF-8 temp copy, full analysis runs; MAJOR finding reported (tested on Comune di Palermo CP1250 dataset: 33→73/100)
- open-data-quality: added fuzzy near-duplicate category check via `jaro_winkler_similarity > 0.92` (DuckDB built-in); found real issues in Palermo dataset
- open-data-quality: CRLF line endings no longer flagged (RFC 4180 prescribes CRLF)
- open-data-quality: replaced `chardet` with `charset_normalizer` for more accurate encoding detection
- open-data-quality: added developer notes to `CONTRIBUTING.md` (WSL/uvx cache, PYTHONUTF8, DuckDB lenient parsing)
- evals/open-data-quality/fixtures/palermo-edifici-pubblici-cp1250.csv: archived as encoding test fixture
- open-data-quality: fixed false BLOCKER on valid CSVs with quoted newlines in headers; retry with `strict_mode=false`; added `_lenient` flag + `_rcsv()` helper across all DuckDB queries (score 29→86 on Copertino dataset)
- open-data-quality: SKILL.md rewritten; `uvx odq-csv`/`odq-ckan` is now the single primary path; all bash inline phases removed (no double maintenance)
- SKILL.md reduced from ~450 to ~130 lines
- `evals/checks.md` updated to validate `uvx` + package usage
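The phase 0 file type detection entry above ("magic bytes/content sniffing") can be sketched roughly as follows. This is an illustrative reconstruction, not the project's implementation: the function name `sniff_file_type`, the returned labels and the order of the checks are assumptions. The byte signatures themselves are the standard ones for each format.

```python
def sniff_file_type(head: bytes) -> str:
    """Guess the real type of a file from its first bytes.

    `head` is the start of the file (a few dozen bytes are enough).
    The labels returned here are hypothetical.
    """
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # also matches XLSX/ODS, which are ZIP containers
    if head.startswith(b"%PDF"):
        return "pdf"
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"  # legacy Office container, e.g. .xls
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"  # byte order mark, little- or big-endian
    # Content sniffing: drop a UTF-8 BOM and leading whitespace first.
    text = head.removeprefix(b"\xef\xbb\xbf").lstrip()
    if text.startswith(b"<"):
        return "html/xml"
    if text.startswith((b"{", b"[")):
        return "json"
    return "unknown"  # may well be a genuine CSV; later phases decide


# Hypothetical usage: a download declared as CSV that is really an error page.
print(sniff_file_type(b"  <!DOCTYPE html><html>"))
```

Checking binary signatures before text heuristics keeps the cheap, unambiguous cases first; a CSV whose first cell starts with `[` or `{` would be misread by this sketch, so a real check would need a stricter JSON test.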
- Added `open-data-quality` skill: CSV validator (`odq-csv`) and CKAN/DCAT-AP metadata validator (`odq-ckan`)
- Created missing `__init__.py` for package discovery
- Translated `scripts/README.md` from Italian to English
- Tested end-to-end on real dati.gov.it datasets (bilancio, popolazione)
- Created eval suite: `evals/open-data-quality/` (8 prompts, 15 checks)
- `openalex` SKILL.md v0.2: added filter syntax (OR/NOT/ranges), batch pipe lookup, two-step entity lookup, `group_by`/`sample`/`seed` params, `per-page=200` default, error handling with backoff, endpoint costs table
- Saved reference documents in `docs/`: testing-agent-skills-with-evals.md and agent-skills-specification.md
- Added "Core Idea" section to `PRD.md` (skill creation as an inclusive, non-technical activity)
- Created `docs/prd.md` with Alessio Cimarelli's eval proposal
- Created `evals/` structure with `_template/` and first eval for `openalex`
- First complete eval run on `openalex`: score 78/100 (7/9 checks), 7 improvements to the skill
  - single-line curl required (warning at top of SKILL.md)
  - `title.search` inside `filter=`, not standalone
  - `display_name` as title (Output Format section)
  - `api_key` explicitly documented as pitfall
  - Europe PMC fallback documented as recipe #6
  - Definition of Done added to SKILL.md
- Updated `README.md`: catalog with Eval column, Evals section, fix skill data-quality-csv → openalex
- Created project `CLAUDE.md`
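Several `openalex` notes in this log (`title.search` inside `filter=`, `display_name` as title, `per-page=200`) concern how a works query is put together. The sketch below only builds such a URL and makes no network call. The helper name `build_works_url` and the chosen `select` fields are assumptions for illustration, and the skill itself issues the request with a single-line curl rather than Python.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/works"


def build_works_url(title_terms: str, per_page: int = 200) -> str:
    """Build a works query with the title search placed inside filter=.

    Hypothetical helper: it returns a single-line URL string and
    does not send any request.
    """
    params = {
        # title.search goes inside filter=, not as a standalone parameter
        "filter": f"title.search:{title_terms}",
        # display_name is the field that carries the title
        "select": "id,display_name",
        "per-page": per_page,
    }
    # Keep ':' and ',' readable; spaces are still encoded as '+'.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


print(build_works_url("open data quality"))
```

The API key is deliberately left out of the sketch, in line with the pitfall noted above about never printing it.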