Address spec gaps identified from cdx-pandoc analysis #15

Merged
gvonness-apolitical merged 2 commits into main from fix/spec-gaps-from-cdx-pandoc-v2
Feb 1, 2026

@gvonness-apolitical
Collaborator

Summary

This PR addresses specification gaps identified by analyzing the cdx-pandoc implementation against the current codex-file-format-spec. The changes ensure better alignment between the spec and real-world implementations.

Priority 1: High Impact

  • Create schemas/semantic.schema.json - New JSON schema for validating semantic extension blocks and marks (citation, footnote, entity, glossary, bibliography, term, ref, measurement)
  • Extend citation mark - Add CSL-compatible fields: prefix, suffix, locator, suppressAuthor
  • Document bibliography block entries field - Support inline CSL JSON entries with renderedText for citeproc output
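
To make the extended citation mark concrete, here is a minimal sketch in TypeScript. Only the field names prefix, suffix, locator, and suppressAuthor come from this PR; the `type` and `refs` properties, the `Entry` shape, and the `renderCitation` helper are hypothetical and exist only for illustration. A real pipeline would hand the mark to a CSL processor rather than format it by hand.

```typescript
// Hypothetical shape of a citation mark. Only prefix, suffix, locator and
// suppressAuthor are named in this PR; `type` and `refs` are assumptions.
interface CitationMark {
  type: "citation";
  refs: string[];           // assumed: ids of bibliography entries
  prefix?: string;          // e.g. "see"
  suffix?: string;          // e.g. "for details"
  locator?: string;         // e.g. "p. 42"
  suppressAuthor?: boolean; // render the year only, as in "(2020)"
}

// Minimal stand-in for a resolved bibliography entry (assumption).
interface Entry {
  author: string;
  year: number;
}

// Render a simple author-date citation from the mark's fields.
function renderCitation(mark: CitationMark, entries: Record<string, Entry>): string {
  const cites = mark.refs.map((id) => {
    const entry = entries[id];
    if (!entry) return `[missing: ${id}]`;
    return mark.suppressAuthor ? `${entry.year}` : `${entry.author} ${entry.year}`;
  });
  let body = cites.join("; ");
  if (mark.locator) body += `, ${mark.locator}`;
  if (mark.prefix) body = `${mark.prefix} ${body}`;
  if (mark.suffix) body += `, ${mark.suffix}`;
  return `(${body})`;
}

const entries: Record<string, Entry> = { smith2020: { author: "Smith", year: 2020 } };
const mark: CitationMark = {
  type: "citation",
  refs: ["smith2020"],
  prefix: "see",
  locator: "p. 42",
};
console.log(renderCitation(mark, entries)); // (see Smith 2020, p. 42)
```

Keeping the four fields optional means existing citation marks without them stay valid, which is presumably why the PR describes this as an extension rather than a breaking change.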

Priority 2: Medium Impact

  • Add entity mark source field - Indicate knowledge graph origin (e.g., "wikidata", "dbpedia")
  • Add creators field to Dublin Core - Structured author data with ORCID, affiliation, and email support
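
As an illustration of the structured creator data, the sketch below models one creator record and checks an ORCID iD. The PR only states that creators carry ORCID, affiliation, and email; the `Creator` interface, the `name` property, and the `isValidOrcid` helper are assumptions, and nothing here implies the spec itself requires checksum validation. The check character follows ORCID's documented ISO 7064 MOD 11-2 scheme.

```typescript
// Hypothetical creator record; the exact property names are assumptions.
interface Creator {
  name: string;
  orcid?: string;
  affiliation?: string;
  email?: string;
}

// Check an ORCID iD's format and its ISO 7064 MOD 11-2 check character.
function isValidOrcid(orcid: string): boolean {
  if (!/^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$/.test(orcid)) return false;
  const chars = orcid.replace(/-/g, "");
  let total = 0;
  // Fold the first 15 digits; the 16th character is the check character.
  for (let i = 0; i < 15; i++) {
    total = (total + Number(chars.charAt(i))) * 2;
  }
  const result = (12 - (total % 11)) % 11;
  const expected = result === 10 ? "X" : String(result);
  return chars.charAt(15) === expected;
}

const creator: Creator = {
  name: "Example Author",       // placeholder name
  orcid: "0000-0002-1825-0097", // sample iD used in ORCID's own documentation
  affiliation: "Example University",
};
console.log(creator.orcid !== undefined && isValidOrcid(creator.orcid)); // true
```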

Priority 3: Low Impact

  • Clarify linebreak semantics - Document soft breaks (\n in text) vs hard breaks (break block)
  • Clarify tableCell children - Document simplified form allowing text nodes directly
  • Document measurement type relationship - Clarify core measurement (metrology) vs semantic:measurement (linked data)
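
The soft versus hard break distinction can be sketched as a small renderer. The node shapes and the `renderInline` helper below are hypothetical, and collapsing a soft break to a single space is one possible rendering choice, not something the PR prescribes. What the PR does state is that a "\n" inside text is a soft break and that a hard break is its own break node.

```typescript
// Hypothetical inline node shapes; only the soft/hard distinction is from the PR.
type InlineNode =
  | { type: "text"; value: string }
  | { type: "break" };

// Escape the characters that are unsafe in HTML text content.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Soft breaks ("\n" inside text) collapse to a space; hard breaks become <br>.
function renderInline(nodes: InlineNode[]): string {
  return nodes
    .map((node) =>
      node.type === "break" ? "<br>" : escapeHtml(node.value).replace(/\n/g, " ")
    )
    .join("");
}

const nodes: InlineNode[] = [
  { type: "text", value: "first line\nsame paragraph" },
  { type: "break" },
  { type: "text", value: "new visual line" },
];
console.log(renderInline(nodes)); // first line same paragraph<br>new visual line
```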

Test plan

  • All schemas compile with ajv compile --spec=draft2020
  • Example documents validate against updated schemas
  • CI workflow updated to include new semantic.schema.json

- Add semantic.schema.json for validating semantic extension blocks/marks
- Extend citation mark with CSL-compatible fields (prefix, suffix, locator, suppressAuthor)
- Add source field to entity mark for knowledge graph origin
- Add creators field to Dublin Core for structured author data with ORCID
- Document bibliography block inline entries with renderedText support
- Clarify linebreak semantics (soft breaks vs hard breaks)
- Document tableCell simplified form for text node children
- Document core vs semantic measurement type relationship
- Update CI workflow to validate semantic schema
- Add package.json with npm test for schema and example validation
- Add scripts/validate-schemas.ts to compile all JSON schemas
- Add scripts/validate-examples.ts to validate example documents
- Simplify CI workflow to use npm test
- Use Ajv 2020-12 draft for JSON Schema validation
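
The contents of scripts/validate-schemas.ts are not shown in this PR, so the following is only a dependency-free sketch of a pre-check that could run before Ajv compilation: it confirms that a schema file parses as JSON and declares the 2020-12 draft. The `precheckSchema` function and its message format are hypothetical, and the check does not replace compiling the schema with Ajv.

```typescript
// The $schema URI for JSON Schema draft 2020-12.
const DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema";

// Return a list of problems for one schema file's text; an empty list means
// the file passed this pre-check (it may still fail to compile under Ajv).
function precheckSchema(name: string, text: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    return [`${name}: invalid JSON (${(err as Error).message})`];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return [`${name}: top level is not an object`];
  }
  const declared = (parsed as Record<string, unknown>)["$schema"];
  if (declared !== DRAFT_2020_12) {
    return [`${name}: $schema is ${JSON.stringify(declared)}, expected ${DRAFT_2020_12}`];
  }
  return [];
}

// Usage with an in-memory schema; a script would read each file from schemas/.
const sample = JSON.stringify({ $schema: DRAFT_2020_12, type: "object" });
console.log(precheckSchema("semantic.schema.json", sample)); // []
```

Returning a list of messages instead of throwing lets a caller collect the problems from every schema file and report them in one run.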
@gvonness-apolitical merged commit aaa3a1f into main on Feb 1, 2026
2 checks passed
@gvonness-apolitical deleted the fix/spec-gaps-from-cdx-pandoc-v2 branch on February 1, 2026 03:47