NDF Canonicalization Rules
Canonicalization ensures that semantically equivalent NDF documents serialize to identical byte sequences. This is critical for:
- Deterministic output
- Round-trip preservation
- Diff tools and version control
- Hash-based comparisons
Design principles:
- Determinism: the same data structure always produces the same output
- Minimalism: prefer the most compact representation
- Readability: when compactness conflicts with readability, prefer readability
- Consistency: apply rules uniformly across all values
Default key order: preserve insertion order (as in JavaScript objects)
Canonical mode (sortKeys: true): sort keys alphabetically (case-sensitive)
```
# Original (insertion order)
zebra: 1
apple: 2
banana: 3
# Canonical (sorted)
apple: 2
banana: 3
zebra: 1
```
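A minimal JavaScript sketch of the sortKeys step, assuming the serializer walks plain objects. The helper name orderedEntries is hypothetical, not part of any NDF parser API. It returns [key, value] pairs instead of building a new object because JavaScript always enumerates integer-like keys (such as "2") first, which would undo a sort. The same quirk means the default "insertion order" is only exact for non-numeric keys.

```javascript
// Hypothetical helper: returns an object's entries in emission order.
function orderedEntries(obj, sortKeys = false) {
  // Object.entries follows JS property order: integer-like keys ascending,
  // then string keys in insertion order.
  const entries = Object.entries(obj);
  if (!sortKeys) return entries;
  // Plain < / > compares UTF-16 code units, so the sort is case-sensitive
  // ("Zebra" sorts before "apple").
  return entries.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

const doc = { zebra: 1, apple: 2, banana: 3 };
console.log(orderedEntries(doc, true).map(([k, v]) => `${k}: ${v}`).join("\n"));
// apple: 2
// banana: 3
// zebra: 1
```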
Indentation standard: 2 spaces per level
Canonical: always use 2 spaces (never tabs, never mixed)
```
# Non-canonical (tabs)
user:
	name: Alice
# Canonical (spaces)
user:
  name: Alice
```
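The indentation rule can be sketched as a recursive renderer (renderBlock is a hypothetical name) that derives every prefix from a fixed two-space unit, so tabs or mixed indentation cannot appear. Scalars go through String() here; the boolean, null, quoting, and array rules are deliberately left out, and empty nested objects are not handled.

```javascript
// Canonical indentation unit: always two spaces.
const INDENT = "  ";

// Hypothetical renderer for nested objects; scalar formatting is simplified.
function renderBlock(obj, level = 0) {
  const pad = INDENT.repeat(level);
  const lines = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      lines.push(`${pad}${key}:`);               // parent key on its own line
      lines.push(renderBlock(value, level + 1)); // children one level deeper
    } else {
      lines.push(`${pad}${key}: ${String(value)}`);
    }
  }
  return lines.join("\n");
}

console.log(renderBlock({ user: { name: "Alice", settings: { theme: "dark" } } }));
// user:
//   name: Alice
//   settings:
//     theme: dark
```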
Canonical: Use yes/no (not true/false)
```
# Non-canonical
enabled: true
disabled: false
# Canonical
enabled: yes
disabled: no
```
Canonical: Use null (not none or -)
```
# Non-canonical
optional: none
empty: -
# Canonical
optional: null
empty: null
```
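Assuming the parser has already turned true/false and yes/no into booleans, and none, - and null into a null value (the examples above treat them as equivalent), the serializer only needs one spelling per value. A minimal sketch, with formatLiteral as a hypothetical name:

```javascript
// Hypothetical helper: canonical spelling for boolean and null values.
function formatLiteral(value) {
  if (value === true) return "yes";
  if (value === false) return "no";
  if (value === null) return "null";
  throw new TypeError(`not a boolean or null: ${String(value)}`);
}

console.log(formatLiteral(true), formatLiteral(false), formatLiteral(null));
// yes no null
```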
Rule: Quote only when necessary (see ESCAPING.md for details)
Canonical decisions:
- Don't quote a value that is safe unquoted
- Use double quotes (") when quoting is needed
- Escape only the characters that require it
```
# Non-canonical (over-quoted)
name: "Alice"
age: "30"
# Canonical (minimal quoting)
name: Alice
age: 30
```
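The minimal-quoting decision can be sketched as a predicate plus an escaper. ESCAPING.md is the authority here; the trigger list in needsQuotes is an assumption that errs toward quoting, since an unneeded pair of quotes only costs compactness while a missing pair can change a value's type. Spaces and commas are assumed to trigger quoting because comma- and space-separated values can be read as lists. One consequence: this sketch keeps quotes on a numeric-looking string such as "30", because dropping them would read back as a number, so the age example above is only safe if age is meant to be a number. needsQuotes and formatString are hypothetical names.

```javascript
// Assumed set of words that would read back as booleans or null.
const RESERVED = new Set(["yes", "no", "true", "false", "null", "none", "-"]);
// Strings that would read back as numbers.
const NUMBER_LIKE = /^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/;

// Hypothetical predicate: does this string need double quotes?
function needsQuotes(s) {
  return (
    s === "" ||
    RESERVED.has(s.toLowerCase()) ||
    NUMBER_LIKE.test(s) ||
    /^[$@\[{"|]/.test(s) || // reference, type hint, inline array/object, quote, block marker
    /[\s,#]/.test(s)        // whitespace, list separators, or comment markers
  );
}

// Quote only when needed, escaping only backslash, double quote, and newline.
function formatString(s) {
  if (!needsQuotes(s)) return s;
  const escaped = s.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
  return `"${escaped}"`;
}

console.log(formatString("Alice"), formatString("30"), formatString("dark mode"));
// Alice "30" "dark mode"
```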
Rule (arrays): use the most compact form that fits within inlineThreshold (default: 60 characters)
Precedence (most to least preferred):
- Inline comma-separated: tags: a, b, c
- Inline bracketed: tags: [a, b, c]
- Multiline with dashes: tags:\n  - a\n  - b\n  - c
```
# Short array - inline comma
tags: python, ai, ml
# Medium array - inline bracket
coordinates: [10.5, 20.3, 30.1, 40.2]
# Long array - multiline
long_list:
  - item1
  - item2
  - item3
  - item4
  - item5
```
Canonical threshold: if the total line length is ≤ 60 characters, use an inline form; otherwise, use multiline.
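A sketch of the layout choice, using the documented inlineThreshold default of 60. The rules do not say when the bracketed form wins over the comma form, so this sketch assumes (from the two examples) that the comma form is reserved for two or more plain strings and that everything else that fits goes in brackets. formatArray is a hypothetical name, and items are assumed to be already-formatted scalars. Note that by the 60-character rule alone, the five-item long_list example above would fit inline (44 characters), so this sketch keeps it on one line.

```javascript
// Hypothetical layout chooser for arrays of already-formatted scalars.
function formatArray(key, items, inlineThreshold = 60) {
  const parts = items.map(String);
  // Assumption: comma form only for 2+ plain strings (no commas, brackets, or
  // spaces), so a single item is not mistaken for a scalar.
  const plainStrings =
    items.length >= 2 && items.every((x) => typeof x === "string" && !/[,\[\]\s]/.test(x));

  const comma = `${key}: ${parts.join(", ")}`;
  if (plainStrings && comma.length <= inlineThreshold) return comma;

  const bracket = `${key}: [${parts.join(", ")}]`;
  if (bracket.length <= inlineThreshold) return bracket;

  // Too long for one line: one "- item" per line, indented two spaces.
  return [`${key}:`, ...parts.map((p) => `  - ${p}`)].join("\n");
}

console.log(formatArray("tags", ["python", "ai", "ml"]));
// tags: python, ai, ml
console.log(formatArray("coordinates", [10.5, 20.3, 30.1, 40.2]));
// coordinates: [10.5, 20.3, 30.1, 40.2]
```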
Rule: Prefer nested blocks over inline objects
Exception: Very small objects (1-2 keys, total < 40 chars) can be inline
```
# Small object - inline OK
meta: {version: "1.0", author: "John"}
# Larger object - nested
user:
  name: Alice
  email: alice@example.com
  settings:
    theme: dark
    notifications: yes
```
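The inline-object exception reduces to two checks. In this sketch (tryInlineObject is a hypothetical name) the 40-character limit is assumed to apply to the whole rendered line, and values are assumed to be already-formatted scalar strings; a nested value always forces a block.

```javascript
// Hypothetical helper: returns the inline form, or null if a nested block is required.
function tryInlineObject(key, obj, maxLength = 40) {
  const entries = Object.entries(obj);
  const scalarsOnly = entries.every(([, v]) => v === null || typeof v !== "object");
  if (entries.length < 1 || entries.length > 2 || !scalarsOnly) return null;
  const line = `${key}: {${entries.map(([k, v]) => `${k}: ${v}`).join(", ")}}`;
  return line.length < maxLength ? line : null; // "total < 40 chars"
}

console.log(tryInlineObject("meta", { version: '"1.0"', author: '"John"' }));
// meta: {version: "1.0", author: "John"}
console.log(tryInlineObject("user", { name: "Alice", email: "alice@example.com", theme: "dark" }));
// null
```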
Rule: Use multiline (|) when string contains newlines
Indentation: Content indented 2 spaces relative to key
Trailing newlines: Strip trailing empty lines
```
# Canonical multiline
description: |
  Line 1
  Line 2
  Line 3
# Not canonical (escaped newlines)
description: "Line 1\nLine 2\nLine 3"
```
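A sketch of the block-string rule (formatMultiline is a hypothetical name): it applies only when the value contains a newline, indents each content line two spaces past the key, and drops trailing empty lines. Blank lines inside the text are kept, written without trailing whitespace.

```javascript
// Hypothetical helper: "|" block form for strings containing newlines.
function formatMultiline(key, text, level = 0) {
  if (!text.includes("\n")) return null; // single-line strings use normal quoting
  const pad = "  ".repeat(level);
  const lines = text.split("\n");
  // Strip trailing empty (or whitespace-only) lines.
  while (lines.length > 0 && lines[lines.length - 1].trim() === "") lines.pop();
  const body = lines.map((l) => (l === "" ? "" : `${pad}  ${l}`));
  return [`${pad}${key}: |`, ...body].join("\n");
}

console.log(formatMultiline("description", "Line 1\nLine 2\nLine 3\n"));
// description: |
//   Line 1
//   Line 2
//   Line 3
```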
Rule: Preserve original format (integer vs float)
Canonical decisions:
- 30, not 30.0
- 3.14, not 3.140
- Scientific notation only when it is much shorter than the plain form: 1e10, not 10000000000
```
# Canonical
count: 30
pi: 3.14
large: 1e10
# Non-canonical
count: 30.0
pi: 3.140
large: 10000000000
```
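A sketch of the number rules in JavaScript (formatNumber is a hypothetical name). String() already yields the shortest decimal that round-trips, which covers 30 rather than 30.0 and 3.14 rather than 3.140. The rules do not define exactly when scientific notation becomes necessary, so the MAX_PLAIN_LENGTH cutoff of 10 characters is an assumption; it reproduces 1e10 without turning 1000 into 1e3. NaN and Infinity are rejected because the rules give them no form. JavaScript numbers cannot distinguish 30 from 30.0 after parsing, so a parser that must preserve the integer/float distinction needs its own representation.

```javascript
const MAX_PLAIN_LENGTH = 10; // assumed cutoff, not specified by the rules

// Hypothetical helper: canonical text for a finite number.
function formatNumber(n) {
  if (!Number.isFinite(n)) throw new RangeError(`no canonical form for ${n}`);
  const plain = String(n); // shortest round-trip form: 30.0 -> "30", 3.140 -> "3.14"
  if (plain.includes("e")) return plain.replace("e+", "e"); // JS chose exponent form itself
  if (plain.length <= MAX_PLAIN_LENGTH) return plain;
  const sci = n.toExponential().replace("e+", "e"); // 10000000000 -> "1e+10" -> "1e10"
  return sci.length < plain.length ? sci : plain;
}

console.log(formatNumber(30.0), formatNumber(3.140), formatNumber(10000000000));
// 30 3.14 1e10
```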
Rule: Preserve reference definitions and usages
Canonical mode (includeReferences: false): Omit reference definitions, resolve all usages
```
# With references
$base: https://api.example.com
endpoint: $base/v1
# Canonical (resolved)
endpoint: https://api.example.com/v1
```
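A sketch of includeReferences: false, assuming (from the example above) that references are top-level keys starting with $ and that usages appear as $name inside string values; resolveReferences is a hypothetical name. A value that is exactly one reference keeps the definition's type, unknown names are left as written, and definitions that themselves contain references are not expanded recursively.

```javascript
// Hypothetical sketch: resolve $name usages and drop top-level $ definitions.
function resolveReferences(doc) {
  const defs = new Map();
  for (const [k, v] of Object.entries(doc)) {
    if (k.startsWith("$")) defs.set(k.slice(1), v);
  }

  const resolve = (value) => {
    if (typeof value === "string") {
      const whole = /^\$([A-Za-z_]\w*)$/.exec(value);
      if (whole && defs.has(whole[1])) return defs.get(whole[1]); // keep original type
      return value.replace(/\$([A-Za-z_]\w*)/g, (m, name) =>
        defs.has(name) ? String(defs.get(name)) : m // unknown names stay as written
      );
    }
    if (Array.isArray(value)) return value.map(resolve);
    if (value !== null && typeof value === "object") {
      const out = {};
      for (const [k, v] of Object.entries(value)) out[k] = resolve(v);
      return out;
    }
    return value;
  };

  const result = {};
  for (const [k, v] of Object.entries(doc)) {
    if (!k.startsWith("$")) result[k] = resolve(v); // definitions are omitted
  }
  return result;
}

console.log(resolveReferences({ $base: "https://api.example.com", endpoint: "$base/v1" }));
// { endpoint: 'https://api.example.com/v1' }
```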
Rule: Preserve type hints if supported, otherwise strip
Canonical: Include type hints in output if parser supports them
```
# With type hint
timestamp: @time 2024-01-15T10:30:00Z
# Without type hint support
timestamp: 2024-01-15T10:30:00Z
```
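If hints are handled on raw value text, the strip-when-unsupported rule is a single prefix match. This sketch assumes a hint is @ followed by an identifier and whitespace at the start of the value; applyTypeHintPolicy and supportsTypeHints are hypothetical names.

```javascript
// Assumed hint syntax: "@name " at the start of a raw value.
const TYPE_HINT = /^@[A-Za-z_]\w*\s+/;

// Hypothetical helper: keep the hint if supported, otherwise strip it.
function applyTypeHintPolicy(rawValue, supportsTypeHints) {
  return supportsTypeHints ? rawValue : rawValue.replace(TYPE_HINT, "");
}

console.log(applyTypeHintPolicy("@time 2024-01-15T10:30:00Z", false));
// 2024-01-15T10:30:00Z
```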
Rule:
- No trailing whitespace on lines
- Single newline between top-level entries
- No blank lines at start/end of document
```
# Canonical
key1: value1
key2: value2
key3: value3
# Non-canonical (blank lines between entries)
key1: value1

key2: value2


key3: value3
```
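A sketch of these whitespace rules applied to serialized text (normalizeWhitespace is a hypothetical name). It assumes top-level entries start in column 0 and that a blank line followed by an indented line belongs to a block (such as a | string) and must be kept; every other blank line is dropped.

```javascript
// Hypothetical helper: removes trailing whitespace, blank lines between
// top-level entries, and blank lines at the start or end of the document.
function normalizeWhitespace(text) {
  const lines = text.split("\n").map((l) => l.trimEnd());
  const out = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i] !== "") {
      out.push(lines[i]);
      continue;
    }
    // Keep a blank line only inside an indented block: something precedes it
    // and the next non-blank line is indented.
    const next = lines.slice(i + 1).find((l) => l !== "");
    if (out.length > 0 && next !== undefined && /^\s/.test(next)) out.push("");
  }
  return out.join("\n");
}

console.log(normalizeWhitespace("\nkey1: value1   \n\nkey2: value2\n\n\nkey3: value3\n\n"));
// key1: value1
// key2: value2
// key3: value3
```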
Rule: Comments are not preserved in canonical form (they're metadata, not data)
Exception: If preserveComments option is enabled, preserve comments with their original formatting
```
# Original
name: Alice # User's name
age: 30
# Canonical (comments stripped)
name: Alice
age: 30
```
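With preserveComments off, stripping can be sketched line by line (stripComments is a hypothetical name). The sketch assumes # starts a comment at the beginning of a line or after whitespace, except inside double quotes, and it does not special-case | block content; lines left empty by a removed comment are dropped.

```javascript
// Hypothetical helper: removes comments when preserveComments is off.
function stripComments(text) {
  const out = [];
  for (const line of text.split("\n")) {
    let inQuotes = false;
    let cut = line.length; // index where a comment starts, if any
    for (let i = 0; i < line.length; i++) {
      const c = line[i];
      if (inQuotes && c === "\\") {
        i++; // skip the escaped character
      } else if (c === '"') {
        inQuotes = !inQuotes;
      } else if (c === "#" && !inQuotes && (i === 0 || /\s/.test(line[i - 1]))) {
        cut = i;
        break;
      }
    }
    const kept = line.slice(0, cut).trimEnd();
    // Drop lines that held only a comment; keep everything else.
    if (cut === line.length || kept.trim() !== "") out.push(kept);
  }
  return out.join("\n");
}

console.log(stripComments("# Original\nname: Alice # User's name\nage: 30"));
// name: Alice
// age: 30
```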
✅ Preserved:
- Key-value pairs
- Nested structure
- Array order
- String content (including newlines)
- Number precision
- Boolean values
- Null values
- Key ordering (unless sortKeys: true)
Normalized:
- Boolean representation (true→yes)
- Null representation (none→null)
- Array formatting (inline vs multiline)
- String quoting (if unnecessary)
- Whitespace normalization
- Comments (stripped by default)
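A document is round-trip safe if:
```javascript
const assert = require("node:assert");

// `parser` is the NDF parser; `text` is the source document
const original = parser.parse(text);
const serialized = parser.dumps(original);
const reparsed = parser.parse(serialized);
assert.deepStrictEqual(reparsed, original); // same data after the round trip
```
Note: text !== serialized is expected and fine, as long as original and reparsed are semantically (deeply) equal.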
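The round-trip check can be wrapped in a reusable helper that also tests determinism: dumping the reparsed data should reproduce the serialized text byte for byte. The sketch takes parse and dumps as functions so it assumes no particular NDF API; checkRoundTrip is a hypothetical name, and JSON.parse/JSON.stringify stand in for an NDF parser in the usage line.

```javascript
const { isDeepStrictEqual } = require("node:util");

// Hypothetical helper: round-trip safety plus byte-stable re-serialization.
function checkRoundTrip(parse, dumps, text) {
  const original = parse(text);
  const serialized = dumps(original);
  const reparsed = parse(serialized);
  return {
    roundTripSafe: isDeepStrictEqual(original, reparsed), // same data back
    stable: dumps(reparsed) === serialized,               // same bytes again
  };
}

// JSON as a stand-in serializer: whitespace changes, data does not.
const check = checkRoundTrip(JSON.parse, JSON.stringify, '{ "b": 1,  "a": [true, null] }');
console.log(check.roundTripSafe, check.stable);
// true true
```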
Default options:
```javascript
{
  indent: '  ',
  indentLevel: 0,
  inlineThreshold: 60,
  sortKeys: false,
  includeReferences: true
}
```
Canonical options:
```javascript
{
  indent: '  ',
  indentLevel: 0,
  inlineThreshold: 60,
  sortKeys: true,           // Alphabetical key order
  includeReferences: false  // Resolve all references
}
```
Input:
```
tags: python ai ml
```
Canonical output:
```
tags: python, ai, ml
```
Reason: Comma-separated is more explicit and handles edge cases better; for example, items containing spaces cannot be expressed in the space-separated form.
Input:
```
enabled: true
disabled: false
```
Canonical output:
```
enabled: yes
disabled: no
```
Reason: yes/no is the preferred NDF boolean format.
Input:
```
zebra: 1
apple: 2
banana: 3
```
Canonical output (with sortKeys: true):
```
apple: 2
banana: 3
zebra: 1
```
Input:
```
$base: https://api.example.com
endpoint: $base/v1
```
Canonical output (with includeReferences: false):
```
endpoint: https://api.example.com/v1
```
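The edge cases above turn on sortKeys: true and includeReferences: false individually; the canonical option set enables both and otherwise matches the defaults. A minimal sketch of deriving one from the other so the two cannot drift apart, taking indent to be two spaces per the indentation rule (DEFAULT_OPTIONS, CANONICAL_OVERRIDES, and resolveOptions are hypothetical names):

```javascript
// Default options, with indent taken as two spaces.
const DEFAULT_OPTIONS = Object.freeze({
  indent: "  ",
  indentLevel: 0,
  inlineThreshold: 60,
  sortKeys: false,
  includeReferences: true,
});

// Canonical mode changes only these two settings.
const CANONICAL_OVERRIDES = Object.freeze({ sortKeys: true, includeReferences: false });

// Hypothetical helper: later sources win, so caller overrides apply last.
function resolveOptions(canonical = false, overrides = {}) {
  return { ...DEFAULT_OPTIONS, ...(canonical ? CANONICAL_OVERRIDES : {}), ...overrides };
}

console.log(resolveOptions(true).sortKeys, resolveOptions(true).includeReferences);
// true false
```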
- Use canonical mode for:
  - Version control
  - Automated tooling
  - Hash-based comparisons
  - Testing
- Use default mode for:
  - Human editing
  - Preserving user formatting
  - Development workflows
- Always test round-trip when implementing serialization changes
- Document any canonicalization choices that affect user-visible behavior