NDF Canonicalization Rules

Canonicalization ensures that semantically equivalent NDF documents serialize to identical byte sequences. This is critical for:

Deterministic output
Round-trip preservation
Diff tools and version control
Hash-based comparisons

Core Principles

Determinism: The same data structure always produces the same output
Minimalism: Prefer the most compact representation
Readability: When compactness conflicts with readability, prefer readability
Consistency: Apply rules uniformly across all values

Serialization Rules

Key Ordering

Default: Preserve insertion order (as in JavaScript objects)

Canonical mode (sortKeys: true): Sort keys alphabetically (case-sensitive)

# Original (insertion order)
zebra: 1
apple: 2
banana: 3

# Canonical (sorted)
apple: 2
banana: 3
zebra: 1
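
The sorted form above can be sketched on the parsed data structure before serialization. This is a minimal illustration, not the library's implementation: `sortKeysDeep` is a hypothetical name, and it assumes "alphabetically (case-sensitive)" means plain code unit order, so uppercase keys sort before lowercase ones.

```javascript
// Hypothetical helper: returns a copy of `value` in which every object's keys
// are sorted by UTF-16 code unit order (case-sensitive: "Z" sorts before "a").
function sortKeysDeep(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeysDeep); // array order is data, so it is kept
  }
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key]);
    }
    return sorted;
  }
  return value; // scalars pass through unchanged
}

const sortedDoc = sortKeysDeep({ zebra: 1, apple: 2, banana: 3 });
console.log(Object.keys(sortedDoc)); // [ 'apple', 'banana', 'zebra' ]
```

One caveat of this approach: JavaScript objects always enumerate integer-like keys (such as "10") first in numeric order, so a serializer that needs strict string ordering for such keys would have to sort the key list at output time instead.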

Indentation

Standard: 2 spaces per level

Canonical: Always use 2 spaces (never tabs, never mixed)

# Non-canonical (tabs)
user:
		name: Alice

# Canonical (spaces)
user:
  name: Alice

Boolean Values

Canonical: Use yes/no (not true/false)

# Non-canonical
enabled: true
disabled: false

# Canonical
enabled: yes
disabled: no

Null Values

Canonical: Use null (not none or -)

# Non-canonical
optional: none
empty: -

# Canonical
optional: null
empty: null
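
The boolean and null rules amount to a small alias table applied to unquoted scalar tokens. The sketch below assumes the only aliases are the ones named in these two sections; `KEYWORD_ALIASES` and `canonicalKeyword` are hypothetical names.

```javascript
// Alias table built only from the spellings mentioned in this document.
const KEYWORD_ALIASES = new Map([
  ['true', 'yes'],
  ['false', 'no'],
  ['none', 'null'],
  ['-', 'null'],
]);

// Hypothetical helper: maps an unquoted scalar token to its canonical
// spelling and leaves every other token untouched.
function canonicalKeyword(token) {
  return KEYWORD_ALIASES.has(token) ? KEYWORD_ALIASES.get(token) : token;
}

console.log(canonicalKeyword('true')); // yes
console.log(canonicalKeyword('-'));    // null
```

Quoted strings should not go through this mapping, since a quoted "true" is text rather than a boolean.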

String Quoting

Rule: Quote only when necessary (see ESCAPING.md for details)

Canonical decisions:

Don't quote if value is unquoted-safe
Use double quotes (") when quoting is needed
Escape only required characters

# Non-canonical (over-quoted)
name: "Alice"
age: "30"

# Canonical (minimal quoting)
name: Alice
age: 30

Array Representation

Rule: Use most compact form that fits within inlineThreshold (default: 60 chars)

Precedence:

Inline comma-separated: tags: a, b, c
Inline bracketed: tags: [a, b, c]
Multiline with dashes: tags:\n - a\n - b\n - c

# Short array - inline comma
tags: python, ai, ml

# Medium array - inline bracket
coordinates: [10.5, 20.3, 30.1, 40.2]

# Long array - multiline
long_list:
  - item1
  - item2
  - item3
  - item4
  - item5

Canonical threshold: If total length ≤ 60 chars, use inline. Otherwise, multiline.
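
The threshold rule can be sketched as follows. Two assumptions are made here because the text does not settle them: "total length" is taken to mean the whole line including the key, and the bracketed form is left out since the rules do not say when it is chosen over the comma form. `formatArray` is a hypothetical name, and items are expected to be already formatted scalar strings.

```javascript
// Hypothetical sketch: use the inline comma form when the whole line fits
// within `inlineThreshold`, otherwise fall back to the multiline dash form.
function formatArray(key, items, inlineThreshold = 60, indent = '  ') {
  const inline = `${key}: ${items.join(', ')}`;
  if (inline.length <= inlineThreshold) {
    return inline;
  }
  const lines = items.map((item) => `${indent}- ${item}`);
  return [`${key}:`, ...lines].join('\n');
}

console.log(formatArray('tags', ['python', 'ai', 'ml']));
// tags: python, ai, ml
```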

Object Representation

Rule: Prefer nested blocks over inline objects

Exception: Very small objects (1-2 keys, total < 40 chars) can be inline

# Small object - inline OK
meta: {version: "1.0", author: John}

# Larger object - nested
user:
  name: Alice
  email: alice@example.com
  settings:
    theme: dark
    notifications: yes
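
A sketch of the inline exception for flat objects, assuming the serializer takes the exception whenever it applies (the rule only says such objects "can be" inline) and that the 40-character limit covers the whole line including the key. `formatFlatObject` is a hypothetical name, and values are expected to be already formatted scalar strings.

```javascript
// Hypothetical sketch: inline form for 1-2 keys under 40 characters in total,
// nested block form for everything else. Handles one level of nesting only.
function formatFlatObject(key, entries, indent = '  ') {
  const pairs = Object.entries(entries).map(([k, v]) => `${k}: ${v}`);
  const inline = `${key}: {${pairs.join(', ')}}`;
  const isSmall = pairs.length >= 1 && pairs.length <= 2 && inline.length < 40;
  if (isSmall) {
    return inline;
  }
  return [`${key}:`, ...pairs.map((pair) => `${indent}${pair}`)].join('\n');
}

console.log(formatFlatObject('meta', { version: '"1.0"', author: 'John' }));
// meta: {version: "1.0", author: John}
```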

Multiline Strings

Rule: Use multiline (|) when string contains newlines

Indentation: Content indented 2 spaces relative to key

Trailing newlines: Strip trailing empty lines

# Canonical multiline
description: |
  Line 1
  Line 2
  Line 3

# Not canonical (escaped newlines)
description: "Line 1\nLine 2\nLine 3"
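
The three multiline rules can be combined in a short sketch. It assumes a top-level key (so the content indent is exactly 2 spaces), skips quoting for single-line values, and emits interior empty lines without indentation to avoid trailing whitespace. `formatString` is a hypothetical name.

```javascript
// Hypothetical sketch: emit a `|` block when the text contains a newline.
function formatString(key, text, indent = '  ') {
  if (!text.includes('\n')) {
    return `${key}: ${text}`; // single-line case; quoting rules not handled here
  }
  const lines = text.split('\n');
  // Strip trailing empty lines, as the rule above requires.
  while (lines.length > 0 && lines[lines.length - 1].trim() === '') {
    lines.pop();
  }
  const body = lines.map((line) => (line === '' ? '' : indent + line));
  return [`${key}: |`, ...body].join('\n');
}

console.log(formatString('description', 'Line 1\nLine 2\nLine 3\n\n'));
// description: |
//   Line 1
//   Line 2
//   Line 3
```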

Numbers

Rule: Preserve original format (integer vs float)

Canonical decisions:

30 not 30.0
3.14 not 3.140
Scientific notation only when it is the more compact form: 1e10 not 10000000000

# Canonical
count: 30
pi: 3.14
large: 1e10

# Non-canonical
count: 30.0
pi: 3.140
large: 10000000000

References

Rule: Preserve reference definitions and usages

Canonical mode (includeReferences: false): Omit reference definitions, resolve all usages

# With references
$base: https://api.example.com
endpoint: $base/v1

# Canonical (resolved)
endpoint: https://api.example.com/v1
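
Reference resolution can be sketched on the parsed structure. The assumptions here go beyond what the text states: definitions are top-level keys starting with `$`, a usage is `$` followed by an identifier, and definitions do not refer to each other. `resolveReferences` is a hypothetical name.

```javascript
// Hypothetical sketch: drop `$name` definitions and substitute their values
// into every string value in the rest of the document.
function resolveReferences(doc) {
  const defs = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key.startsWith('$')) defs[key.slice(1)] = value;
  }
  const hasDef = (name) => Object.prototype.hasOwnProperty.call(defs, name);
  const expand = (value) => {
    if (typeof value === 'string') {
      // Unknown names are left as written rather than replaced.
      return value.replace(/\$([A-Za-z_][A-Za-z0-9_]*)/g, (match, name) =>
        hasDef(name) ? String(defs[name]) : match);
    }
    if (Array.isArray(value)) return value.map(expand);
    if (value !== null && typeof value === 'object') {
      return Object.fromEntries(
        Object.entries(value).map(([k, v]) => [k, expand(v)]));
    }
    return value;
  };
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (!key.startsWith('$')) out[key] = expand(value);
  }
  return out;
}

console.log(resolveReferences({
  $base: 'https://api.example.com',
  endpoint: '$base/v1',
}));
// { endpoint: 'https://api.example.com/v1' }
```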

Type Hints

Rule: Preserve type hints if supported, otherwise strip

Canonical: Include type hints in output if parser supports them

# With type hint
timestamp: @time 2024-01-15T10:30:00Z

# Without type hint support
timestamp: 2024-01-15T10:30:00Z

Whitespace

Rule:

No trailing whitespace on lines
Single newline between top-level entries
No blank lines at start/end of document

# Canonical
key1: value1
key2: value2
key3: value3

# Non-canonical
key1: value1

key2: value2

key3: value3
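
As a text-level illustration of these three rules, the sketch below strips trailing whitespace and removes blank lines. It assumes the input contains no multiline (|) string blocks, where blank lines are content and must not be removed; a real serializer would more likely produce this layout directly from the data structure. `normalizeWhitespace` is a hypothetical name.

```javascript
// Hypothetical sketch: only valid for documents without `|` blocks.
function normalizeWhitespace(text) {
  return text
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, '')) // no trailing whitespace
    .filter((line) => line !== '')              // no blank lines anywhere
    .join('\n');
}

console.log(normalizeWhitespace('\nkey1: value1  \n\nkey2: value2\n\nkey3: value3\n'));
// key1: value1
// key2: value2
// key3: value3
```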

Comments

Rule: Comments are not preserved in canonical form (they're metadata, not data)

Exception: If preserveComments option is enabled, preserve comments with their original formatting

# Original
name: Alice  # User's name
age: 30

# Canonical (comments stripped)
name: Alice
age: 30

Round-Trip Guarantees

What Is Preserved

✅ Preserved:

Key-value pairs
Nested structure
Array order
String content (including newlines)
Number precision
Boolean values
Null values

What May Change

⚠️ May change (but semantically equivalent):

Key ordering (only when sortKeys: true; the default preserves insertion order)
Boolean representation (true → yes)
Null representation (none → null)
Array formatting (inline vs multiline)
String quoting (if unnecessary)
Whitespace normalization
Comments (stripped by default)

Round-Trip Test

A document is round-trip safe if:

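const assert = require('node:assert');

const original = parser.parse(text);         // text -> data structure
const serialized = parser.dumps(original);   // data structure -> canonical text
const reparsed = parser.parse(serialized);   // canonical text -> data structure
assert.deepStrictEqual(reparsed, original);  // throws if the structures differ

Note: text !== serialized is expected and OK, as long as original and reparsed are deeply equal.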
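
The same check can be wrapped in a reusable helper. The sketch below uses JSON as a stand-in codec so it runs without an NDF parser; `isRoundTripSafe`, `deepEqual`, and `jsonCodec` are hypothetical names, and the only assumption about the parser is that it exposes the `parse` and `dumps` methods used above.

```javascript
// Minimal structural equality for plain objects, arrays, and scalars.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every((key) =>
    Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key]));
}

// Hypothetical helper: true when parse -> dumps -> parse yields equal data.
function isRoundTripSafe(parser, text) {
  const original = parser.parse(text);
  const reparsed = parser.parse(parser.dumps(original));
  return deepEqual(original, reparsed);
}

// Stand-in codec (JSON), used only to make the sketch runnable.
const jsonCodec = { parse: JSON.parse, dumps: JSON.stringify };
console.log(isRoundTripSafe(jsonCodec, '{ "b": [1, 2],   "a": null }')); // true
```

Note that the input text and the serialized text differ in spacing here, yet the check passes, which is exactly the situation the note above describes.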

Implementation

Default Options (Non-Canonical)

{
  indent: '  ',
  indentLevel: 0,
  inlineThreshold: 60,
  sortKeys: false,
  includeReferences: true
}

Canonical Options

{
  indent: '  ',
  indentLevel: 0,
  inlineThreshold: 60,
  sortKeys: true,        // Alphabetical key order
  includeReferences: false  // Resolve all references
}
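
Since the two option sets differ only in sortKeys and includeReferences, the canonical set can be derived from the defaults rather than maintained separately. This is a sketch of one way to organize that: the constant names and `resolveOptions` are hypothetical and not part of any documented API.

```javascript
// Default options, copied from the listing above.
const DEFAULT_OPTIONS = Object.freeze({
  indent: '  ',
  indentLevel: 0,
  inlineThreshold: 60,
  sortKeys: false,
  includeReferences: true,
});

// Canonical mode overrides only the two flags that differ.
const CANONICAL_OPTIONS = Object.freeze({
  ...DEFAULT_OPTIONS,
  sortKeys: true,
  includeReferences: false,
});

// Hypothetical helper: caller overrides win over the chosen base set.
function resolveOptions(overrides = {}, canonical = false) {
  const base = canonical ? CANONICAL_OPTIONS : DEFAULT_OPTIONS;
  return { ...base, ...overrides };
}

console.log(resolveOptions({ inlineThreshold: 80 }, true).sortKeys); // true
```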

Examples

Example 1: Array Formatting

Input:

tags: python ai ml

Canonical output:

tags: python, ai, ml

Reason: Comma-separated is more explicit and handles edge cases better.

Example 2: Boolean Normalization

Input:

enabled: true
disabled: false

Canonical output:

enabled: yes
disabled: no

Reason: yes/no is the preferred NDF boolean format.

Example 3: Key Ordering

Input:

zebra: 1
apple: 2
banana: 3

Canonical output (with sortKeys: true):

apple: 2
banana: 3
zebra: 1

Example 4: Reference Resolution

Input:

$base: https://api.example.com
endpoint: $base/v1

Canonical output (with includeReferences: false):

endpoint: https://api.example.com/v1

Best Practices

Use canonical mode for:
- Version control
- Automated tooling
- Hash-based comparisons
- Testing

Use default mode for:
- Human editing
- Preserving user formatting
- Development workflows

Always test round-trip behavior when implementing serialization changes.

Document any canonicalization choices that affect user-visible behavior.

Community

Legal

License - MIT License
