NDF Escaping and Edge Cases
This document specifies how to handle special characters and edge cases in NDF without turning it into "YAML: the sequel."
Principle: Keep escaping simple and predictable. Use quoting only when necessary.
Goal: Avoid complex escape sequences. Prefer explicit quoting over implicit escaping.
A value must be quoted if it contains or is any of the following:
- Structural characters: : , # [ ] { }
- Quotes: " ' (to avoid ambiguity)
- Leading/trailing whitespace
- Reserved keywords: yes, no, true, false, null, none, -
- Leading special prefixes: $ (unless it's a reference), @ (unless it's a type hint)
- Numeric strings: Values that look like numbers but should be strings
Use double quotes (") for values that need quoting.
# Colon in value
time: "10:30:00"
url: "https://example.com:8080"
# Comma in value
message: "Hello, world"
# Hash in value
color: "#ff0000"
# Reserved keyword
status: "yes" # String, not boolean
Single quotes (') work the same but are less common. Prefer double quotes for consistency.
message: 'Hello, world'
Inside quoted strings, escape these characters:
- \n → newline
- \t → tab
- \r → carriage return
- \\ → backslash
- \" → double quote (inside a double-quoted string)
- \' → single quote (inside a single-quoted string)
- \: → colon (optional, colon is safe inside quotes)
- \, → comma (optional, comma is safe inside quotes)
Note: : and , don't need escaping inside quotes, but escaping is supported for clarity.
# Escaped newline
message: "Line 1\nLine 2"
# Escaped quote
quote: "He said \"Hello\""
# Escaped backslash
path: "C:\\Users\\Name"
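The escape table above can be sketched as a pair of helpers. This is a minimal illustration, not the NDF implementation: the names escapeForDoubleQuotes, unescapeQuoted and decodeEscape are hypothetical, and throwing on an unknown escape is an assumption.

```typescript
// Hypothetical helper: map the character after a backslash to its meaning,
// or null when the escape is not in the table above.
function decodeEscape(ch: string): string | null {
  switch (ch) {
    case "n": return "\n";
    case "t": return "\t";
    case "r": return "\r";
    case "\\": return "\\";
    case '"': return '"';
    case "'": return "'";
    case ":": return ":"; // optional escape
    case ",": return ","; // optional escape
    default: return null;
  }
}

// Encode a raw string as the body of a double-quoted value (quotes not added).
function escapeForDoubleQuotes(raw: string): string {
  return raw
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\t/g, "\\t")
    .replace(/\r/g, "\\r");
}

// Decode the body of a quoted value (surrounding quotes already removed).
function unescapeQuoted(body: string): string {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    if (i + 1 >= body.length) {
      throw new Error("Dangling backslash at end of quoted string");
    }
    const next = body.charAt(i + 1);
    const decoded = decodeEscape(next);
    if (decoded === null) {
      // Assumption: unknown escapes are rejected rather than passed through.
      throw new Error("Unsupported escape sequence: \\" + next);
    }
    out += decoded;
    i++; // skip the escaped character
  }
  return out;
}
```

Escaping the backslash before anything else matters: doing it last would double the backslashes that the other replacements just introduced.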
Problem: Colon is the key-value separator.
Solution: Quote the value.
# ❌ Invalid (colon interpreted as separator)
time: 10:30:00
# ✅ Valid (quoted)
time: "10:30:00"
# ✅ Valid (multiline)
time: |
  10:30:00
Problem: Comma is used for list separation.
Solution: Quote the value.
# ❌ Invalid (comma splits into list)
greeting: Hello, world
# ✅ Valid (quoted)
greeting: "Hello, world"
# ✅ Valid (multiline)
greeting: |
  Hello, world
Problem: How to include a comma in a list item?
Solution: Quote the item.
# List with comma in item
tags: "tag, with comma", "another tag", "third tag"
# Or use multiline array
tags:
- "tag, with comma"
- "another tag"
- "third tag"
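One way to honor quoted items when splitting an inline list is to track whether the scanner is inside quotes and split only on commas outside them. The sketch below assumes that behavior; splitListItems is a hypothetical name, and the items are returned with their quotes still attached so a later step can decode them.

```typescript
// Split an inline list value on commas that are not inside quotes.
function splitListItems(value: string): string[] {
  const items: string[] = [];
  let current = "";
  let quote: string | null = null; // the quote character we are inside, if any
  for (let i = 0; i < value.length; i++) {
    const ch = value.charAt(i);
    if (quote !== null) {
      current += ch;
      if (ch === "\\" && i + 1 < value.length) {
        current += value.charAt(++i); // keep the escaped character with its backslash
      } else if (ch === quote) {
        quote = null; // closing quote
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
    } else if (ch === ",") {
      items.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  items.push(current.trim());
  return items;
}
```

For the inline example above, this yields three items, the first of which still contains its comma.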
Problem: Hash starts a comment.
Solution: Quote the value.
# ❌ Invalid (hash starts comment)
color: #ff0000
# ✅ Valid (quoted)
color: "#ff0000"
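The same quote tracking explains why quoting protects a hash: a comment can only start outside quotes. A minimal sketch under that assumption follows; stripComment is a hypothetical name, not part of NDF's documented API.

```typescript
// Remove a trailing comment, ignoring any # that appears inside quotes.
function stripComment(line: string): string {
  let quote: string | null = null; // the quote character we are inside, if any
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (quote !== null) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "#") {
      // Everything from the hash onward is a comment.
      return line.slice(0, i).replace(/\s+$/, "");
    }
  }
  return line;
}
```

With this sketch, the unquoted color line loses its value entirely, while the quoted one is left untouched.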
Problem: Brackets/braces indicate arrays/objects.
Solution: Quote the value.
# ❌ Invalid (interpreted as array)
pattern: [a-z]
# ✅ Valid (quoted)
pattern: "[a-z]"
# ❌ Invalid (interpreted as object)
template: {name}
# ✅ Valid (quoted)
template: "{name}"
Problem: Quotes delimit strings.
Solution: Escape inside quotes, or use opposite quote type.
# Escaped double quote
message: "He said \"Hello\""
# Or use single quotes
message: 'He said "Hello"'
# Escaped single quote
message: 'It\'s great'
# Or use double quotes
message: "It's great"
Problem: Leading and trailing whitespace is trimmed from unquoted values.
Solution: Quote the value.
# ❌ Invalid (whitespace trimmed)
name: Alice
# ✅ Valid (quoted preserves whitespace)
name: " Alice "
Problem: Empty value is interpreted as null.
Solution: Quote empty string.
# ❌ Invalid (interpreted as null)
empty:
# ✅ Valid (explicit empty string)
empty: ""
Problem: Keywords have special meaning.
Solution: Quote to use as string.
# Boolean keywords
status: "yes" # String, not boolean
enabled: "true" # String, not boolean
# Null keywords
value: "null" # String, not null
empty: "none" # String, not null
placeholder: "-" # String, not null
Problem: Numbers are parsed as numbers, not strings.
Solution: Quote to preserve as string.
# ❌ Invalid (parsed as number)
zipcode: 01234
# ✅ Valid (quoted as string)
zipcode: "01234"
# ❌ Invalid (parsed as number)
id: 00123
# ✅ Valid (quoted as string)
id: "00123"
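The keyword and number rules can be pictured as a single classification step for unquoted values. The sketch below is illustrative only: parseUnquotedScalar and the Scalar type are hypothetical names, keywords are matched case-insensitively to mirror needsQuoting(), and $ references and @ type hints are not handled.

```typescript
type Scalar = string | number | boolean | null;

// Classify an unquoted value. Quoted values never reach this step,
// which is why quoting preserves "yes" or "01234" as strings.
function parseUnquotedScalar(text: string): Scalar {
  const value = text.trim();
  if (value === "") return null; // empty value reads as null
  const lower = value.toLowerCase();
  if (lower === "yes" || lower === "true") return true;
  if (lower === "no" || lower === "false") return false;
  if (lower === "null" || lower === "none" || value === "-") return null;
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(value)) {
    return Number(value); // leading zeros are lost here
  }
  return value;
}
```

Under these assumptions an unquoted 01234 becomes the number 1234, which is exactly the information loss the quoted form avoids.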
Problem: $ indicates references, @ indicates type hints.
Solution: Quote to use literally.
# ❌ Invalid (interpreted as reference)
price: $100
# ✅ Valid (quoted)
price: "$100"
# ❌ Invalid (interpreted as type hint)
tag: @important
# ✅ Valid (quoted)
tag: "@important"
Problem: Newlines end key-value pairs.
Solution: Use multiline string (|) or escape (\n).
# Multiline (preferred)
description: |
  Line 1
  Line 2
  Line 3
# Escaped (alternative)
description: "Line 1\nLine 2\nLine 3"
Problem: Keys can contain most characters, but some are problematic.
Solution: Quote keys with special characters.
# Valid unquoted key
name: Alice
# Valid quoted key (if needed)
"key:with:colons": value
"key,with,commas": value
"key with spaces": value
Note: Keys with colons/commas are discouraged. Prefer using valid identifiers.
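Quoted keys work because the separator is the first colon outside quotes. A rough sketch of that split is shown here; splitKeyValue is a hypothetical name, and the sketch does not reject an unquoted value that itself contains a colon, which a real parser would need to do.

```typescript
// Split a line at the first colon that is not inside quotes.
function splitKeyValue(line: string): { key: string; value: string } {
  let quote: string | null = null; // the quote character we are inside, if any
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (quote !== null) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === ":") {
      // Key and value keep their quotes; decoding is a separate step.
      return { key: line.slice(0, i).trim(), value: line.slice(i + 1).trim() };
    }
  }
  throw new Error("No key-value separator found in: " + line);
}
```

For the quoted key with colons above, the key comes back intact with its quotes and the value is the plain word after the separator.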
Problem: Quotes inside quoted strings.
Solution: Escape or use opposite quote type.
# Escaped
message: "He said \"Hello\" and she said \"Hi\""
# Mixed quotes
message: "He said 'Hello' and she said 'Hi'"
Problem: Backslash is escape character.
Solution: Always escape backslashes.
# Escaped backslash
path: "C:\\Users\\Name"
# Backslash at end
text: "Ends with backslash\\"
The parser uses needsQuoting() to determine if a value should be quoted:
function needsQuoting(str: string): boolean {
  // Empty string
  if (str === '') return true;
  // Leading/trailing whitespace
  if (str !== str.trim()) return true;
  // Special characters
  if (/[:#\[\]{},"']/.test(str)) return true;
  // Line breaks (would otherwise end the key-value pair)
  if (/[\n\r]/.test(str)) return true;
  // Reserved keywords
  if (/^(yes|no|true|false|null|none|-)$/i.test(str)) return true;
  // Numeric-looking strings are not quoted by default, so a string such as
  // "01234" reads back as a number; whether to quote it is context-dependent
  // and left to the caller
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(str)) {
    return false;
  }
  // Leading $ or @
  if (str.startsWith('$') || str.startsWith('@')) return true;
  return false;
}
When serializing, use quoteIfNeeded():
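function quoteIfNeeded(str: string): string {
  if (needsQuoting(str)) {
    // escapeString() is not shown here; it is assumed to escape backslashes
    // and control characters but not quotes, which are escaped below
    const escaped = escapeString(str);
    return `"${escaped.replace(/"/g, '\\"')}"`;
  }
  return str;
}
# URL with port (contains colon)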
url: "https://example.com:8080"
# URL with query (contains ampersand, etc.)
api: "https://api.example.com/v1?key=value&format=json"
# Windows path (contains backslashes)
path: "C:\\Users\\Name\\Documents"
# Unix path (usually no quoting needed)
path: /home/user/documents
# JSON string (contains quotes and braces)
json: "{\"key\": \"value\"}"
# Or use multiline
json: |
  {"key": "value"}
# Regex pattern (contains brackets, etc.)
pattern: "[a-z]+"
# Code with special characters
code: |
  function test() {
    return "hello: world";
  }
To keep NDF simple, we follow these rules:
- Limit escape sequences: Only \n, \t, \r, \\, \", \' (plus the optional \: and \,)
- No complex escaping: No Unicode escapes, no octal, no hex
- Explicit over implicit: Prefer quoting over complex rules
- Predictable behavior: Same input always produces same output
- Clear error messages: When escaping fails, show why
Test these scenarios:
# All special characters
special: ":,[]{}#\"'$@"
# Empty and whitespace
empty: ""
spaces: " hello "
# Reserved words as strings
keywords: "yes no true false null none -"
# Numbers as strings
numeric: "01234"
# Mixed quotes
mixed: "He said 'Hello'"
# Escaped sequences
escaped: "Line 1\nLine 2\tTabbed"
Rule of thumb: If a value contains any character that could be interpreted as syntax, quote it.
Keep it simple: Use double quotes, escape only what's necessary, prefer multiline for complex values.
Avoid complexity: Don't add YAML-like features (anchors, aliases, complex escaping). Keep NDF focused on simplicity.