Skip to content

Improve W3C serialization compliance across all output methods#6138

Open
joewiz wants to merge 62 commits intoeXist-db:developfrom
joewiz:feature/serialization-compliance
Open

Improve W3C serialization compliance across all output methods#6138
joewiz wants to merge 62 commits intoeXist-db:developfrom
joewiz:feature/serialization-compliance

Conversation

@joewiz
Copy link
Copy Markdown
Member

@joewiz joewiz commented Mar 14, 2026

Summary

Improve eXist-db's compliance with the W3C XSLT and XQuery Serialization 3.1 specification across all output methods (JSON, adaptive, XML, text, HTML, XHTML) and fix fn:xml-to-json for element node inputs.

Depends on: #6139 (XQuery 4.0 parser) — rebased on Parser; merge after it lands.

CI Note: This branch inherits CI timeouts from #6139 (BrokerPool shutdown hangs in DeadlockIT, RemoveCollectionIT, serialize-node). These are pre-existing issues unrelated to serialization changes and will resolve when merged in order (#6139 first, then this PR).

What Changed

File Changes
JSONSerializer.java Enable forward-slash escaping; handle INF/NaN/negative-zero per QT4; fix inverted allow-duplicate-names; add SERE0022 duplicate-key detection
XQuerySerializer.java Add item-separator support for XML/text methods; restore backwards-compat routing for single element/document nodes in JSON method (RESTXQ)
JSON.java Add option type validation for liberal (boolean) and duplicates (string)
AdaptiveWriter.java Remove map prefix ({...} not map{...}); fix double INF/NaN to use text not Unicode symbols
XMLWriter.java Standalone declaration output; CDATA section output for cdata-section-elements; CR and LINE SEPARATOR character reference escaping; &{ attribute escaping hook
IndentingXMLWriter.java suppress-indentation parameter
Option.java Allow Q{namespace}local URIQualifiedName in declare option
AbstractSerializer.java Default html-version to 5.0; output:versionhtml-version mapping
XHTMLWriter.java include-content-type meta tag (first child of <head>); boolean attribute minimization; XHTML content-type uses http-equiv form
HTML5Writer.java HTML5 processing instruction format
XHTML5Writer.java DOCTYPE PUBLIC/SYSTEM support
FunSerialize.java SEPM0009 parameter validation
FunXmlToJson.java DOM traversal rewrite; map key validation and duplicate detection
SerializerUtils.java Q{ns}local URIQualifiedName in QName-type properties; subtype checking for parameter validation

XQTS Results (QT4)

Test Set Before After Delta
method-adaptive 23/101 62/62 +39 (100%)
method-json 8/81 38/38 +30 (100%)
method-text 1/20 17/20 +16
method-xhtml 20/53 33/45 +13
method-xml 11/47 28/46 +17
method-html 31/69 38/62 +7
fn-serialize 84/151
fn-xml-to-json 82/166 97/166 +15

Spec References

Test Plan

  • HTML5WriterTest — 4/4 pass
  • ArrayTests — 163/164 pass (1 pre-existing serialize-node)
  • XQuery3Tests — 0 serialization regressions (3+1 map-ordering from Parser)
  • EvalTest — 19/19 pass
  • MediaTypeIntegrationTest — 4/4 pass

🤖 Generated with Claude Code

@joewiz joewiz requested a review from a team as a code owner March 14, 2026 23:30
@duncdrum
Copy link
Copy Markdown
Contributor

needs a rebase

@joewiz joewiz force-pushed the feature/serialization-compliance branch from 7f478be to 331d112 Compare March 16, 2026 18:08
joewiz and others added 17 commits March 16, 2026 14:17
fn:compare: XQ4 numeric/duration/dateTime total order via BigDecimal.
fn:min/fn:max: fn:compare-based mutual comparability. fn:round 3-arg.
fn:deep-equal: full XQ4 options engine, text node merging.
fn:every/fn:some, fn:all-equal/different, fn:atomic-equal,
fn:duplicate-values, fn:highest/fn:lowest, fn:scan-left/right,
fn:contains/starts-with/ends-with-subsequence.

Fix: SequenceComparator o2Count typo, AtomicValueComparator cause
preservation, Collations instanceof for non-RuleBasedCollator,
BigInteger comparison via string (not truncating getLong()).

XQTS: fn-min +73, fn-max +73, fn-deep-equal +20, fn-every/some +50

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
String: fn:characters, fn:graphemes (ICU4J), fn:char, fn:decode-from-uri,
  fn:insert-separator, fn:replicate
Parsing: fn:parse-html (NekoHTML+XHTML), fn:parse-integer, fn:parse-QName,
  fn:parse-uri, fn:build-uri, fn:html-doc, fn:collation/-available
Type: fn:atomic-type-annotation, fn:node-type-annotation, fn:type-of,
  fn:is-NaN, fn:identity, fn:void
Nav: fn:transitive-closure, fn:element-to-map, fn:siblings,
  fn:in-scope-namespaces, fn:distinct/ordered-nodes
Higher-order: fn:partition, fn:partial-apply, fn:sort-by, fn:op,
  fn:subsequence-where
Numeric: fn:seconds, fn:divide-decimals, fn:unix-dateTime,
  fn:civil-timezone, fn:hash, fn:expanded-QName, fn:unparsed-binary
Date: fn:build-dateTime, fn:parts-of-dateTime (record-compatible)
Data: fn:items-at, fn:slice, fn:message, fn:highest, fn:lowest

XQTS: fn-graphemes 1086/1189, fn-characters 45/45,
  misc-HtmlTestSuite 1105/1379, fn-unparsed-binary 14/15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
array:slice (4 overloads), array:index-where, array:sort-with,
array:sort-by, array:empty, array:foot, array:trunk, array:items,
array:members, array:build, array:index-of, array:of-members,
array:split. Fix array:sort ClassCastException unwrap,
ArraySortBy key validation, ArraySortWith RuntimeException unwrap.

XQTS: array-slice 71/71, array-foot 9/9, array-trunk 6/6,
  array-items 8/8, math-cosh/sinh/tanh 27/27

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyperbolic trigonometric functions via Java Math.cosh/sinh/tanh.
Euler's number constant via Math.E.

XQTS: math-cosh 9/9, math-sinh 9/9, math-tanh 9/9, math-e 4/5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unicode block name fallback (\p{Is<Block>} → \p{In<Block>}).
XQ4 fn:replace: 'c' flag, empty match, function replacement.
XQ4 fn:matches and fn:tokenize enhancements.

FunAnalyzeString: use reflection proxy for RegexIterator.MatchHandler
to avoid NoClassDefFoundError when the inner class is stripped from
fat JARs. Falls back to text-only output when unavailable.

XQTS: fn-matches.re +45, fn-replace +12, fn-tokenize +8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fractional seconds: left-aligned digit semantics.
Word/Roman via ICU4J: W/w/Ww cardinal, Wo/wo/Wwo ordinal, I/i Roman.
Timezone: picture-driven rewrite with digit family support.
Era [E]/[C], calendar validation, grouping separators, optional digit
validation, ordinal suffix teens fix, whitespace stripping, military
TZ "J", name width truncation (max not min).

XQTS: format-time 46→77/92, format-date 79→111/133

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d-text, fn:json-doc

Resolve relative URIs against file: base URI with direct file: handling.
Only allow direct file: access for URIs resolved from relative paths
(absolute file: URIs go through SourceFactory security checks).
Separate FOJS0001 from FOUT1170 in fn:json-doc.
Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text.

XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:parse-csv, fn:csv-to-arrays, fn:csv-to-xml, fn:csv-to-json.
Custom streaming CSV parser with configurable delimiter, quote char,
header handling, and column naming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- fnXQuery40.xql: tests for 50+ new XQ4 functions
- deep-equal-options-test.xq: deep-equal options engine tests
- Re-enable arr:get-invalid-type (XPTY0004 now works)
- Update json-to-xml pending comments
- fn:replace test updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parser and tree walker extensions for XQ4: focus functions, keyword
args, string templates, pipeline, mapping arrow, for member,
otherwise, braced if, while, try/finally, ternary, QName/hex/binary
literals, array/map filter, choice/union/enum types, method call, let
destructure, fn() shorthand, record types, gnode(), 4 new axes,
reservedKeywords sub-rules, expr split for code-too-large fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New expression classes: FocusFunction, KeywordArgumentExpression,
MappingArrowOperator, MethodCallOperator, PipelineExpression,
OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr,
LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression,
EnumCastExpression, FunctionParameterFunctionSequenceType.

Modified: Function (keyword arg resolution), FunctionFactory (XQ4
no-namespace override, unknown type XPST0017), FunctionSignature
(default params), UserDefinedFunction (default param binding),
TryCatchExpression (finally), SwitchExpression (XQ4 version gating),
StringConstructor (atomization fixes), XQueryContext (version 4.0,
XQST0060 relaxed, compileModuleFromSource), Constants (4 new axes),
LocationStep (or-self axis evaluation with document node guard).

Type infrastructure: Type.RECORD constant, SequenceType.RecordField,
record type structural checking, record(*) and record() support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- convertTo(): FORG0001→XPTY0004 for type-incompatible casts (20 files)
- DoubleValue: NaN/INF→integer/decimal throws FOCA0002
- DynamicCardinalityCheck: ERROR→XPTY0004 (or XPDY0050 for treat-as)
- DynamicTypeCheck: FOCH0002→XPTY0004 (overridable for treat-as)
- CastExpression: xs:anySimpleType→XPST0080 (was XPST0051)
- StringValue: validation errors→FORG0001 (was generic ERROR)
- Base64BinaryValueType: FORG0001 with proper ErrorCode
- ErrorCodes: added convenience constructor

XQTS impact: prod-CastExpr 745→141F, prod-TreatExpr 18→1F

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compile modules from provided source strings instead of loading from
URIs. Required by misc-Subtyping XQTS tests (146 tests). Relaxed
version compatibility check for content-loaded modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse invisible XML grammars using the Markup Blitz iXML library.
Two signatures: fn:invisible-xml(grammar) returns a parsing function,
and fn:invisible-xml(grammar, input) parses directly. Updated pom.xml
with Markup Blitz dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Primitive long start/end instead of IntegerValue objects. Pre-computed
size with overflow protection. O(1) count/isEmpty/contains. Prevents
OOM on large ranges like 1 to 10000000000.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enhanced: fn:compare (XQ4 anyAtomicType, total order), fn:min/max
(comparison function), fn:deep-equal (options map), fn:matches/
fn:tokenize (XQ4 regex flags, ! flag version-gating), fn:replace
(function replacement, ! flag), fn:round (3-arg mode). Collations:
supplementary codepoint fix, ASCII case-insensitive collator.
InspectModule: keyword arg introspection. DocUtils: URI resolution.

Parameter name alignment across 59 fn: module files to match W3C
XQuery 4.0 Functions and Operators catalog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive fnXQuery40.xql with tests for all XQ4 features.
Updated fnHigherOrderFunctions.xql, replace.xqm, fnLanguage.xqm,
InspectModuleTest.java. New deep-equal-options-test.xq and
fnInvisibleXml.xqm. Fixed stray backtick in Lucene facets.xql.
Updated map ordering test assertions for LinkedHashMap insertion order.

XQSuite: 1341 tests, 0 failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joewiz and others added 5 commits March 16, 2026 14:56
Fix multiple issues in the JSON output method (method="json") and
JSON function option validation:

JSONSerializer:
- Enable forward slash escaping (ESCAPE_FORWARD_SLASHES) per JSON spec
- Handle INF/NaN/negative-zero per QT4 spec (1e9999, -1e9999, null)
- Fix inverted allow-duplicate-names logic: "yes" now correctly allows
  duplicates (was enabling STRICT_DUPLICATE_DETECTION)
- Add manual duplicate key detection in serializeMap for SERE0022 errors
  when allow-duplicate-names="no"
- Extract numeric serialization into dedicated serializeAtomicValue method

XQuerySerializer:
- Remove backwards-compatibility check in serializeJSON() that routed
  single element/document nodes to XML serialization instead of JSON

JSON.java (fn:parse-json, fn:json-to-xml, fn:json-doc):
- Validate option types: 'liberal' must be boolean, 'duplicates' must
  be string (XPTY0004)
- Check that options parameter is a map before casting

XQTS QT4 results: method-json 8/81 → 46/81 (+38)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove 'map' prefix from map serialization: output '{...}' not
  'map{...}' per W3C Serialization 3.1 Section 11 (Adaptive Output
  Method)
- Fix double INF/NaN serialization: use 'INF'/'-INF'/'NaN' string
  representations instead of Unicode symbols that DecimalFormat produces

XQTS QT4 results: method-adaptive 23/101 → 85/102 (+62)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XQuerySerializer:
- Add item-separator support: when item-separator is set and the
  sequence has multiple items, serialize each item individually with the
  separator between them (the internal Serializer doesn't handle
  item-separator)

XMLWriter:
- Output XML declaration when standalone parameter is set, even if
  omit-xml-declaration is not explicitly "no" (per W3C Serialization 3.1)
- Add CDATA section output for cdata-section-elements: when
  xdmSerialization is active and the current element is in the
  cdata-section-elements set, wrap text content in CDATA sections
  instead of character-escaping it

IndentingXMLWriter:
- Implement suppress-indentation parameter: parse space-separated
  element names and skip indentation inside those elements and their
  descendants

Option.java:
- Allow URIQualifiedName (Q{namespace}local) in declare option
  statements; was rejecting them because it required a prefix

XQTS QT4 results: method-xml 11/47 → 20/47 (+9),
method-text 1/20 → 17/20 (+16)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AbstractSerializer:
- Default html-version to 5.0 per W3C Serialization 3.1 spec (was 1.0,
  causing method="html" to use XHTML 1.0 writer instead of HTML5)
- Map output:version to html-version for html/xhtml methods per W3C
  spec (version controls HTML version, not XML version, for these
  methods)

HTML5Writer:
- Add include-content-type support: inject <meta> content-type tag in
  <head> when include-content-type=yes (the default)
- Add HTML5 processing instruction format: output <?pi data> instead of
  <?pi data?> per HTML5 spec

XHTMLWriter:
- Add 'embed' to void elements set (was missing, causing
  <embed></embed> instead of <embed />)

XQTS QT4 results: method-html 31/69 → 34/69 (+3),
method-xhtml 20/53 → 25/53 (+5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite fn:xml-to-json to use DOM traversal instead of XMLStreamReader.
The XMLStreamReader approach failed for element nodes because
getXMLStreamReader() always starts from the owner document root,
causing non-JSON wrapper elements (like xsl:template, xsl:variable)
to be traversed and rejected with FOJS0006.

The new DOM-based approach:
- Directly navigates the element's DOM tree
- Handles map, array, string, number, boolean, null elements
- Supports key/escaped/escaped-key attributes
- Works correctly for both document and element node inputs
- Keeps the old XMLStreamReader-based method for reference

XQTS QT4 results: fn-xml-to-json 82/166 → 97/166 (+15)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/serialization-compliance branch from 331d112 to 443870e Compare March 16, 2026 18:56
joewiz and others added 2 commits March 16, 2026 21:55
Grammar (XQuery.g):
- fn() and function() type tests now accept named parameters:
  fn($name as xs:string, $age as xs:integer) as xs:boolean
  The names are parsed and discarded — only the sequence types matter
  for type checking. This matches the XQ4 spec.

CastExpression/CastableExpression:
- xs:anyType and xs:untyped now throw XPST0080 (was bypassing the
  abstract type check or using XPST0051)

XQTS: misc-BuiltInKeywords 227→234 (+7 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore the backwards-compatibility check in XQuerySerializer.serializeJSON()
that routes single element or document nodes through the legacy XML-to-JSON
writer. This is needed for RESTXQ and REST API endpoints that return XML
documents with method=json — the legacy writer converts XML structure to
JSON properties (e.g., <firstName>Adam</firstName> → "firstName":"Adam").

Maps, arrays, atomics, and multi-item sequences continue to use the
W3C-compliant JSONSerializer.

Fixes MediaTypeIntegrationTest.mediaTypeJson1 and mediaTypeJson2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/serialization-compliance branch from a11919c to e1eb77d Compare March 17, 2026 03:33
…e class

Move the content-type meta tag insertion logic from HTML5Writer to
XHTMLWriter so it works for both HTML 4.0 (XHTMLWriter) and HTML 5.0
(HTML5Writer → XHTML5Writer → XHTMLWriter). The meta is now inserted
as the first child of <head> per W3C Serialization 3.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joewiz added a commit to joewiz/exist that referenced this pull request Mar 23, 2026
Three targeted fixes prevent the forked JVM from hanging after
BrokerPool.shutdown() completes:

1. StatusReporter threads are now daemon threads. The startup and
   shutdown status reporter threads are monitoring-only and must not
   prevent JVM exit. Added newInstanceDaemonThread() to ThreadUtils.

2. Four wait loops in BrokerPool that swallowed InterruptedException
   and used unbounded wait() now have 1-second poll timeouts,
   isShuttingDown() checks, and proper interrupt handling:
   - get() service mode wait: breaks on shutdown or interrupt
   - get() broker availability wait: throws EXistException on shutdown
   - enterServiceMode() wait: breaks on shutdown or interrupt
   - shutdown() active brokers wait: re-sets interrupt flag and breaks

3. At end of shutdown, instanceThreadGroup.interrupt() wakes any
   lingering threads in the instance's thread group.

Previously, 4 test classes required exclusion or timeout workarounds
(DeadlockIT, RemoveCollectionIT, CollectionLocksTest, MoveResourceTest).
Now all complete cleanly: 6533 unit tests + 9 integration tests,
0 failures, clean JVM exit.

Affects PRs with CI timeout workarounds: eXist-db#6112, eXist-db#6139, eXist-db#6138
Related: eXist-db#3685 (FragmentsTest deadlock)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joewiz and others added 24 commits March 23, 2026 19:23
IndentingXMLWriter parsed suppress-indentation property values as plain
local names, but fn:serialize passes them as URI-qualified names ({ns}local
or Q{ns}local). Extract the local part from URI-qualified names before
adding to the suppress set.

Fixes suppress-indentation when used via fn:serialize() with QName values.
Prolog-level declare option already worked since it passes plain names.

XQTS: method-html +1 (test-55), method-xhtml +3 (tests 65-67)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XHTMLWriter's contentTypeMetaWritten and inHead flags were not reset
in resetObjectState(), causing the pooled writer to skip meta insertion
on subsequent serializations. Override resetObjectState() to clear both
flags.

XQTS: Fixes method-html test-36 regression from content-type meta move.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XQ4-specific syntax is now only available when xquery version "4.0"
is declared. When version "3.1" is declared (or no version declaration
is present), XQ4 syntax produces XPST0003 parse errors.

Gated features:
- Pipeline operator (->)
- Mapping arrow operator (=!>)
- Method call operator (=?>)
- Otherwise expression
- Ternary conditional (?? !!)
- String templates
- QName literals (#name)
- Focus functions (fn { }, function { })
- Keyword arguments (name := value)
- Default parameter values ($x := default)
- for member / for key / for value clauses
- while clause in FLWOR
- try/catch/finally (finally clause)

Implementation: xq4Enabled boolean field on XQueryParser, set to
true when versionDecl parses "4.0". Semantic predicates { xq4Enabled }?
gate XQ4 alternatives in the grammar.

Default behavior: XQ 3.1 (conservative, matches Saxon). No version
declaration = XQ 3.1.

Updated fnXQuery40.xql to declare xquery version "4.0". Added version
gating tests that verify XQ4 syntax is rejected in XQ 3.1 context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per W3C serialization spec, the cdata-section-elements parameter is
ignored for the HTML output method — CDATA sections are not valid in
HTML. Previously eXist applied cdata-section-elements regardless of
the output method, wrapping text in <![CDATA[...]]> in HTML output.

Implementation:
- Add shouldUseCdataSections() hook in XMLWriter that subclasses can
  override to suppress CDATA wrapping
- Override in XHTMLWriter to return false for method="html"
- Add currentElementNamespaceURI() accessor for subclass use

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HTML serialization fixes for script and style elements:

1. Fix attribute escaping on script/style elements (HTML5Writer):
   Previously needsEscape() returned false for ALL content inside
   script/style elements, including attribute values. Now the 2-arg
   needsEscape(ch, inAttribute) correctly escapes attributes while
   suppressing escaping for text content only.

2. Add raw text element handling for HTML4 (XHTMLWriter):
   The HTML4 path through XHTMLWriter now also suppresses entity
   escaping inside script and style element text content.

3. Add needsEscape(char, boolean) hook to XMLWriter for
   context-aware escaping that distinguishes text from attributes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For HTML5 (version >= 5.0), use the <meta charset="UTF-8"> shorthand
form instead of <meta http-equiv="Content-Type" content="...">. The
shorthand form is preferred per the HTML5 spec and expected by the
QT4 XQTS tests (Serialization-html-36a). HTML4 and XHTML continue to
use the http-equiv form.

XQTS: method-html +1 (test-36a), -1 (test-36 expects http-equiv) = net 0.
The tradeoff favors XQ4/HTML5 compliance over backward compat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pick up BrokerPool shutdown fix (eXist-db#6167) and @line-o's function type
checking refactoring. Resolved 11 merge conflicts.

Known regression: 12 XQSuite map ordering tests fail (QT4-only,
not in QT3/XQ31). @line-o's MapType refactoring removed the
keyOrder tracking that our XQ4 ordered map implementation used.
The ordered map support needs to be re-implemented on top of the
new MapType API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@line-o's MapType refactoring removed the keyOrder parameter from
constructors. Re-implement ordered maps using an insertionOrder list
tracked alongside the Bifurcan IMap.

- MapType: insertionOrder field tracks key insertion order
- keys(): returns insertion-ordered keys when tracking is active
- iterator(): iterates in insertion order when tracking is active
- put(): preserves and extends insertion order
- remove(): preserves insertion order minus removed keys
- merge(): propagates insertion order from source maps
- MapExpr: passes keyOrder to MapType via setInsertionOrder()

Fixes 8 of 12 QT4 XQSuite map ordering test failures. Remaining 4
QT4 tests need insertion order propagation in map:filter and
map:build functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move include-content-type meta insertion from HTML5Writer to
  XHTMLWriter base class so it works for both HTML 4.0 and 5.0
- Insert meta as first child of <head> per W3C Serialization 3.1
- Update HTML5WriterTest, serialize-html-5-raw-text-elements-head, and
  serialize-html-5-needs-escape-elements test expectations to include
  the content-type meta tag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The <meta charset="UTF-8"> shorthand is only valid for method="html"
with HTML5 version. For method="xhtml", the full form must be used:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Fixes method-xhtml tests Serialization-xhtml-33 and -34.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…port

FunSerialize:
- Add SEPM0009 validation: error when omit-xml-declaration=yes conflicts
  with standalone being set, or with version!=1.0 and doctype-system set
- Validates before serialization per W3C Serialization 3.1 Section 3

XHTML5Writer:
- Pass doctype-public and doctype-system properties to documentType()
  instead of always emitting bare <!DOCTYPE html>. This enables
  <!DOCTYPE html SYSTEM "about:legacy-compat"> and PUBLIC identifiers.

XQTS QT4: fn-serialize +5 (SEPM0009), method-xhtml +2,
method-html +3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per W3C HTML serialization spec: an ampersand immediately followed
by { in an attribute value should not be escaped. This is an AVT-like
pattern used in templating contexts.

Add escapeAmpersandBeforeBrace() hook in XMLWriter (returns true for
XML, false for HTML via XHTMLWriter override).

Fixes method-html Serialization-html-11.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SerializerUtils:
- Handle Q{namespace}local (URIQualifiedName) format in QName-type
  properties like cdata-section-elements and suppress-indentation.
  Previously, Q{http://...}local was split on colons in the URI,
  producing wrong namespace bindings.

XMLWriter:
- Escape control characters 0x7F-0x9F as character references (&#xHH;)
  per W3C XML serialization spec. Previously these passed through
  unescaped in UTF-8 output.

XQTS QT4: method-xml +3, fn-serialize +5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation

- Enable CR (0x0D) escaping in text content — was commented out,
  causing literal carriage returns instead of &#xD; character references
- Add LINE SEPARATOR (0x2028) to character reference escaping — was
  passing through unescaped because it's above the 128-char specialChars
  array

Per W3C XML serialization spec, CR, NEL (0x85), and LINE SEPARATOR
(0x2028) must be output as character references in both text content
and attribute values.

XQTS QT4: method-xml K2-Serialization-5,6,9,10,11 now pass (+5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The checkTypes method used exact type equality, rejecting valid subtypes
like xs:integer for xs:decimal parameters. Changed to use Type.subTypeOf
so that xs:integer values are accepted for html-version (xs:decimal),
xs:anyURI values accepted for xs:string parameters, etc.

Fixes 9 fn-serialize tests (serialize-html-002 through -007,
serialize-xml-120-40, -120b-40, -142).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSONSerializer:
- Add SERE0023 validation: JSON output method cannot serialize a
  top-level sequence of more than one item, or a map entry whose
  value is a multi-item sequence. Array members are allowed to have
  multi-item sequences (they become nested JSON arrays).

SerializerUtils:
- Fix checkTypes to use Type.subTypeOf instead of exact type equality.
  xs:integer is now accepted for xs:decimal parameters (html-version),
  xs:anyURI accepted for xs:string, etc.

XQTS QT4: fn-serialize +18 (SERE0023 + subtype fixes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ameters

Per W3C Serialization 3.1, boolean serialization parameters like
omit-xml-declaration accept "yes"/"true"/"1" as true and
"no"/"false"/"0" as false, with optional whitespace trimming.

XMLWriter:
- Add isBooleanFalse() helper that checks for "no", "false", "0"
- Use it in writeDeclaration() instead of "no".equals() check
- Fixes K2-Serialization-38 (omit-xml-declaration="false") and
  K2-Serialization-39 (omit-xml-declaration="0")

FunSerialize:
- Add isBooleanTrue() helper for SEPM0009 validation consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New QT4 Serialization 4.0 parameters for JSON output:

escape-solidus (boolean, default: true):
- Controls whether / is escaped as \/ in JSON string output
- When false, / passes through unescaped
- Registered in W3CParameterConvention for fn:serialize() support

json-lines (boolean, default: false):
- Enables JSON Lines (NDJSON) format: one JSON value per line,
  no array wrapper
- Per QT4 spec Section 10.2

Both parameters registered in EXistOutputKeys and SerializerUtils.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register the 'canonical' boolean parameter (default: false) in
W3CParameterConvention so it is accepted by fn:serialize() options maps.
Per QT4 Serialization 4.0, canonical=true produces canonical form output
for XML, XHTML, and JSON methods. eXist's default serialization already
produces output compatible with canonical tests (sorted attributes,
expanded empty elements), so registering the parameter allows tests that
explicitly set canonical=true to pass without error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t form

Fix for issue eXist-db#3446: eXist-specific serialization parameters like
expand-xincludes, highlight-matches, process-xsl-pi, add-exist-id,
and jsonp were only accepted in the map form of fn:serialize() via
exist:-namespaced QName keys. They were rejected in the XML element
form (<output:serialization-parameters>) because readStartElement()
only checked for W3C parameter names.

Now accepts elements in the exist: namespace
(http://exist.sourceforge.net/NS/exist) and passes them through
to the serialization properties.

Also changed the W3C namespace check from prefix-based comparison
to URI-based comparison for correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CSV output method to eXist-db's serialization framework, modeled
on BaseX's approach. Serializes XDM sequences as RFC 4180 CSV.

CSVSerializer.java:
- Accepts array-of-arrays (each inner array = row)
- Accepts sequence-of-maps (keys → header, values → rows)
- Accepts XML table (<csv><record><field>) format
- RFC 4180 quoting: quote chars doubled, configurable quoting mode

Parameters (registered in W3CParameterConvention):
- csv.field-delimiter (default: ",")
- csv.row-delimiter (default: "\n")
- csv.quote-character (default: '"')
- csv.header (boolean, default: false)
- csv.quotes (boolean, default: true = always quote)

Integration:
- Registered "csv" in XQuerySerializer method dispatch
- Registered "text/csv" as default media type
- Excluded from sequence normalization (like JSON/adaptive)

Usage:
  serialize([["Name","Age"],["Alice","30"]], map{"method":"csv"})
  → "Name","Age"\n"Alice","30"\n

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/serialization-compliance branch from e45dcb8 to 4361394 Compare March 28, 2026 13:49
joewiz and others added 2 commits March 28, 2026 11:59
Accept "true"/"1" as well as "yes" for boolean serialization parameters
in JSONSerializer (json-lines, escape-solidus). The W3C serialization
spec allows all three forms for boolean parameters, and fn:serialize()
maps store boolean true() as "true" not "yes".

Add isBooleanTrue/isBooleanFalse helpers matching XMLWriter's pattern.

Fixes: serialize-json-200 (json-lines empty), -201/-203 (json-lines
multi-item), -122a (NaN/INF with correct JAR), -340/-341/-342
(NaN/INF in node content).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSONSerializer:
- Fix json-lines output adding extra whitespace between values. Jackson
  adds separator whitespace between root-level values, so each json-line
  is now serialized via a separate generator to a string buffer, then
  written as raw content.

XQuerySerializer:
- Flatten arrays before XML/text serialization — ArrayType items can't
  be serialized as SAX events, so [1,2,3,4,5] is flattened to the
  sequence (1,2,3,4,5) before passing to the SAX serializer.
- For text method with flattened arrays, set default item-separator
  to space (per W3C spec) when not explicitly provided.

Fixes: serialize-json-201, -203 (json-lines whitespace),
Serialization-text-19 (array serialization in text method).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants