Implement XQuery 4.0 parser, functions, and runtime support #6139
joewiz wants to merge 26 commits into eXist-db:develop
99bab01 to 59cca34
|
[This comment was co-authored with Claude Code. -Joe]
XQuery 4.0 Functions Status (updated 2026-03-16)
Implemented (19 of 27)
Remaining unimplemented (8 of 27)
Summary: 19 implemented (177 XQTS tests, many at 100%). 8 remaining: 1 partially unblocked, 2 schema-blocked, 4 JNode-blocked. |
fn:compare: XQ4 numeric/duration/dateTime total order via BigDecimal. fn:min/fn:max: fn:compare-based mutual comparability. fn:round 3-arg. fn:deep-equal: full XQ4 options engine, text node merging. fn:every/fn:some, fn:all-equal/different, fn:atomic-equal, fn:duplicate-values, fn:highest/fn:lowest, fn:scan-left/right, fn:contains/starts-with/ends-with-subsequence. Fix: SequenceComparator o2Count typo, AtomicValueComparator cause preservation, Collations instanceof for non-RuleBasedCollator, BigInteger comparison via string (not truncating getLong()). XQTS: fn-min +73, fn-max +73, fn-deep-equal +20, fn-every/some +50 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
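The BigInteger fix in this commit is easiest to see with a small example. The following is a minimal sketch, not the eXist-db code: `NumericOrder` and both method names are hypothetical, and it compares via BigDecimal (as the fn:compare note describes) rather than via strings.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical helper, not part of eXist-db: contrasts lossy and lossless comparison. */
final class NumericOrder {

    /** Lossy: BigInteger.longValue() keeps only the low-order 64 bits. */
    static int compareTruncating(BigInteger a, BigInteger b) {
        return Long.compare(a.longValue(), b.longValue());
    }

    /** Lossless: compares the exact values, whatever their magnitude. */
    static int compareExact(BigInteger a, BigInteger b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b));
    }

    public static void main(String[] args) {
        // 2^64 + 1 truncates to 1 when narrowed to a long.
        BigInteger big = BigInteger.valueOf(2).pow(64).add(BigInteger.ONE);
        BigInteger small = BigInteger.valueOf(2);
        System.out.println("truncating: " + compareTruncating(big, small));
        System.out.println("exact:      " + compareExact(big, small));
    }
}
```

With truncation, 2^64 + 1 sorts below 2, which is the kind of wrong ordering that would distort fn:min and fn:max on large integers.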
String: fn:characters, fn:graphemes (ICU4J), fn:char, fn:decode-from-uri, fn:insert-separator, fn:replicate
Parsing: fn:parse-html (NekoHTML+XHTML), fn:parse-integer, fn:parse-QName, fn:parse-uri, fn:build-uri, fn:html-doc, fn:collation/-available
Type: fn:atomic-type-annotation, fn:node-type-annotation, fn:type-of, fn:is-NaN, fn:identity, fn:void
Nav: fn:transitive-closure, fn:element-to-map, fn:siblings, fn:in-scope-namespaces, fn:distinct/ordered-nodes
Higher-order: fn:partition, fn:partial-apply, fn:sort-by, fn:op, fn:subsequence-where
Numeric: fn:seconds, fn:divide-decimals, fn:unix-dateTime, fn:civil-timezone, fn:hash, fn:expanded-QName, fn:unparsed-binary
Date: fn:build-dateTime, fn:parts-of-dateTime (record-compatible)
Data: fn:items-at, fn:slice, fn:message, fn:highest, fn:lowest
XQTS: fn-graphemes 1086/1189, fn-characters 45/45, misc-HtmlTestSuite 1105/1379, fn-unparsed-binary 14/15
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
array:slice (4 overloads), array:index-where, array:sort-with, array:sort-by, array:empty, array:foot, array:trunk, array:items, array:members, array:build, array:index-of, array:of-members, array:split. Fix array:sort ClassCastException unwrap, ArraySortBy key validation, ArraySortWith RuntimeException unwrap. XQTS: array-slice 71/71, array-foot 9/9, array-trunk 6/6, array-items 8/8, math-cosh/sinh/tanh 27/27 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyperbolic trigonometric functions via Java Math.cosh/sinh/tanh. Euler's number constant via Math.E. XQTS: math-cosh 9/9, math-sinh 9/9, math-tanh 9/9, math-e 4/5 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unicode block name fallback (\p{Is<Block>} → \p{In<Block>}).
XQ4 fn:replace: 'c' flag, empty match, function replacement.
XQ4 fn:matches and fn:tokenize enhancements.
FunAnalyzeString: use reflection proxy for RegexIterator.MatchHandler
to avoid NoClassDefFoundError when the inner class is stripped from
fat JARs. Falls back to text-only output when unavailable.
XQTS: fn-matches.re +45, fn-replace +12, fn-tokenize +8
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
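The Unicode block name fallback can be sketched as follows. This is an illustration only: it uses java.util.regex as a stand-in for whatever regex engine eXist-db actually routes through, and `BlockNameFallback` is a hypothetical class. The assumption is that a pattern which fails to compile with an `Is` prefix is retried once with `In`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper, not part of eXist-db: retries \p{Is<Block>} as \p{In<Block>}. */
final class BlockNameFallback {

    static Pattern compile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // Rewrite the literal text \p{Is to \p{In and try once more.
            String rewritten = regex.replace("\\p{Is", "\\p{In");
            if (rewritten.equals(regex)) {
                throw e; // nothing to rewrite, so the original error stands
            }
            return Pattern.compile(rewritten);
        }
    }

    public static void main(String[] args) {
        Pattern p = compile("\\p{IsBasicLatin}+");
        System.out.println(p.matcher("abc").matches());
    }
}
```

Script names such as `\p{IsLatin}` compile on the first attempt and are left untouched; only patterns that fail are rewritten.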
Fractional seconds: left-aligned digit semantics. Word/Roman via ICU4J: W/w/Ww cardinal, Wo/wo/Wwo ordinal, I/i Roman. Timezone: picture-driven rewrite with digit family support. Era [E]/[C], calendar validation, grouping separators, optional digit validation, ordinal suffix teens fix, whitespace stripping, military TZ "J", name width truncation (max not min). XQTS: format-time 46→77/92, format-date 79→111/133 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
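For reference, the "ordinal suffix teens fix" addresses the classic bug where 11, 12 and 13 receive "st", "nd" and "rd". A minimal English-only sketch, assuming a hypothetical `OrdinalSuffix` helper (the actual formatter in this PR delegates words and ordinals to ICU4J):

```java
/** Hypothetical helper, not part of eXist-db: English ordinal suffixes. */
final class OrdinalSuffix {

    static String of(long n) {
        long abs = Math.abs(n);
        long lastTwo = abs % 100;
        // The teens are the exception: 11th, 12th, 13th (also 111th, 212th, ...).
        if (lastTwo >= 11 && lastTwo <= 13) {
            return n + "th";
        }
        switch ((int) (abs % 10)) {
            case 1:  return n + "st";
            case 2:  return n + "nd";
            case 3:  return n + "rd";
            default: return n + "th";
        }
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 3, 11, 12, 13, 21, 112}) {
            System.out.println(of(n));
        }
    }
}
```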
…d-text, fn:json-doc Resolve relative URIs against file: base URI with direct file: handling. Only allow direct file: access for URIs resolved from relative paths (absolute file: URIs go through SourceFactory security checks). Separate FOJS0001 from FOUT1170 in fn:json-doc. Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text. XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:parse-csv, fn:csv-to-arrays, fn:csv-to-xml, fn:csv-to-json. Custom streaming CSV parser with configurable delimiter, quote char, header handling, and column naming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
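To illustrate what "configurable delimiter, quote char" means in practice, here is a minimal sketch of splitting one CSV record. It is not the streaming parser from this commit: `CsvLine` is a hypothetical class, it handles a single record held in memory, and it assumes a doubled quote character is the escape for a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of eXist-db: splits one CSV record into fields. */
final class CsvLine {

    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // delimiters inside quotes are data
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());        // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a,\"b,c\",\"d\"\"e\"", ',', '"'));
        System.out.println(split("x;y;z", ';', '"'));
    }
}
```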
- fnXQuery40.xql: tests for 50+ new XQ4 functions
- deep-equal-options-test.xq: deep-equal options engine tests
- Re-enable arr:get-invalid-type (XPTY0004 now works)
- Update json-to-xml pending comments
- fn:replace test updates
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parser and tree walker extensions for XQ4: focus functions, keyword args, string templates, pipeline, mapping arrow, for member, otherwise, braced if, while, try/finally, ternary, QName/hex/binary literals, array/map filter, choice/union/enum types, method call, let destructure, fn() shorthand, record types, gnode(), 4 new axes, reservedKeywords sub-rules, expr split for code-too-large fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New expression classes: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType. Modified: Function (keyword arg resolution), FunctionFactory (XQ4 no-namespace override, unknown type XPST0017), FunctionSignature (default params), UserDefinedFunction (default param binding), TryCatchExpression (finally), SwitchExpression (XQ4 version gating), StringConstructor (atomization fixes), XQueryContext (version 4.0, XQST0060 relaxed, compileModuleFromSource), Constants (4 new axes), LocationStep (or-self axis evaluation with document node guard). Type infrastructure: Type.RECORD constant, SequenceType.RecordField, record type structural checking, record(*) and record() support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- convertTo(): FORG0001→XPTY0004 for type-incompatible casts (20 files)
- DoubleValue: NaN/INF→integer/decimal throws FOCA0002
- DynamicCardinalityCheck: ERROR→XPTY0004 (or XPDY0050 for treat-as)
- DynamicTypeCheck: FOCH0002→XPTY0004 (overridable for treat-as)
- CastExpression: xs:anySimpleType→XPST0080 (was XPST0051)
- StringValue: validation errors→FORG0001 (was generic ERROR)
- Base64BinaryValueType: FORG0001 with proper ErrorCode
- ErrorCodes: added convenience constructor
XQTS impact: prod-CastExpr 745→141F, prod-TreatExpr 18→1F
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compile modules from provided source strings instead of loading from URIs. Required by misc-Subtyping XQTS tests (146 tests). Relaxed version compatibility check for content-loaded modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse invisible XML grammars using the Markup Blitz iXML library. Two signatures: fn:invisible-xml(grammar) returns a parsing function, and fn:invisible-xml(grammar, input) parses directly. Updated pom.xml with Markup Blitz dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Primitive long start/end instead of IntegerValue objects. Pre-computed size with overflow protection. O(1) count/isEmpty/contains. Prevents OOM on large ranges like 1 to 10000000000. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
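The idea behind the RangeSequence change can be sketched in a few lines. `LongRange` below is a hypothetical stand-in, not the eXist-db class; it assumes an inclusive range and uses `Math.subtractExact`/`Math.addExact` as one possible form of overflow protection.

```java
/** Hypothetical stand-in for a range sequence backed by three primitive longs. */
final class LongRange {
    private final long start;
    private final long end;
    private final long size; // pre-computed once, so count() is O(1)

    LongRange(long start, long end) {
        this.start = start;
        this.end = end;
        if (end < start) {
            this.size = 0L; // "5 to 1" is the empty sequence
        } else {
            // Both calls throw ArithmeticException instead of silently wrapping.
            this.size = Math.addExact(Math.subtractExact(end, start), 1L);
        }
    }

    long count() { return size; }

    boolean isEmpty() { return size == 0L; }

    /** O(1) membership test: no items are ever materialised. */
    boolean contains(long value) {
        return size > 0L && value >= start && value <= end;
    }

    public static void main(String[] args) {
        LongRange r = new LongRange(1L, 10_000_000_000L);
        System.out.println(r.count());
        System.out.println(r.contains(5_000_000_000L));
    }
}
```

Because nothing is allocated per item, counting or testing membership in `1 to 10000000000` costs the same as for `1 to 10`.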
Enhanced: fn:compare (XQ4 anyAtomicType, total order), fn:min/max (comparison function), fn:deep-equal (options map), fn:matches/ fn:tokenize (XQ4 regex flags, ! flag version-gating), fn:replace (function replacement, ! flag), fn:round (3-arg mode). Collations: supplementary codepoint fix, ASCII case-insensitive collator. InspectModule: keyword arg introspection. DocUtils: URI resolution. Parameter name alignment across 59 fn: module files to match W3C XQuery 4.0 Functions and Operators catalog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive fnXQuery40.xql with tests for all XQ4 features. Updated fnHigherOrderFunctions.xql, replace.xqm, fnLanguage.xqm, InspectModuleTest.java. New deep-equal-options-test.xq and fnInvisibleXml.xqm. Fixed stray backtick in Lucene facets.xql. Updated map ordering test assertions for LinkedHashMap insertion order. XQSuite: 1341 tests, 0 failures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4a80095 to 0549468
|
[This comment was co-authored with Claude Code. -Joe]
CI Status Notes
SendEmailIT failure (macOS/ubuntu/windows integration):
W3C XQTS CI failure:
Unit tests: All pass (ubuntu).
Grammar (XQuery.g):
- fn() and function() type tests now accept named parameters:
  fn($name as xs:string, $age as xs:integer) as xs:boolean
  The names are parsed and discarded; only the sequence types matter for type checking. This matches the XQ4 spec.
CastExpression/CastableExpression:
- xs:anyType and xs:untyped now throw XPST0080 (was bypassing the abstract type check or using XPST0051)
XQTS: misc-BuiltInKeywords 227→234 (+7 tests)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The XQuery 4.0 spec requires mandatory whitespace after (# in pragma expressions: (# S EQName. This disambiguates from ( + #EQName (QName literal syntax). Previously, (# was always matched as PRAGMA_START regardless of what followed, causing function-lookup(#math:e, 0) to fail with XPST0003. Fix: PRAGMA_START now requires whitespace after (#, and the main lexer dispatch checks LA(3) for whitespace before attempting pragma matching. When (# is followed directly by a name character, the lexer matches ( as LPAREN and # as HASH separately. Added XQSuite tests for QName literals in function call arguments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous fix required mandatory whitespace after (# (XQ4 spec), but this broke XQuery 3.1 pragma expressions like (#exist:optimize#) which have no whitespace after (#. New approach: isPragmaContext() scans past (# and the QName to check what follows. If followed by , or ) it's a QName literal argument (e.g., function-lookup(#math:e, 0)). Otherwise it's a pragma expression. This handles both XQ3.1 and XQ4 correctly. Fixes ValueIndexByQNameTest and ValueIndexTest failures caused by (#exist:optimize#) pragma expressions being rejected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
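The lookahead described in this commit can be sketched on a plain string. This is an approximation, not the lexer code: the real isPragmaContext() works on the ANTLR character stream, and the class name, the simplified name-character test, and the whitespace skipping below are assumptions made for illustration.

```java
/** Hypothetical sketch of the "(#" disambiguation, operating on a String. */
final class PragmaLookahead {

    /**
     * @param src source text
     * @param pos index of the '(' that starts "(#"
     * @return true for a pragma, false for a QName literal argument
     */
    static boolean isPragmaContext(String src, int pos) {
        if (!src.startsWith("(#", pos)) {
            throw new IllegalArgumentException("expected \"(#\" at position " + pos);
        }
        int i = pos + 2;
        i = skipWhitespace(src, i);
        while (i < src.length() && isNameChar(src.charAt(i))) {
            i++; // scan past the (simplified) QName
        }
        i = skipWhitespace(src, i);
        if (i >= src.length()) {
            return true; // nothing follows: leave it to the pragma rule to report
        }
        char next = src.charAt(i);
        // ',' or ')' right after the name means "(" + "#name" as an argument.
        return next != ',' && next != ')';
    }

    private static int skipWhitespace(String src, int i) {
        while (i < src.length() && Character.isWhitespace(src.charAt(i))) {
            i++;
        }
        return i;
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == ':' || c == '_' || c == '-' || c == '.';
    }

    public static void main(String[] args) {
        String call = "function-lookup(#math:e, 0)";
        System.out.println(isPragmaContext(call, call.indexOf('(')));
        System.out.println(isPragmaContext("(#exist:optimize#)", 0));
    }
}
```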
Set 600s (10 min) timeout for forked test processes in exist-core. This prevents premature fork kills on slow CI runners during BrokerPool shutdown, which was causing the unit test job to time out at the 45-minute GitHub Actions limit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three targeted fixes prevent the forked JVM from hanging after BrokerPool.shutdown() completes:
1. StatusReporter threads are now daemon threads. The startup and shutdown status reporter threads are monitoring-only and must not prevent JVM exit. Added newInstanceDaemonThread() to ThreadUtils.
2. Four wait loops in BrokerPool that swallowed InterruptedException and used unbounded wait() now have 1-second poll timeouts, isShuttingDown() checks, and proper interrupt handling:
   - get() service mode wait: breaks on shutdown or interrupt
   - get() broker availability wait: throws EXistException on shutdown
   - enterServiceMode() wait: breaks on shutdown or interrupt
   - shutdown() active brokers wait: re-sets interrupt flag and breaks
3. At end of shutdown, instanceThreadGroup.interrupt() wakes any lingering threads in the instance's thread group.
Previously, 4 test classes required exclusion or timeout workarounds (DeadlockIT, RemoveCollectionIT, CollectionLocksTest, MoveResourceTest). Now all complete cleanly: 6533 unit tests + 9 integration tests, 0 failures, clean JVM exit.
Affects PRs with CI timeout workarounds: eXist-db#6112, eXist-db#6139, eXist-db#6138
Related: eXist-db#3685 (FragmentsTest deadlock)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
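The wait-loop pattern from fix 2 and the daemon-thread change from fix 1 can be sketched generically. `BoundedWait` and all of its members are hypothetical names; this is not BrokerPool code, only the general shape of a bounded, shutdown-aware, interrupt-respecting wait.

```java
/** Hypothetical sketch of a shutdown-aware wait loop; not BrokerPool itself. */
final class BoundedWait {
    private final Object lock = new Object();
    private boolean available = false;
    private volatile boolean shuttingDown = false;

    /** @return true once the resource is available, false on shutdown or interrupt */
    boolean awaitAvailable() {
        synchronized (lock) {
            while (!available) {
                if (shuttingDown) {
                    return false; // stop waiting once shutdown has begun
                }
                try {
                    lock.wait(1000L); // bounded poll instead of an unbounded wait()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag, do not swallow
                    return false;
                }
            }
            return true;
        }
    }

    void release() {
        synchronized (lock) {
            available = true;
            lock.notifyAll();
        }
    }

    void shutdown() {
        shuttingDown = true;
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    /** Monitoring-only threads should be daemons so they never block JVM exit. */
    static Thread newDaemonThread(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true);
        return t;
    }
}
```

Even if a notify is missed, the one-second timeout guarantees the loop re-checks the shutdown flag, so a waiter cannot outlive shutdown by more than about a second.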
|
As discussed in today's community meeting, we would like to keep this behind an experimental-features flag until the XQuery 4.0 specification reaches a stable state. |
|
In particular, it should not be possible to evaluate the following:
xquery version "3.1";
() ?? <a/> !! <b/> |
line-o left a comment
Modules with xquery version "3.1"; should not be allowed to use 4.0 features.
A separate question is whether the 4.0 parser should be the new default.
|
[This response was co-authored with Claude Code. -Joe]
Agreed: version-gating XQuery 4.0 syntax is the right approach. Two points:
1. This is consistent with how BaseX handles it: XQ4 features are gated by the version declaration.
2. Implementation approach. The ANTLR 2 parser can check the version declaration (already extracted by
We also have a hand-written recursive descent parser prototype (behind
Will implement the version gate and update the PR. |
XQ4-specific syntax is now only available when xquery version "4.0"
is declared. When version "3.1" is declared (or no version declaration
is present), XQ4 syntax produces XPST0003 parse errors.
Gated features:
- Pipeline operator (->)
- Mapping arrow operator (=!>)
- Method call operator (=?>)
- Otherwise expression
- Ternary conditional (?? !!)
- String templates
- QName literals (#name)
- Focus functions (fn { }, function { })
- Keyword arguments (name := value)
- Default parameter values ($x := default)
- for member / for key / for value clauses
- while clause in FLWOR
- try/catch/finally (finally clause)
Implementation: xq4Enabled boolean field on XQueryParser, set to
true when versionDecl parses "4.0". Semantic predicates { xq4Enabled }?
gate XQ4 alternatives in the grammar.
Default behavior: XQ 3.1 (conservative, matches Saxon). No version
declaration = XQ 3.1.
Updated fnXQuery40.xql to declare xquery version "4.0". Added version
gating tests that verify XQ4 syntax is rejected in XQ 3.1 context.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
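Outside of ANTLR, the gating logic amounts to "derive a boolean from the version declaration, then consult it before accepting an XQ4 construct". The sketch below is a simplified illustration with hypothetical names (`VersionGate`, `requireXq4`); the actual change uses semantic predicates in XQuery.g, and the regular expression here ignores details such as leading comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of version gating; the PR does this with grammar predicates. */
final class VersionGate {

    // Simplified: matches a version declaration at the very start of the query.
    private static final Pattern VERSION_DECL =
            Pattern.compile("^\\s*xquery\\s+version\\s+[\"']([^\"']*)[\"']");

    /** No declaration, or any version other than "4.0", means XQ 3.1 behaviour. */
    static boolean xq4Enabled(String query) {
        Matcher m = VERSION_DECL.matcher(query);
        return m.find() && "4.0".equals(m.group(1));
    }

    /** Plays the role of a { xq4Enabled }? predicate guarding one alternative. */
    static void requireXq4(boolean xq4Enabled, String feature) {
        if (!xq4Enabled) {
            throw new IllegalStateException(
                    "XPST0003: " + feature + " requires xquery version \"4.0\"");
        }
    }

    public static void main(String[] args) {
        String query = "xquery version \"3.1\"; () ?? <a/> !! <b/>";
        try {
            requireXq4(xq4Enabled(query), "ternary conditional");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```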
Pick up BrokerPool shutdown fix (eXist-db#6167) and @line-o's function type checking refactoring. Resolved 11 merge conflicts. Known regression: 12 XQSuite map ordering tests fail (QT4-only, not in QT3/XQ31). @line-o's MapType refactoring removed the keyOrder tracking that our XQ4 ordered map implementation used. The ordered map support needs to be re-implemented on top of the new MapType API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@line-o's MapType refactoring removed the keyOrder parameter from constructors. Re-implement ordered maps using an insertionOrder list tracked alongside the Bifurcan IMap.
- MapType: insertionOrder field tracks key insertion order
- keys(): returns insertion-ordered keys when tracking is active
- iterator(): iterates in insertion order when tracking is active
- put(): preserves and extends insertion order
- remove(): preserves insertion order minus removed keys
- merge(): propagates insertion order from source maps
- MapExpr: passes keyOrder to MapType via setInsertionOrder()
Fixes 8 of 12 QT4 XQSuite map ordering test failures. Remaining 4 QT4 tests need insertion order propagation in map:filter and map:build functions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
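The "order list alongside an unordered map" technique can be sketched as below. This is not MapType: `OrderedMapSketch` is hypothetical, and a mutable java.util.HashMap stands in for the immutable Bifurcan IMap purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an unordered map plus a list that records key insertion order. */
final class OrderedMapSketch<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final List<K> insertionOrder = new ArrayList<>();

    /** New keys extend the order; updating an existing key keeps its position. */
    void put(K key, V value) {
        if (!entries.containsKey(key)) {
            insertionOrder.add(key);
        }
        entries.put(key, value);
    }

    /** Removing a key preserves the relative order of the remaining keys. */
    void remove(K key) {
        if (entries.containsKey(key)) {
            entries.remove(key);
            insertionOrder.remove((Object) key); // remove by value, never by index
        }
    }

    V get(K key) {
        return entries.get(key);
    }

    /** Keys in insertion order, as a read-only snapshot. */
    List<K> keys() {
        return Collections.unmodifiableList(new ArrayList<>(insertionOrder));
    }

    public static void main(String[] args) {
        OrderedMapSketch<String, Integer> m = new OrderedMapSketch<>();
        m.put("b", 1);
        m.put("a", 2);
        m.put("c", 3);
        m.put("a", 20); // update, position unchanged
        m.remove("b");
        System.out.println(m.keys());
    }
}
```

The list makes remove() linear in the number of keys; that trade-off is specific to this sketch and says nothing about how MapType handles it.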
Add FT_SCORE_VAR handling permanently to the Parser branch grammar, eliminating the need for manual patching on every next rebuild.
XQuery.g:
- FT_SCORE_VAR token
- ftScoreVar rule: "score" "$" VarName in for bindings
- ftScoreVarBinding rule: "let score $var := expr" syntax
- Updated letClause to accept ftScoreVarBinding alternative
- Updated inVarBinding to accept ftScoreVar after positionalVar
- Added "score" to coreReservedKeywords and FLWOR lookahead
XQueryTree.g:
- ForLetClause: added scoreVar and isScoreBinding fields
- For-clause: FT_SCORE_VAR handling after POSITIONAL_VAR
- Let-clause: FT_SCORE_VAR handling before type declaration
- FLWOR construction: setScoreVariable on ForExpr, setScoreBinding on LetExpr
ForExpr.java, LetExpr.java:
- Added stub setScoreVariable/setScoreBinding methods. The actual scoring implementation lives on the XQFT branch; these stubs ensure the parser accepts the syntax without breaking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements XQuery 4.0 parser and runtime support for eXist-db, covering the majority of the QT4CG specification draft syntax, 50+ new standard functions, enhanced existing functions, and W3C-compliant error codes. This brings eXist-db in line with the evolving XQuery 4.0 standard.
Based on the XQuery 4.0 Functions branch.
What Changed
1. Grammar — XQ4 syntax (XQuery.g + XQueryTree.g)
All major XQuery 4.0 syntax additions via ANTLR 2 grammar extensions:
- fn { expr }
- name := expr
- `Hello {$name}`
- => and mapping arrow =!>
- for member, while clause, otherwise
- ?? !!
- ?[predicate]
- record(name as xs:string, age? as xs:integer, *)
- =?>, let destructuring
- fn(...) type shorthand, gnode() type test
- *-or-self, *-sibling-or-self
- declare context value, xquery version "4.0"
- reservedKeywords sub-rules (merge-conflict reduction)
- expr rule split (code-too-large fix for next builds)
2. Expression classes (33 files)
New: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType.
Modified: Function, FunctionFactory, FunctionSignature, UserDefinedFunction, TryCatchExpression, SwitchExpression, StringConstructor, XQueryContext, Constants, LocationStep, SequenceType, Type.
3. Error code alignment (29 files)
- convertTo() in 20 atomic types
- DoubleValue NaN/INF casts
- DynamicCardinalityCheck
- DynamicTypeCheck
- TreatAsExpression
- CastExpression xs:anySimpleType
- FunctionFactory unknown types
- StringValue validation
- Base64BinaryValueType
4. fn:load-xquery-module content option
XQ4 content option for dynamic module compilation from strings. Required by misc-Subtyping XQTS tests.
5. fn:invisible-xml (Markup Blitz)
Parse invisible XML grammars using the Markup Blitz iXML library.
6. No-namespace function overriding (PR2200)
xquery version "4.0" allows declaring functions without a namespace prefix, overriding fn: built-ins.
7. RangeSequence optimization
Primitive long storage: 1 to 10000000000 is held in 24 bytes instead of running out of memory.
8. Parameter name alignment (59 files)
W3C XQ4 catalog parameter names across fn: module for keyword argument support.
XQTS Results
QT4 XQTS Run 29 (2026-03-22): 36,638/41,630 (88.0%) — non-skip: 90.3%
Selected parser-related test sets:
XQSuite: 1341 tests, 0 failures (across all test suites: 1676 tests, 0 failures)
Recent improvements (since initial PR)
- (#QName disambiguation: fixed parser collision between pragmas and inline function syntax
- fn:collation 1-arg overload: returns the default collation URI
- fn:parse-json option validation: validates liberal, duplicates, escape, fallback option types per spec
- forkedProcessTimeoutInSeconds=600 added to surefire config: prevents CI timeout on slow test classes
Spec References
Limitations
Features not implemented: JNode data model, union node test syntax in axis steps, method calls (parsed but limited dispatch), version gating (XQ4 features available regardless of version declaration), XML Schema revalidation.
Test Plan
mvn test on CI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI Status
Unit tests (ubuntu): Green. forkedProcessTimeoutInSeconds=600 prevents individual test class timeouts.
W3C XQTS: Times out at 1-hour CI limit. XQTS compliance validated locally via exist-xqts-runner (QT4 88.0%, XQ31 91.8%).
Integration tests (macOS/ubuntu/windows): All green.
Codacy: Reports pre-existing style issues — no new findings from this PR.
Version Gating (community call feedback, 2026-03-23)
XQuery 4.0 syntax is gated behind an xquery version "4.0" declaration. Queries with xquery version "3.1" or no version declaration behave exactly as on develop: no XQ4 syntax is available. This includes:
- Pipeline (->), mapping arrow (=!>), otherwise
- for member, while clause, try/finally
- Ternary conditional (?? !!)
3 new tests verify XQ4 syntax is rejected in 3.1 mode.