Upgrade Saxon-HE from 9.9 to 12.5; eliminate exist-saxon-regex fork #6143
Open
joewiz wants to merge 1 commit into eXist-db:develop
[This response was co-authored with Claude Code. -Joe] The W3C XQuery Test Suite CI job fails because the `exist-xqts-runner` was compiled against Saxon 9.9 and calls APIs that were removed in Saxon 12. The runner needs a separate Saxon 12 compatibility update. The unit and integration test jobs (the real CI gates) are unaffected.
…egex fork

Upgrade Saxon-HE from 9.9.1-8 to 12.5 and remove the exist-saxon-regex fork (org.exist-db:exist-saxon-regex:9.4.0-9.e1), a copy of Saxon 9.4's internal regex classes that has been maintained separately for over a decade. Saxon 12's public regex API makes the fork unnecessary.

Saxon 12 API migration:
- FastStringBuffer removed: use FloatValue.floatToString() and DoubleValue.doubleToString() for XPath-compliant formatting
- Regex APIs now take UnicodeString: wrap with StringView.of()
- XPathException.getErrorCodeLocalPart() replaced by getErrorCodeQName().getLocalPart()
- RegexIterator.MatchHandler moved to top-level RegexMatchHandler
- Xslt30Transformer.setInitialMode() now throws SaxonApiException
- Saxon 12 rejects duplicate document-URIs in the document pool
- Saxon 12 rejects null URIResolver and explicit xml namespace declarations in DOM and SAX pipelines
- Saxon 12's LinkedTreeBuilder rejects duplicate startDocument events

exist-saxon-regex replaced by Saxon 12's JavaRegularExpression API.

Full exist-core test suite: 6533 tests, 0 failures, 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Upgrades Saxon-HE from 9.9.1-8 to 12.5 and eliminates the `exist-saxon-regex` fork (`org.exist-db:exist-saxon-regex:9.4.0-9.e1`), a copy of Saxon 9.4's internal regex classes that has been maintained separately for over a decade. Saxon 12's public regex API (`JavaRegularExpression`) makes the fork unnecessary, so it is removed entirely.

Saxon-HE 12 uses MPL 2.0 (was MPL 1.0 in 9.x). MPL 2.0 is compatible with eXist-db's LGPL 2.1 via the "Larger Work" provision (MPL 2.0 §3.3).
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
What changed
Version bump and dependency removal
- `exist-parent/pom.xml`: `<saxon.version>` 9.9.1-8 → 12.5
- `exist-core/pom.xml`: Removed `exist-saxon-regex` dependency

Saxon 12 API migration
Saxon 12 made several breaking changes to internal APIs that eXist uses directly. Here is how each was addressed:
| Change | Affected code | Fix |
|---|---|---|
| `FastStringBuffer` removed | `FloatValue`, `DoubleValue`, `DayTimeDurationValue` | `FloatValue.floatToString()` and `DoubleValue.doubleToString()`, which produce XPath-compliant formatting. (Note: `FloatingPointConverter.convertDouble()` does NOT; it omits scientific notation for large numbers.) |
| Regex APIs take `UnicodeString` instead of `String` | `FunMatches`, `FunReplace`, `FunAnalyzeString` | `StringView.of(string)` |
| `XPathException.getErrorCodeLocalPart()` removed | `FunMatches`, `FunReplace`, `FunAnalyzeString` | `getErrorCodeQName().getLocalPart()` |
| `RegexIterator.MatchHandler` moved to top-level `RegexMatchHandler`; `characters()` takes `UnicodeString` | `FunAnalyzeString` | `.toString()` on the `UnicodeString` |
| `Xslt30Transformer.setInitialMode()` now throws `SaxonApiException` | `Transform` (`fn:transform`) | |
| Duplicate document-URIs rejected in the document pool | `Transform` (`fn:transform`) | `DOMSource`: avoids collision with stylesheet's URI |

exist-saxon-regex fork elimination
The fork (`org.exist.thirdparty.net.sf.saxon.functions.regex`) provided `JDK15RegexTranslator` for XPath-to-Java regex translation. Saxon 12's public `JavaRegularExpression` class provides the same functionality. Two files updated:
- `RegexUtil.java`: `JDK15RegexTranslator.translate()` → `new JavaRegularExpression(StringView.of(pattern), flags).getJavaRegularExpression()`
- `RewriteConfig.java`: Same pattern

Saxon 12 behavioral changes in XSLT and DOM pipelines
Saxon 12 is stricter about several XML/namespace conventions that Saxon 9.9 tolerated. These required fixes deeper in eXist's infrastructure:
Implicit `xml` namespace declarations. eXist's persistent DOM stores namespace mappings for ALL prefixes encountered during document storage, including the implicit `xml` prefix (and in some cases it maps the empty prefix to the XML namespace URI). Saxon 12's `NamespaceMap` and SAX `ReceivingContentHandler` reject any explicit declaration involving `http://www.w3.org/XML/1998/namespace`. Three layers needed fixes:
- `ElementImpl.getAttributes()`: Filter out namespace declaration attributes where the prefix is `xml` OR the URI is the XML namespace
- `EXistDbXMLReader.parse()`: Wrap the SAX `ContentHandler` with `XMLBackwardsCompatHandler` to filter `startPrefixMapping` events for the XML namespace URI before they reach Saxon
- `StylesheetResolverAndCompiler.compileTemplates()`: Same wrapping for the XSLT compilation pipeline

Duplicate `startDocument` SAX events. Saxon 12's `LinkedTreeBuilder` rejects receiving `startDocument()` more than once. eXist's `StylesheetResolverAndCompiler.compileTemplates()` called `handler.startDocument()` explicitly, then passed the handler to `Serializer.toSAX()`, which could also send `startDocument()`. The new `XMLBackwardsCompatHandler` wrapper suppresses duplicate document events.

Null `URIResolver`. Saxon 12's `SaxonTransformerFactory.setURIResolver()` throws a `NullPointerException` on null. `Serializer.java` called `setURIResolver(null)` to "clear" the resolver after use; this is replaced with a no-op lambda.

`xsl:import` system ID resolution. `StylesheetResolverAndCompiler` now sets the full `xmldb:exist://` URI as the `TemplatesHandler` system ID (not just the bare path), so Saxon 12 can properly resolve relative `xsl:import`/`xsl:include` hrefs.

Other fixes
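Before the remaining fixes, here is a minimal sketch of the SAX-level filtering described in the previous section: dropping prefix mappings that involve the XML namespace and swallowing a repeated `startDocument()`. It is not the PR's `XMLBackwardsCompatHandler`. The class name `XmlNamespaceFilterSketch`, the choice of the JDK's `XMLFilterImpl` as a base class, and the per-prefix bookkeeping are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Illustrative only: a hypothetical stand-in, not eXist's XMLBackwardsCompatHandler. */
final class XmlNamespaceFilterSketch extends XMLFilterImpl {
    private boolean started = false;
    private boolean ended = false;
    // Counts dropped mappings per prefix so the matching endPrefixMapping is dropped too.
    private final Map<String, Integer> dropped = new HashMap<>();

    @Override
    public void startDocument() throws SAXException {
        if (started) return; // swallow a repeated startDocument()
        started = true;
        super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        if (ended) return; // swallow a repeated endDocument()
        ended = true;
        super.endDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        // Drop the mapping if the prefix is "xml" OR the URI is the XML namespace.
        if (XMLConstants.XML_NS_PREFIX.equals(prefix) || XMLConstants.XML_NS_URI.equals(uri)) {
            dropped.merge(prefix, 1, Integer::sum);
            return;
        }
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        Integer n = dropped.get(prefix);
        if (n != null && n > 0) {
            dropped.put(prefix, n - 1);
            return;
        }
        super.endPrefixMapping(prefix);
    }

    /** Drives the filter with a few events and records what the downstream handler sees. */
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        XmlNamespaceFilterSketch filter = new XmlNamespaceFilterSketch();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startDocument() { seen.add("startDocument"); }
            @Override public void startPrefixMapping(String p, String u) { seen.add("map:" + p); }
            @Override public void endPrefixMapping(String p) { seen.add("unmap:" + p); }
        });
        try {
            filter.startDocument();
            filter.startDocument(); // duplicate, suppressed
            filter.startPrefixMapping("xml", XMLConstants.XML_NS_URI); // dropped
            filter.startPrefixMapping("", XMLConstants.XML_NS_URI);    // dropped
            filter.startPrefixMapping("ex", "http://example.com/ns");  // forwarded
            filter.endPrefixMapping("ex");
            filter.endPrefixMapping("");
            filter.endPrefixMapping("xml");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [startDocument, map:ex, unmap:ex]
    }
}
```

The per-prefix counter is a simplification: it keeps start and end events balanced but does not model nested redeclarations of the same prefix precisely.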
- `XmlLibraryChecker`: Updated minimum Saxon version from `8.9.0` to `12.0`. The existing lexicographic comparison treated `"12.5" < "8.9.0"` (since `'1' < '8'`), causing a spurious "Failed to find a valid Transformer" warning.
- `AbstractGMLJDBCIndexWorker`: Force JDK's built-in `TransformerFactory` for GeoTools' `GeometryTransformer`. Saxon 12's `IdentityTransformer` rejects SAXSources whose XMLReader does not support the `lexical-handler` property, which GeoTools' reader doesn't.
- `DocumentImplTest.checkNamespaces_saxon`: Updated test expectation from 3 to 2 namespace attributes, because Saxon 12's `DocumentBuilderImpl` no longer includes the implicit `xml` namespace in its DOM output.
- `fnTransform68.xqm`: Changed `transform-68-supports-dynamic-evaluation` from `%test:assertError("FOXT0001")` to `%test:assertTrue`, because Saxon 12 now reports support for dynamic evaluation.

Internal APIs that survived unchanged
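One aside on the `XmlLibraryChecker` fix from the previous section: the sketch below reproduces the lexicographic pitfall with plain `String.compareTo` and contrasts it with a component-wise numeric comparison. The PR itself only raised the minimum version string. The numeric comparison shown here is an alternative for illustration, and `VersionCompareDemo` and its methods are hypothetical names, not eXist code.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical helper, not the actual XmlLibraryChecker logic. */
final class VersionCompareDemo {

    /** Parses "12.5" or "9.9.1-8" into numeric components, ignoring any suffix after '-'. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        return Arrays.stream(core.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Compares component by component; missing components count as 0. */
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Lexicographic: '1' sorts before '8', so "12.5" looks older than "8.9.0".
        System.out.println("12.5".compareTo("8.9.0") < 0);        // true
        // Numeric: 12 > 8, so 12.5 is correctly newer than 8.9.0.
        System.out.println(compareVersions("12.5", "8.9.0") > 0); // true
    }
}
```

With a minimum of `12.0`, the string comparison happens to give the right answer for `12.5`, which is why updating the constant was enough to silence the warning.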
These internal Saxon APIs are used by eXist and still exist with compatible signatures in Saxon 12.5 — no migration needed:
- `Alphanumeric` (NumberFormatter)
- `RetainedStaticContext` and `SystemProperty` (fn:transform options)
- `CharacterMap`, `CharacterMapIndex`, `SerializationProperties`, `IntHashMap` (serialization)
- `BuiltInAtomicType` (type conversion)
- `StructuredQName` (fn:transform)

XQTS conformance
Improvements from Saxon 12 (+32 tests): fn-replace +6, prod-TryCatchExpr +6, prod-CastExpr.derived +5, fn-deep-equal +4, fn-base-uri +4, fn-analyze-string +4, fn-tokenize +3
Regressions — 88% are error code mismatches (FORG0001 where the spec expects XPTY0004), caused by Saxon 12's stricter type checking exposing pre-existing issues in eXist's type system. These are the same issues addressed by the XQuery 4.0 parser PR (#6139), which fixes FORG0001→XPTY0004 across 20 atomic types. When both PRs land, the regressions should largely resolve. Breakdown:
Test results
Known CI issue: XQTS runner
The W3C XQuery Test Suite CI job fails because the `exist-xqts-runner` (a separate project at eXist-db/exist-xqts-runner) was compiled against Saxon 9.9 and calls `AnyURIValue(CharSequence)`, which was removed in Saxon 12. The runner needs its own Saxon 12 compatibility update. The unit and integration test CI jobs are unaffected.

CI Status
All unit and integration tests green on all platforms. W3C XQTS CI job fails because the `exist-xqts-runner` was compiled against Saxon 9.9 and calls removed APIs; it needs its own Saxon 12 compatibility update (tracked separately). Codacy green.

Merge order note: The -773 XQTS regressions (88% are FORG0001 vs XPTY0004 error code mismatches) are compensated by #6139 (XQuery 4.0 Parser), which fixes these error codes across 20 atomic types. When both PRs merge (Saxon 12 first, then Parser), the net XQTS impact should be positive.
Test plan
- `mvn test` across all modules on CI
- `exist-xqts-runner` for Saxon 12 compatibility

🤖 Generated with Claude Code