Upgrade Saxon-HE from 9.9 to 12.5; eliminate exist-saxon-regex fork #6143

Open
joewiz wants to merge 1 commit into eXist-db:develop from joewiz:feature/saxon-12-upgrade

joewiz commented Mar 15, 2026

Summary

Upgrades Saxon-HE from 9.9.1-8 to 12.5 and eliminates the exist-saxon-regex fork (org.exist-db:exist-saxon-regex:9.4.0-9.e1), a copy of Saxon 9.4's internal regex classes that has been maintained separately for over a decade. Saxon 12's public regex API (JavaRegularExpression) makes the fork unnecessary — it is removed entirely.

Saxon-HE 12 uses MPL 2.0 (was MPL 1.0 in 9.x). MPL 2.0 is compatible with eXist-db's LGPL 2.1 via the "Larger Work" provision (MPL 2.0 §3.3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

What changed

Version bump and dependency removal

  • exist-parent/pom.xml: <saxon.version> 9.9.1-8 → 12.5
  • exist-core/pom.xml: Removed exist-saxon-regex dependency

Saxon 12 API migration

Saxon 12 made several breaking changes to internal APIs that eXist uses directly. Here is how each was addressed:

| Saxon 12 change | eXist files affected | Migration approach |
|---|---|---|
| FastStringBuffer removed | FloatValue, DoubleValue, DayTimeDurationValue | Use Saxon 12's FloatValue.floatToString() and DoubleValue.doubleToString(), which produce XPath-compliant formatting. (Note: FloatingPointConverter.convertDouble() does NOT; it omits scientific notation for large numbers.) |
| Regex APIs now take UnicodeString instead of String | FunMatches, FunReplace, FunAnalyzeString | Wrap with StringView.of(string) |
| XPathException.getErrorCodeLocalPart() removed | FunMatches, FunReplace, FunAnalyzeString | Use getErrorCodeQName().getLocalPart() |
| RegexIterator.MatchHandler moved to top-level RegexMatchHandler; characters() takes UnicodeString | FunAnalyzeString | Update interface reference and call .toString() on the UnicodeString |
| Xslt30Transformer.setInitialMode() now throws SaxonApiException | Transform (fn:transform) | Add try/catch |
| Duplicate document-URIs rejected in document pool | Transform (fn:transform) | Don't set system ID on source DOMSource, to avoid a collision with the stylesheet's URI |

exist-saxon-regex fork elimination

The fork (org.exist.thirdparty.net.sf.saxon.functions.regex) provided JDK15RegexTranslator for XPath-to-Java regex translation. Saxon 12's public JavaRegularExpression class provides the same functionality. Two files updated:

  • RegexUtil.java: JDK15RegexTranslator.translate() → new JavaRegularExpression(StringView.of(pattern), flags).getJavaRegularExpression()
  • RewriteConfig.java: Same pattern

Saxon 12 behavioral changes in XSLT and DOM pipelines

Saxon 12 is stricter about several XML/namespace conventions that Saxon 9.9 tolerated. These required fixes deeper in eXist's infrastructure:

Implicit xml namespace declarations. eXist's persistent DOM stores namespace mappings for ALL prefixes encountered during document storage — including the implicit xml prefix (and in some cases, maps the empty prefix to the XML namespace URI). Saxon 12's NamespaceMap and SAX ReceivingContentHandler reject any explicit declaration involving http://www.w3.org/XML/1998/namespace. Three layers needed fixes:

  • ElementImpl.getAttributes(): Filter out namespace declaration attributes where the prefix is xml OR the URI is the XML namespace
  • EXistDbXMLReader.parse(): Wrap the SAX ContentHandler with XMLBackwardsCompatHandler to filter startPrefixMapping events for the XML namespace URI before they reach Saxon
  • StylesheetResolverAndCompiler.compileTemplates(): Same wrapping for the XSLT compilation pipeline

Duplicate startDocument SAX events. Saxon 12's LinkedTreeBuilder rejects receiving startDocument() more than once. eXist's StylesheetResolverAndCompiler.compileTemplates() called handler.startDocument() explicitly, then passed the handler to Serializer.toSAX() which could also send startDocument(). The new XMLBackwardsCompatHandler wrapper suppresses duplicate document events.
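For reviewers who want a picture of what such a wrapper does, here is a minimal sketch of a SAX filter that drops prefix mappings for the XML namespace and swallows repeated document events. It is not the XMLBackwardsCompatHandler from this PR: the class name XmlNamespaceFilterSketch, its fields, and the demo() helper are all assumptions, built only on the JDK's org.xml.sax.helpers.XMLFilterImpl.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the PR's wrapper; illustrates the filtering idea only. */
class XmlNamespaceFilterSketch extends XMLFilterImpl {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Per prefix, remembers whether each currently open mapping was forwarded downstream,
    // so the matching endPrefixMapping can be forwarded or dropped consistently.
    private final Map<String, Deque<Boolean>> forwarded = new HashMap<>();
    private boolean inDocument = false;

    XmlNamespaceFilterSketch(ContentHandler downstream) {
        setContentHandler(downstream);
    }

    @Override
    public void startDocument() throws SAXException {
        if (inDocument) {
            return; // drop the duplicate startDocument
        }
        inDocument = true;
        super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        if (!inDocument) {
            return; // the matching endDocument was already forwarded
        }
        inDocument = false;
        super.endDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        // Drop any declaration that names the xml prefix or the XML namespace URI.
        boolean keep = !"xml".equals(prefix) && !XML_NS.equals(uri);
        forwarded.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(keep);
        if (keep) {
            super.startPrefixMapping(prefix, uri);
        }
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        Deque<Boolean> stack = forwarded.get(prefix);
        boolean keep = stack == null || stack.isEmpty() || stack.pop();
        if (keep) {
            super.endPrefixMapping(prefix);
        }
    }

    /** Feeds a small event sequence through the filter and returns what reached downstream. */
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override public void startDocument() { seen.add("startDocument"); }
            @Override public void endDocument() { seen.add("endDocument"); }
            @Override public void startPrefixMapping(String prefix, String uri) { seen.add("map:" + prefix); }
            @Override public void endPrefixMapping(String prefix) { seen.add("unmap:" + prefix); }
        };
        XmlNamespaceFilterSketch filter = new XmlNamespaceFilterSketch(recorder);
        try {
            filter.startDocument();
            filter.startDocument(); // duplicate, as when a caller and a serializer both send it
            filter.startPrefixMapping("xml", XML_NS);
            filter.startPrefixMapping("ex", "http://example.com/ns");
            filter.startElement("http://example.com/ns", "root", "ex:root", new AttributesImpl());
            filter.endElement("http://example.com/ns", "root", "ex:root");
            filter.endPrefixMapping("ex");
            filter.endPrefixMapping("xml");
            filter.endDocument();
            filter.endDocument(); // duplicate
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Tracking each prefix on its own stack is a design choice of this sketch: SAX does not guarantee the relative order of prefix-mapping events within one element, but mappings for the same prefix still nest across elements.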

Null URIResolver. Saxon 12's SaxonTransformerFactory.setURIResolver() throws NPE on null. Serializer.java called setURIResolver(null) to "clear" the resolver after use — replaced with a no-op lambda.
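A minimal sketch of the no-op replacement, assuming the lambda simply returns null (JAXP defines a null return from URIResolver.resolve() as "the processor should try to resolve the URI itself"). Whether the PR's lambda does exactly this is an assumption, and NoOpUriResolverSketch and clearResolver are hypothetical names. The JDK's built-in factory is used so the sketch runs without Saxon on the classpath.

```java
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;

class NoOpUriResolverSketch {
    // Assumption: a resolver that never resolves anything and defers to the processor.
    static final URIResolver NO_OP = (href, base) -> null;

    /** "Clears" a previously installed resolver without ever passing null to the factory. */
    static void clearResolver(TransformerFactory factory) {
        factory.setURIResolver(NO_OP);
    }

    public static void main(String[] args) throws TransformerException {
        // JDK default implementation (Java 9+), standing in for Saxon's factory.
        TransformerFactory factory = TransformerFactory.newDefaultInstance();
        clearResolver(factory);
        Source resolved = factory.getURIResolver().resolve("common.xsl", "file:///tmp/main.xsl");
        System.out.println("resolver installed: " + (factory.getURIResolver() == NO_OP));
        System.out.println("resolved source: " + resolved);
    }
}
```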

xsl:import system ID resolution. StylesheetResolverAndCompiler now sets the full xmldb:exist:// URI as the TemplatesHandler system ID (not just the bare path), so Saxon 12 can properly resolve relative xsl:import/xsl:include hrefs.

Other fixes

  • XmlLibraryChecker: Updated minimum Saxon version from 8.9.0 to 12.0. The existing lexicographic comparison treated "12.5" < "8.9.0" (since '1' < '8'), causing a spurious "Failed to find a valid Transformer" warning.
  • AbstractGMLJDBCIndexWorker: Force JDK's built-in TransformerFactory for GeoTools' GeometryTransformer. Saxon 12's IdentityTransformer rejects SAXSources whose XMLReader does not support the lexical-handler property, which GeoTools' reader doesn't.
  • DocumentImplTest.checkNamespaces_saxon: Updated test expectation from 3 to 2 namespace attributes — Saxon 12's DocumentBuilderImpl no longer includes the implicit xml namespace in its DOM output.
  • fnTransform68.xqm: Changed transform-68-supports-dynamic-evaluation from %test:assertError("FOXT0001") to %test:assertTrue — Saxon 12 now reports support for dynamic evaluation.
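The XmlLibraryChecker item above hinges on a common pitfall: comparing version strings character by character. This PR addresses it by raising the minimum to 12.0. As an illustration only, the sketch below shows the pitfall next to a numeric, component-wise comparison, which is one possible alternative and not what the PR implements. VersionCompareSketch and compareVersions are hypothetical names, and the sketch assumes versions made of integers separated by dots or hyphens.

```java
class VersionCompareSketch {
    /**
     * Compares two versions component by component as integers.
     * Assumes components are separated by '.' or '-' and are all numeric.
     */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("[.\\-]");
        String[] pb = b.split("[.\\-]");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing components count as 0, so "12" equals "12.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // String.compareTo orders by character, so '1' < '8' puts "12.5" before "8.9.0".
        System.out.println("lexicographic 12.5 vs 8.9.0: " + "12.5".compareTo("8.9.0"));
        System.out.println("numeric 12.5 vs 8.9.0: " + compareVersions("12.5", "8.9.0"));
        System.out.println("numeric 9.9.1-8 vs 12.0: " + compareVersions("9.9.1-8", "12.0"));
    }
}
```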

Internal APIs that survived unchanged

These internal Saxon APIs are used by eXist and still exist with compatible signatures in Saxon 12.5 — no migration needed:

Alphanumeric (NumberFormatter), RetainedStaticContext and SystemProperty (fn:transform options), CharacterMap, CharacterMapIndex, SerializationProperties, IntHashMap (serialization), BuiltInAtomicType (type conversion), StructuredQName (fn:transform)

XQTS conformance

| | Develop (Saxon 9.9) | Saxon 12 | Delta |
|---|---|---|---|
| XQ 3.1 | 24,020 / 26,773 (89.7%) | 23,233 / 26,773 (86.8%) | -787 |
| Improvements | | | +32 across 7 test sets |
| Regressions | | | -773 across 29 test sets |

Improvements from Saxon 12 (+32 tests): fn-replace +6, prod-TryCatchExpr +6, prod-CastExpr.derived +5, fn-deep-equal +4, fn-base-uri +4, fn-analyze-string +4, fn-tokenize +3

Regressions: 88% are error code mismatches, most of them FORG0001 where the spec expects XPTY0004. They are caused by Saxon 12's stricter type checking exposing pre-existing issues in eXist's type system. These are the same issues addressed by the XQuery 4.0 parser PR (#6139), which fixes FORG0001→XPTY0004 across 20 atomic types. When both PRs land, the regressions should largely resolve. Breakdown:

| Root cause | Count | % |
|---|---|---|
| FORG0001 vs XPTY0004 error code | 602 | 67.6% |
| Wrong error code (other) | 179 | 20.1% |
| Wrong value | 36 | 4.0% |
| XPTY0004 | 34 | 3.8% |
| Other | 39 | 4.4% |
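The percentages in the breakdown, and the 88% share of error code mismatches, can be rechecked from the counts alone. The sketch below takes each share relative to the sum of the listed counts and adds the two error-code rows together; the class and method names are illustrative and not part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class RegressionBreakdownSketch {
    /** Counts copied from the breakdown table, in table order. */
    static Map<String, Integer> counts() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("FORG0001 vs XPTY0004 error code", 602);
        m.put("Wrong error code (other)", 179);
        m.put("Wrong value", 36);
        m.put("XPTY0004", 34);
        m.put("Other", 39);
        return m;
    }

    /** Sum of all listed counts; the shares below are relative to this total. */
    static int total() {
        return counts().values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Formats count/total as a percentage with one decimal place. */
    static String share(int count, int total) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * count / total);
    }

    public static void main(String[] args) {
        int total = total();
        counts().forEach((cause, n) -> System.out.println(cause + ": " + share(n, total)));
        // The two error-code rows together give the share quoted as 88%.
        System.out.println("All error code mismatches: " + share(602 + 179, total));
    }
}
```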

Test results

  • Full exist-core test suite: 6,533 tests, 0 failures, 0 errors
  • XQSuite: 970 tests, 0 failures (identical to develop)
  • XQTS 3.1: +32 improvements, -773 regressions (error-code mismatches, not migration bugs)

Known CI issue: XQTS runner

The W3C XQuery Test Suite CI job fails because the exist-xqts-runner (a separate project at eXist-db/exist-xqts-runner) was compiled against Saxon 9.9 and calls AnyURIValue(CharSequence), which was removed in Saxon 12. The runner needs its own Saxon 12 compatibility update. The unit and integration test CI jobs are unaffected.

CI Status

All unit and integration tests green on all platforms. W3C XQTS CI job fails because the exist-xqts-runner was compiled against Saxon 9.9 and calls removed APIs — needs its own Saxon 12 compatibility update (tracked separately). Codacy green.

Merge order note: The -773 XQTS regressions (88% are FORG0001 vs XPTY0004 error code mismatches) are compensated by #6139 (XQuery 4.0 Parser), which fixes these error codes across 20 atomic types. When both PRs merge (Saxon 12 first, then Parser), the net XQTS impact should be positive.

Test plan

  • Full exist-core test suite: 6,533 tests, 0 failures, 0 errors
  • XQSuite: 970 tests, 0 failures
  • XQTS 3.1 conformance: +32 improvements, regressions are error-code mismatches
  • Full mvn test across all modules on CI
  • Verify ICU4J compatibility with Lucene 10 branch
  • Update exist-xqts-runner for Saxon 12 compatibility

🤖 Generated with Claude Code

joewiz requested a review from a team as a code owner March 15, 2026 14:55
joewiz force-pushed the feature/saxon-12-upgrade branch from 228d791 to 5cc6498 March 16, 2026 03:46

joewiz commented Mar 16, 2026

[This response was co-authored with Claude Code. -Joe]

The W3C XQuery Test Suite CI job fails because the exist-xqts-runner (a separate project, compiled against Saxon 9.9) calls AnyURIValue(CharSequence) which was removed in Saxon 12:

java.lang.NoSuchMethodError: 'void net.sf.saxon.value.AnyURIValue.<init>(java.lang.CharSequence)'
    at org.exist.xqts.runner.qt3.XQTS3CatalogParserActor.$anonfun$parseCatalog$1

The runner needs a separate Saxon 12 compatibility update. The unit and integration test jobs (the real CI gates) are unaffected.

…egex fork

Upgrade Saxon-HE from 9.9.1-8 to 12.5 and remove the exist-saxon-regex
fork (org.exist-db:exist-saxon-regex:9.4.0-9.e1), a copy of Saxon 9.4's
internal regex classes that has been maintained separately for over a decade.
Saxon 12's public regex API makes the fork unnecessary.

Saxon 12 API migration:
- FastStringBuffer removed: use FloatValue.floatToString() and
  DoubleValue.doubleToString() for XPath-compliant formatting
- Regex APIs now take UnicodeString: wrap with StringView.of()
- XPathException.getErrorCodeLocalPart() replaced by
  getErrorCodeQName().getLocalPart()
- RegexIterator.MatchHandler moved to top-level RegexMatchHandler
- Xslt30Transformer.setInitialMode() now throws SaxonApiException
- Saxon 12 rejects duplicate document-URIs in the document pool
- Saxon 12 rejects null URIResolver and explicit xml namespace
  declarations in DOM and SAX pipelines
- Saxon 12's LinkedTreeBuilder rejects duplicate startDocument events

exist-saxon-regex replaced by Saxon 12's JavaRegularExpression API.

Full exist-core test suite: 6533 tests, 0 failures, 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joewiz force-pushed the feature/saxon-12-upgrade branch from 5cc6498 to 8cfb84c March 16, 2026 04:25
line-o added this to v7.0.0 Mar 23, 2026