[feature] Support file: URIs in fn:collection() for filesystem directory querying #6192

Open
joewiz wants to merge 1 commit into eXist-db:develop from joewiz:feature/collection-file-uris

Conversation

@joewiz
Member

@joewiz joewiz commented Mar 28, 2026

Summary

  • Extend fn:collection() to support file: URIs, enabling queries over filesystem directories
  • fn:collection("file:///path/to/dir") scans for *.xml files by default
  • fn:collection("file:///path/to/dir?select=*.xhtml") supports glob filtering (Saxon convention)
  • DBA-only access for file: URIs (security boundary, consistent with fn:doc())
  • Non-parseable files silently skipped; non-existent directories throw FODC0002
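
A minimal Java sketch of how a `file:` collection URI of the shape described above could be split into a directory and a glob. The class name `FileCollectionUri` and its fields are hypothetical and are not taken from this PR; only the `select` parameter and the `*.xml` default come from the summary.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of eXist-db: splits a file: collection URI
// into the directory to scan and the glob to apply.
public class FileCollectionUri {
    final Path directory;
    final String glob;

    FileCollectionUri(Path directory, String glob) {
        this.directory = directory;
        this.glob = glob;
    }

    static FileCollectionUri parse(String uri) {
        URI u = URI.create(uri);
        if (!"file".equals(u.getScheme())) {
            throw new IllegalArgumentException("not a file: URI: " + uri);
        }
        // Default from the PR summary: scan for *.xml files.
        String glob = "*.xml";
        String query = u.getQuery();
        if (query != null) {
            for (String param : query.split("&")) {
                // Saxon-style ?select=<glob>; other parameters are ignored here.
                if (param.startsWith("select=")) {
                    glob = param.substring("select=".length());
                }
            }
        }
        // Use the path component only; the query string is not part of the directory.
        return new FileCollectionUri(Paths.get(u.getPath()), glob);
    }

    public static void main(String[] args) {
        FileCollectionUri c = parse("file:///path/to/dir?select=*.xhtml");
        System.out.println(c.directory + " with glob " + c.glob);
    }
}
```

The sketch assumes a Unix-style path; Windows drive-letter URIs would need extra handling.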

Motivation

BaseX and Saxon both support collection("file:/path/to/dir") for querying filesystem directories. eXist's fn:doc() already supports file: URIs but fn:collection() was limited to database collections only. The W3C spec says fn:collection() is implementation-defined, making this a conformant extension.

Test plan

  • XQSuite test: non-existent directory throws FODC0002
  • Manual: collection("file:///tmp/test-xml/") returns XML documents
  • Manual: ?select=*.xhtml glob filtering works
  • Manual: non-DBA user gets permission denied

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@joewiz joewiz requested a review from a team as a code owner March 28, 2026 04:35
@joewiz joewiz added the enhancement new features, suggestions, etc. label Mar 28, 2026
if (dynamicCollection != null) {
    items.addAll(dynamicCollection);

} else if (collectionUri.getScheme() != null && collectionUri.getScheme().equals("file")) {
Member

Suggested change
- } else if (collectionUri.getScheme() != null && collectionUri.getScheme().equals("file")) {
+ } else if ("file".equals(collectionUri.getScheme())) {

This would be an even simpler check...
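
A small standalone illustration of why the constant-first form needs no explicit null check: `String.equals(null)` returns `false` rather than throwing. The class and method names are hypothetical and exist only for this sketch.

```java
import java.net.URI;

// Hypothetical demo class, not part of the PR.
public class SchemeCheck {

    // Calling equals() on the literal means a null scheme simply yields false.
    static boolean isFileScheme(URI uri) {
        return "file".equals(uri.getScheme());
    }

    public static void main(String[] args) {
        // A file: URI has the scheme "file".
        System.out.println(isFileScheme(URI.create("file:///tmp/test-xml/")));
        // A scheme-less database path has a null scheme; no NullPointerException occurs.
        System.out.println(isFileScheme(URI.create("/db/apps/data")));
    }
}
```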

fn:collection() now supports file: URIs to scan a directory for XML
files and return them as in-memory documents. This matches the behavior
of BaseX and Saxon.

Usage:
  collection("file:///path/to/dir")           (: all *.xml files :)
  collection("file:///path/to/dir?select=*.xhtml")  (: glob filter :)

Security: only DBA users can access file: URIs (same restriction as
fn:doc for file system access). Non-parseable files are silently
skipped. Non-existent directories throw FODC0002.

Implementation: in getCollectionItems(), checks if the URI scheme is
"file", scans the directory with DirectoryStream + glob pattern, and
parses each matching file via DocUtils.parse() into a memtree document.
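
The scan-and-parse flow described above can be sketched roughly as follows. This is not the code from the commit: it uses the JDK's DOM parser as a stand-in for `DocUtils.parse()` and the memtree, reports the missing directory as a plain `IOException` instead of raising FODC0002, and the class name `DirectoryCollectionSketch` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical sketch; the JDK DOM parser stands in for eXist's own parsing.
public class DirectoryCollectionSketch {

    static List<Document> load(Path dir, String glob) throws IOException {
        // Stand-in for the FODC0002 error on a non-existent directory.
        if (!Files.isDirectory(dir)) {
            throw new IOException("FODC0002: not a directory: " + dir);
        }
        DocumentBuilder builder;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            builder = factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        List<Document> docs = new ArrayList<>();
        // newDirectoryStream applies the glob to file names; iteration order is unspecified.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue; // ignore subdirectories that happen to match the glob
                }
                try {
                    docs.add(builder.parse(file.toFile()));
                } catch (SAXException | IOException e) {
                    // Non-parseable or unreadable files are skipped silently.
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(load(dir, "*.xml").size() + " document(s) parsed");
    }
}
```

Because the iteration order of `DirectoryStream` is unspecified, callers that need a stable document order would have to sort the paths first.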

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/collection-file-uris branch from a0dec57 to f5b2336 Compare March 28, 2026 16:17
@joewiz
Member Author

joewiz commented Mar 28, 2026

[This response was co-authored with Claude Code. -Joe]

Good catch, @reinhapa — simplified to "file".equals(collectionUri.getScheme()) which handles the null case cleanly. Pushed.

@adamretter
Contributor

adamretter commented Mar 28, 2026

A few thoughts:

  1. This is really just a shortcut for doing the exact equivalent using the File extension module.

  2. fn:collection-uri already supports the Saxon query string syntax that I added for match, content-type, and stable. It would be nice to see this unified with fn:collection, i.e. both should support those options, plus select if that's one you would like to add.

  3. There are serious security implications around using file://. Can I suggest that you add some sensible controls around that, in the same manner as (or better than) we did for the File extension module, please? Any time you provide a route to access the filesystem, you open up a security hole. I already demonstrated that several of the online XQuery and XSLT fiddles can be used to read /etc/passwd (and worse) from the hosted systems. eXide, which is included by default, is no different from a fiddle, not to mention the REST API etc. that allow remote query execution by guest and/or non-DBA users.
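
One possible shape for such a control, sketched in Java under stated assumptions: an allowlist of root directories that any requested path must fall under after normalization. This is purely illustrative; it is not how the File extension module is implemented, and the class name `FileAccessPolicy` and the root `/srv/xml-data` are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical allowlist check, not taken from eXist-db.
public class FileAccessPolicy {
    private final List<Path> allowedRoots = new ArrayList<>();

    FileAccessPolicy(List<Path> roots) {
        for (Path root : roots) {
            allowedRoots.add(root.toAbsolutePath().normalize());
        }
    }

    boolean isAllowed(Path requested) {
        // normalize() collapses "." and ".." segments but does NOT resolve symlinks;
        // a stricter check could use toRealPath() once the path is known to exist.
        Path normalized = requested.toAbsolutePath().normalize();
        for (Path root : allowedRoots) {
            // Path.startsWith compares whole name elements, so a sibling such as
            // /srv/xml-data-other does not match the root /srv/xml-data.
            if (normalized.startsWith(root)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FileAccessPolicy policy = new FileAccessPolicy(Arrays.asList(Paths.get("/srv/xml-data")));
        System.out.println(policy.isAllowed(Paths.get("/srv/xml-data/books")));
        System.out.println(policy.isAllowed(Paths.get("/srv/xml-data/../../etc/passwd")));
    }
}
```

A check like this would sit alongside, not replace, the DBA-only restriction mentioned in the PR description.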
