Schema globbing #97

Open
pmetras wants to merge 15 commits into adjust:master from pmetras:schema-globbing

pmetras commented Sep 15, 2025

This adds the ability to map table names to paths in an IMPORT FOREIGN SCHEMA command, using the tables_map option.

Its value is a space-separated list of key=value pairs, with table names as keys and Parquet file paths as values. As with filename, file globbing can be used and, as with the Linux PATH variable, a colon (:) separates multiple paths within a value.

import foreign schema "/path/to/directory"
from server parquet_srv
into public options (tables_map 'table1=/path/to/directory/2022/*/*.parquet:/path/to/directory/2023/*/*.parquet table2=/path/to/directory{2024,2025}/*/*.parquet')
;
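
To make the option format concrete, here is a minimal Python sketch of how a tables_map value could be split into table names and path patterns. It is an illustration only, not the extension's C++ code: the function name parse_tables_map is hypothetical, and the sketch assumes that paths contain neither spaces nor colons.

```python
def parse_tables_map(value: str) -> dict[str, list[str]]:
    """Split 'name=path[:path...] name=...' into {name: [path, ...]}.

    Hypothetical helper for illustration; assumes no spaces or colons
    inside individual paths.
    """
    mapping: dict[str, list[str]] = {}
    for entry in value.split():
        # Split on the first '=' only: the key is the table name.
        name, sep, paths = entry.partition("=")
        if not sep or not name or not paths:
            raise ValueError(f"malformed tables_map entry: {entry!r}")
        # A colon separates multiple path patterns, as in PATH.
        mapping[name] = [p for p in paths.split(":") if p]
    return mapping


example = (
    "table1=/path/to/directory/2022/*/*.parquet:/path/to/directory/2023/*/*.parquet "
    "table2=/path/to/directory{2024,2025}/*/*.parquet"
)
tables = parse_tables_map(example)
```

With the value from the command above, tables maps table1 to two patterns and table2 to one; the glob patterns are kept as strings and are not expanded at this stage.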

As a security measure, the extension checks that the paths in tables_map are within the external schema path, which prevents access to files elsewhere on the server. Also, if the limit to or except clauses of import foreign schema are used to specify table names, only those names are considered in tables_map.
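
The two rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the check implemented in this branch: is_within and select_tables are hypothetical names, paths are assumed to be absolute POSIX paths, and the comparison is purely lexical (os.path.normpath), so it does not resolve symlinks.

```python
import os


def is_within(schema_dir: str, pattern: str) -> bool:
    """Return True if pattern lies lexically inside schema_dir.

    Hypothetical helper; glob characters are treated as ordinary
    characters and symlinks are not resolved.
    """
    base = os.path.normpath(schema_dir)
    candidate = os.path.normpath(pattern)
    if not os.path.isabs(base) or not os.path.isabs(candidate):
        return False
    return os.path.commonpath([base, candidate]) == base


def select_tables(tables_map, schema_dir, limit_to=None, except_=None):
    """Apply the limit to / except filter, then check the remaining paths."""
    selected = {}
    for name, patterns in tables_map.items():
        if limit_to is not None and name not in limit_to:
            continue  # not listed in "limit to": ignored entirely
        if except_ is not None and name in except_:
            continue  # listed in "except": ignored entirely
        for pattern in patterns:
            if not is_within(schema_dir, pattern):
                raise PermissionError(
                    f"{pattern!r} is outside schema path {schema_dir!r}"
                )
        selected[name] = patterns
    return selected
```

In this sketch, a pattern such as /data/pq/../secret/*.parquet is rejected for the schema path /data/pq because normalization moves it outside that directory, and so is a sibling directory such as /data/pq2 that merely shares the prefix.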

Tables created through the tables_map option are identical to tables created with a create foreign table command and a filename option. All other options from import foreign schema are passed on to create foreign table.

Each table must match at least one Parquet file when it is created, so that the Parquet fields can be read and mapped to column names. File globbing is evaluated at query time, so files added to a directory later are included in query results.
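
The query-time behaviour can be illustrated with a small Python sketch: the pattern is stored once and expanded again on every call. The name resolve_files is hypothetical, the files are empty placeholders rather than real Parquet files, and Python's glob module does not expand braces such as {2024,2025}, so only * wildcards are used here.

```python
import glob
import os
import tempfile


def resolve_files(patterns):
    """Expand the stored patterns now; in this sketch, once per query."""
    files = []
    for pattern in patterns:
        files.extend(glob.glob(pattern))
    return sorted(files)


with tempfile.TemporaryDirectory() as root:
    # The table definition keeps the pattern, not a list of files.
    patterns = [os.path.join(root, "*", "*.parquet")]

    os.makedirs(os.path.join(root, "01"))
    open(os.path.join(root, "01", "a.parquet"), "w").close()
    first_names = [os.path.basename(f) for f in resolve_files(patterns)]

    # A file added later is picked up by the next expansion.
    os.makedirs(os.path.join(root, "02"))
    open(os.path.join(root, "02", "b.parquet"), "w").close()
    second_names = [os.path.basename(f) for f in resolve_files(patterns)]
```

The first expansion sees only a.parquet, and the second sees both files without the pattern having changed.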

This branch is based on the code in #96, so it works with PostgreSQL 17. It also adds support for UUID columns.

pmetras and others added 15 commits August 26, 2025 08:41
Add synthetic documentation link
- Add PG 18 and 17 to CI test matrix, drop EOL versions 10-11
- Update Dockerfile to use bookworm base image
- Add PG_MODULE_MAGIC_EXT for PG 18+ to report extension name/version
- Document supported PostgreSQL versions in README
- Fix create_foreignscan_path API change (new disabled_nodes parameter)
- Fix ExplainState/ExplainPropertyText header changes in PG 18
- Update test expected output for new debug messages

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security fixes:
- Replace strcpy() with memcpy() to prevent buffer overflow
- Use NAMEDATALEN instead of hardcoded buffer size
- Add null termination after strncpy() calls
- Add path character validation to reject control characters

New security feature:
- Add parquet_fdw.allowed_directories GUC variable
- Only superusers can access files when allowed_directories is empty
- Non-superusers can only access files within allowed directories
- Paths are canonicalized with realpath() to prevent symlink bypasses
- Document security model in README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix heap.hpp: use delete[] for array deallocation (was using delete)
- Remove duplicate exec_state_corrected_ver.cpp file
- Remove stale TODO comments in exec_state.cpp
- Update parquet::arrow::FileReader::Make to use arrow::Result API
  (deprecated in Arrow 23.0.0)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- heap.hpp: Add assert() for bounds checking in append() and head()
- parquet_impl.cpp: Use lstat() fallback when d_type == DT_UNKNOWN
  (fixes directory scanning on filesystems that don't set d_type)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Cache UUID type detection in TypeInfo struct to avoid per-row checks
- Add FIXED_SIZE_BINARY handling in bytes_to_postgres_type() for UUID
  filtering support (fixes crash when filtering on UUID columns)
- Add UUID test data and regression tests for UUID columns
- Update test data generator with explicit schemas to avoid large_string
  type (not supported by parquet_fdw)
- Add requirements.txt and update Dockerfile for test data generation
- Add .venv to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@socket-security

Review the following changes in direct dependencies.

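| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|------|---------|-----------------------|---------------|---------|-------------|---------|
| Added | pandas@3.0.0 | 76 | 100 | 100 | 100 | 80 |
| Added | pyarrow@23.0.0 | 89 | 100 | 100 | 100 | 100 |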
