PG16/17 compatibility: fix create_foreignscan_path(), add NULL guards in parquetGetForeignPlan(), use public MemoryContext alloc API #94
PG16/17: fix create_foreignscan_path signature, add NULL guards, use public MemoryContext alloc
I would appreciate a review of the ForeignScan path creation changes and the PostgreSQL 17 compatibility updates, including the NULL guards.
Thank you for checking!
*** Environment (Summary)
*** Issues Observed
Warnings:
SIGSEGV during EXPLAIN or SELECT:
*** Test Case Results
df = pd.DataFrame({"id":[1,2,3], "name":["Alice","Bob","Charlie"], "amount":[10.5,20.75,15.0]})
CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;
CREATE FOREIGN TABLE parquet_test_safe (
After patch:
Limit
  -> Foreign Scan on parquet_test_safe
       Reader: Single File
       Row groups: 1
SELECT * FROM parquet_test_safe;
 id | name | amount
When applying this merge request, I get the following error:
could not load library "/usr/lib/postgresql/17/lib/parquet_fdw.so": /usr/lib/postgresql/17/lib/parquet_fdw.so: undefined symbol: MemoryContextAllocZeroAligned
Any hints on how to tackle it?
UPDATE: Details on my environment
The symbol is imported into And the address is defined into |
SOLVED: A |
NULL, /* required_outer */
NULL, /* fdw_outerpath */
NULL, /* fdw_restrictinfo */
(List *) private_parallel);
Good catch,
thanks for pointing it out!
In this code path we are building a regular parallel path (not a Gather Merge),
so private_parallel is the correct argument here.
private_parallel_merge is only used when constructing a Gather Merge path.
I double-checked the planner hooks and ran tests with EXPLAIN (ANALYZE) on
parallel queries, and the plan is generated as expected.
So no change is needed in this spot~~!
In that case, why create the private_parallel_merge object and initialize it in the preceding lines? If it is never passed to a path, the whole if (is_multi && is_sorted) block for the multifile merge parallel path is useless...
I don't fully understand the details of the parquetGetForeignPaths function, but as far as I can tell it creates different foreign scan paths depending on what the query allows, and the PostgreSQL query optimizer then picks among them. In this context it creates a path for the parallel merge strategy, which should use the private_parallel_merge object that was just created.
Thanks for the follow-up.
The block initializing private_parallel_merge under (is_multi && is_sorted) is indeed intended for the Gather Merge parallel path.
The later call to create_foreignscan_path(..., private_parallel) is only reached when Gather Merge is not applicable, so private_parallel is correct there.
I tested both cases with EXPLAIN (ANALYZE):
- For sorted multi-file scans, the Gather Merge path is generated and uses private_parallel_merge.
- For other cases, the plan falls back to a regular parallel path with private_parallel.
So both objects are needed depending on the planner branch. I can add a short comment in the code to clarify this distinction if you think it would help~!
Have a nice day^^
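The branch selection described in this reply can be sketched as a small standalone function. This is only an illustration of the behavior claimed above, not the code in parquetGetForeignPaths: the enum PrivateKind and the function choose_parallel_private are hypothetical names introduced here, and only is_multi, is_sorted, private_parallel, and private_parallel_merge come from the discussion.

```c
#include <stdbool.h>

/*
 * Hypothetical labels for the two private-data objects discussed in
 * this thread. They stand in for the real List pointers so that the
 * sketch compiles without PostgreSQL headers.
 */
typedef enum
{
    PRIVATE_PARALLEL,       /* stands for private_parallel */
    PRIVATE_PARALLEL_MERGE  /* stands for private_parallel_merge */
} PrivateKind;

/*
 * Assumed rule, taken from the reply above: a sorted multi-file scan
 * gets the merge-specific private data, every other parallel scan
 * gets the regular parallel private data.
 */
PrivateKind
choose_parallel_private(bool is_multi, bool is_sorted)
{
    if (is_multi && is_sorted)
        return PRIVATE_PARALLEL_MERGE;

    return PRIVATE_PARALLEL;
}
```

If the actual code follows this rule, the create_foreignscan_path() call inside the (is_multi && is_sorted) block would be expected to receive private_parallel_merge, which is the point the reviewer raised.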
Good question~!
The (is_multi && is_sorted) branch that initializes private_parallel_merge
is specifically for the Gather Merge parallel path.
This call here with (List *) private_parallel is the fallback when
Gather Merge doesn't apply, so it's expected to use private_parallel.
I ran a couple of EXPLAIN (ANALYZE) tests to be sure:
- In the sorted multi-file case, the planner generates a Gather Merge path and
uses private_parallel_merge.
- In other cases, the planner falls back to a regular parallel path here with
private_parallel.
So both variables are needed depending on which path the planner chooses.
Happy to add a short comment in the code if that makes things clearer
Have a good day 🙂

Summary
Why
Tested on
Before (repro)
After