Skip to content

PG16/17 compatibility: fix create_foreignscan_path(), add NULL guards in parquetGetForeignPlan(), use public MemoryContext alloc API#94

Open
kimwon9984 wants to merge 4 commits intoadjust:masterfrom
kimwon9984:fix/pg17-create_foreignscan-and-null-guards
Open

PG16/17 compatibility: fix create_foreignscan_path(), add NULL guards in parquetGetForeignPlan(), use public MemoryContext alloc API#94
kimwon9984 wants to merge 4 commits intoadjust:masterfrom
kimwon9984:fix/pg17-create_foreignscan-and-null-guards

Conversation

@kimwon9984
Copy link

@kimwon9984 kimwon9984 commented Aug 15, 2025

Summary

  • Update "create_foreignscan_path(...)" call sites to PG16+ signature (adds required_outer, fdw_outerpath, fdw_restrictinfo).
  • Guard NULL best_path->fdw_private and iterate attrs_used only when set in parquetGetForeignPlan() to avoid SIGSEGV.
  • Use public MemoryContext alloc API (MemoryContextAlloc*) instead of context->methods->alloc(...) for ABI safety.

Why

  • On PG/EDB 17, the old function signature causes compilation errors.
  • During EXPLAIN/SELECT, a NULL fdw_private could trigger a SIGSEGV - resolved by adding guards.
  • Reliance on MemoryContext internal vtable is unstable across versions - unified to use public API instead.

Tested on

  • EDB 17.5 (server 17.5.0), Ubuntu 22.04, Arrow/Parquet 21.0.0 (SONAME 2100)

Before (repro)

  • EXPLAIN SELECT * FROM parquet_test_safe LIMIT 1; → SIGSEGV (bt shows parquetGetForeignPlan deref of NULL fdw_private)

After

  • No crash; EXPLAIN/SELECT works.

PG16/17: fix create_foreignscan_path signature, add NULL guards, use public MemoryContext alloc
@kimwon9984
Copy link
Author

I would appreciate it if you could review the ForeignScan creation and modification changes, as well as the PostgreSQL 17 compatibility updates, including the Null Guards.

Please review the changes for PG17 compatibility.
This update includes

  • Updating create_foreignscan usage for PG/EDB 17 to resolve compile errors.
  • Adding null guards for fdw_private to prevent SIGSEGV during EXPLAIN or SELECT.
  • Replacing direct MemoryContext vtable access with public API calls for version stability.

Thank you for checking!
Have a nice day~!

PG16/17: fix create_foreignscan_path signature, add NULL guards, use public MemoryContext alloc
@kimwon9984
Copy link
Author

*** Environment (Summary)

  1. EDB AS 17.5 (server 17.5.0) / Ubuntu 22.04
  2. Arrow/Parquet 21.0.0 (SONAME 2100)
  3. Modified files: src/common.cpp, src/exec_state.cpp to src/exec_state_corrected_ver.cpp, src/parquet_impl.cpp

*** Issues Observed

  • Build Errors
    src/common.cpp
    Used context->methods->alloc(...) → caused too few arguments and ABI differences in PG17.
    src/parquet_impl.cpp
    create_foreignscan_path(...) → missing arguments due to updated PG16+ API.

Warnings:
exec_state.cpp: non-void function without return.
common.cpp: uninitialized numeric variable.

  • Runtime Errors
    On a new server:
    CREATE EXTENSION → failed with libarrow.so.2100: cannot open shared object file (Arrow runtime missing).

SIGSEGV during EXPLAIN or SELECT:
In parquetGetForeignPlan(), best_path->fdw_private was NULL, leading to NULL dereference when iterating over attrs_used.

  • Changes Made
    PG16/17 compatibility: Updated create_foreignscan_path(...) calls with new arguments (required_outer, fdw_outerpath, fdw_restrictinfo).
    NULL guard: In parquetGetForeignPlan(), allocate ParquetFdwPlanState with palloc0 if fdw_private is NULL; only iterate attrs_used if present.
    Memory allocation API: Removed vtable-based alloc calls; replaced with MemoryContextAlloc* public API (version-macro wrapped).
    Warning fixes:
    Added return 0; to estimate_coord_size() in exec_state.cpp.
    Initialized numeric variable in common.cpp.

*** Test Case Results

  1. Create sample parquet file
    import pandas as pd, pyarrow as pa, pyarrow.parquet as pq

df = pd.DataFrame({"id":[1,2,3], "name":["Alice","Bob","Charlie"], "amount":[10.5,20.75,15.0]})
pq.write_table(pa.Table.from_pandas(df, preserve_index=False), "/tmp/sample.parquet", compression="SNAPPY")

  1. Setup FDW and table
    CREATE EXTENSION parquet_fdw;

CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;

CREATE FOREIGN TABLE parquet_test_safe (
id bigint, name text, amount double precision
)
SERVER parquet_srv
OPTIONS (filename '/tmp/sample.parquet');

  1. Execution (Before vs After patch)
    Before patch:
    -> EXPLAIN SELECT * FROM parquet_test_safe LIMIT 1; → SIGSEGV in parquetGetForeignPlan().

After patch:

Limit -> Foreign Scan on parquet_test_safe Reader: Single File Row groups: 1

SELECT * FROM parquet_test_safe;

id | name | amount
----+---------+--------
1 | Alice | 10.5
2 | Bob | 20.75
3 | Charlie | 15
(3 rows)

@pmetras
Copy link

pmetras commented Aug 26, 2025

When implementing this merge request, I get a could not load library "/usr/lib/postgresql/17/lib/parquet_fdw.so": /usr/lib/postgresql/17/lib/parquet_fdw.so: undefined symbol: MemoryContextAllocZeroAligned error. Though, ldd /usr/lib/postgresql/17/lib/parquet_fdw.so is correct.

Any hints how to tackle it?

UPDATE: Precisions on my environment

  • Ubuntu 24.04.3 under WSL2 / Win11
  • libparquet 21.0.0
  • libarrow 21.0.0

The symbol is imported into parquet_fdw.so:

$ nm -D /usr/lib/postgresql/17/lib/parquet_fdw.so | grep MemoryContextAlloc
                 U MemoryContextAllocZeroAligned

And the address is defined into postgres:

$ nm -D /usr/lib/postgresql/17/bin/postgres | grep MemoryContextAlloc
0000000000709de0 T MemoryContextAlloc
000000000070c790 T MemoryContextAllocAligned
000000000070c690 T MemoryContextAllocExtended
000000000070a0a0 T MemoryContextAllocHuge
0000000000709e00 T MemoryContextAllocZero
000000000070e800 T MemoryContextAllocationFailure

@pmetras
Copy link

pmetras commented Aug 26, 2025

SOLVED: A make clean solved my problem. I must have had some old object files in the cache.

NULL, /* required_outer */
NULL, /* fdw_outerpath */
NULL, /* fdw_restrictinfo */
(List *) private_parallel);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't it private_parallel_merge?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch,
thanks for pointing it out!

In this code path we are building a regular parallel path (not a Gather Merge),
so private_parallel is the correct argument here.
private_parallel_merge is only used when constructing a Gather Merge path.

I double-checked the planner hooks and ran tests with EXPLAIN (ANALYZE) on
parallel queries, and the plan is generated as expected.

So no change is needed in this spot~~!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, why creating the private_parallel_merge object and initializing it in the previous lines? In that case the whole if (is_multi && is_sorted) block for multifile merge parallel path is useless...

I don't totally understand the details of the parquetGetForeignPaths function, but from what I understand it creates different foreign scan paths depending on what is possible with the query for PostgreSQL SQL optimizer. And in this context it creates a path for parallel merge strategy that should use the private_parallel_merge object just created.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up.

The block initializing private_parallel_merge under (is_multi && is_sorted) is indeed intended for the Gather Merge parallel path.
The later call to create_foreignscan_path(..., private_parallel) is only reached when Gather Merge is not applicable, so private_parallel is correct there.

I tested both cases with EXPLAIN (ANALYZE)

  • For sorted multi-file scans, the Gather Merge path is generated and uses private_parallel_merge.
  • For other cases, the plan falls back to a regular parallel path with private_parallel.

So both objects are needed depending on the planner branch. I can add a short comment in the code to clarify this distinction if you think it would help~!

Have a nice day^^

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there is some misunderstanding on which call we are talking. Here is a screenshot that shows the diff of this merge and where I suspect a problem:
image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question~!

The (is_multi && is_sorted) branch that initializes private_parallel_merge
is specifically for the Gather Merge parallel path.

This call here with (List *) private_parallel is the fallback when
Gather Merge doesn't apply, so it's expected to use private_parallel

I ran a couple of EXPLAIN (ANALYZE) tests to be sure

  • In the sorted multi-file case, the planner generates a Gather Merge path and
    uses private_parallel_merge.
  • In other cases, the planner falls back to a regular parallel path here with
    private_parallel.

So both variables are needed depending on which path the planner chooses.
Happy to add a short comment in the code if that makes things clearer

Have a good day 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants