Add support for decimal types #70

Open: brendanyounger wants to merge 1 commit into adjust:master from brendanyounger:decimal-types

Conversation

@brendanyounger:

Can now read Parquet files with decimal types.

@za-arthur added the enhancement (New feature or request) label Apr 28, 2023

@za-arthur (Contributor) left a comment:


Thank you @brendanyounger for the new feature. Could you also add tests and fix the README?

        return TIMESTAMPOID;
    case arrow::Type::DATE32:
        return DATEOID;
    case arrow::Type::DECIMAL:

@za-arthur (Contributor) commented on this diff:
Shouldn't DECIMAL128 and DECIMAL256 be used here, similar to bytes_to_postgres_type?

@Magmatrix:

@brendanyounger Hi, I tested this in PG 15 but couldn't get it to work fully.

I tried creating the foreign table with a price column of type NUMERIC, and later of type FLOAT. Regardless of the type used, price always shows up as 0. Example:

parquet=# select price,price_m,price_e from "Order" limit 10;
 price | price_m | price_e
-------+---------+---------
     0 |    6000 |      -3
     0 |    2320 |      -3
     0 |   10000 |      -3
     0 |    1530 |      -3
     0 |   66200 |      -3
     0 |    3000 |      -3
     0 |    4010 |      -3
     0 |     232 |      -3
     0 |    7420 |      -3
     0 |  291940 |      -3
(10 rows)

However, if I cast it from NUMERIC to FLOAT (or vice versa), all price values EXCEPT the first row's show up correctly:

parquet=# select price::NUMERIC,price_m,price_e from "Order" limit 10;
 price  | price_m | price_e
--------+---------+---------
      0 |    6000 |      -3
   2.32 |    2320 |      -3
     10 |   10000 |      -3
   1.53 |    1530 |      -3
   66.2 |   66200 |      -3
      3 |    3000 |      -3
   4.01 |    4010 |      -3
  0.232 |     232 |      -3
   7.42 |    7420 |      -3
 291.94 |  291940 |      -3
(10 rows)

The cast obviously triggers something (after the first row), but I have no idea where to start looking for this bug. Any ideas?

@Magmatrix:

NB: If I count the number of orders with price = 0, this happens:

parquet=# select count(*) from "Order" where price = 0;
ERROR:  parquet_fdw: failed to extract row groups from Parquet file: row group filter match failed: cache lookup failed for function 0 ('/srv/parquet-test/Order/2025/02/10/10/Order.parquet')

But with a cast, it finds 105 cases where the price wrongly shows up as 0:

parquet=# select count(*) from "Order" where price::numeric = 0;
 count
-------
   105

This table is constructed from exactly 105 files, so it's the first price in each file that gets misinterpreted when using a cast (or every row if not using a cast). I guess this might help in finding the bug.

@Magmatrix commented Mar 5, 2025:

And a bit more testing shows that if you add another cast, the results become random (different results every time):

parquet=# select count(*) from "Order" where price::numeric::float = 0;
 count
--------
 477094
(1 row)

parquet=# select count(*) from "Order" where price::numeric::float = 0;
 count
--------
 459599
(1 row)

parquet=# select count(*) from "Order" where price::numeric::float <> 0;
 count
--------
 405207
(1 row)

parquet=# select count(*) from "Order" where price::numeric::float <> 0;
 count
--------
 427107
(1 row)

        (UNIX_EPOCH_JDATE - POSTGRES_EPOCH_JDATE));
    case arrow::Type::DECIMAL128: {
        auto dectype = (arrow::Decimal128Type *)arrow_type;
        std::string val = arrow::Decimal128(bytes).ToString(dectype->scale());


Converting the type to a string and then processing it with numeric_in is relatively inefficient, right? At least in our tests, it is significantly slower than Spark’s parsing. Are there better solutions to handle this type of conversion?


4 participants