Fix: Nested struct parsing fails to preserve nested fields (Issue #627) #628

Merged
laughingman7743 merged 1 commit into master from fix/nested-struct-parsing-issue-627
Nov 17, 2025

@laughingman7743 (Member)

Summary

Fixes nested STRUCT (ROW) type parsing where nested fields were being lost during data conversion.

Problem

Issue #627 reported incorrect results when querying tables with nested structs such as:

positions = Column(
    "positions",
    AthenaArray(
        AthenaStruct(
            ("header", AthenaStruct(("stamp", AthenaTimestamp))),
            ("x", Float),
            ("y", Float)
        )
    ),
)

The actual data {header={stamp=xyz, seq=123}, x=4.736, y=0.583} was incorrectly parsed as {'x': 4.736, 'y': 0.583}, losing the entire header field.

Root Cause

The _parse_named_struct function in pyathena/converter.py had two issues:

  1. Simple comma splitting - Used inner.split(",") which incorrectly split nested structures:

    Input: "header={stamp=2024-01-01, seq=123}, x=4.736"
    Wrong split: ["header={stamp=2024-01-01", " seq=123}", " x=4.736"]
    
  2. Brace filtering - Skipped any key-value pair whose text contained a { or } character, which removed every nested field
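
The first issue can be reproduced in isolation. The following is a minimal sketch, not the pyathena implementation: `split_top_level` is a hypothetical stand-in for a brace-depth-aware splitter, and it assumes values never contain quoted or escaped commas or braces.

```python
def split_top_level(text: str) -> list[str]:
    """Split on commas that sit outside any {...} or [...] group.

    Hypothetical helper for illustration only; it is not pyathena's
    _split_array_items and ignores quoting and escaping.
    """
    items: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in text:
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current item.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        items.append(tail)
    return items


inner = "header={stamp=2024-01-01, seq=123}, x=4.736"

# Plain str.split cuts through the nested struct.
naive = inner.split(",")
# naive == ["header={stamp=2024-01-01", " seq=123}", " x=4.736"]

# Tracking brace depth keeps the nested struct in one piece.
aware = split_top_level(inner)
# aware == ["header={stamp=2024-01-01, seq=123}", "x=4.736"]
```

Once the nested value survives splitting intact, it still contains braces, so the second issue (brace filtering) would have dropped it anyway. Both had to be fixed together.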

Solution

  • Updated _parse_named_struct to use _split_array_items helper for proper brace-depth-aware splitting
  • Added recursive parsing: when a value looks like a struct ({...}), call _to_struct recursively
  • Updated docstring to document nested struct support
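
The combination of depth-aware splitting and recursion can be sketched as follows. This is an illustrative, self-contained approximation under stated assumptions, not the code merged in this PR: `parse_struct`, `_split_top_level`, and `_convert_scalar` are hypothetical names, scalar conversion is reduced to int/float/str, and nulls, arrays, and quoted values are not handled.

```python
from typing import Any


def _split_top_level(text: str) -> list[str]:
    """Hypothetical helper: split on commas outside braces/brackets."""
    items, current, depth = [], [], 0
    for ch in text:
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        items.append(tail)
    return items


def _convert_scalar(value: str) -> Any:
    """Simplified scalar conversion (assumption): int, then float, else str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_struct(text: str) -> dict[str, Any]:
    """Parse '{k=v, k2={...}}' into a dict, recursing into nested structs."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a struct: {text!r}")
    result: dict[str, Any] = {}
    for pair in _split_top_level(text[1:-1]):
        key, sep, value = pair.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"unnamed field not supported in this sketch: {pair!r}")
        value = value.strip()
        if value.startswith("{") and value.endswith("}"):
            # The value looks like a struct, so parse it recursively.
            result[key.strip()] = parse_struct(value)
        else:
            result[key.strip()] = _convert_scalar(value)
    return result


parsed = parse_struct("{header={stamp=2024-01-01, seq=123}, x=4.736}")
# parsed == {"header": {"stamp": "2024-01-01", "seq": 123}, "x": 4.736}
```

Splitting each pair on the first `=` only is what lets a nested value such as `{stamp=2024-01-01, seq=123}` reach the recursive call unchanged.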

Changes

  • pyathena/converter.py: Modified _parse_named_struct function
  • tests/pyathena/test_converter.py: Added 10 test cases for nested structs
  • tests/pyathena/sqlalchemy/test_base.py: Added 2 integration tests with real Athena queries

Testing

All tests pass:

  • ✅ 71 converter tests (including 10 new nested struct tests)
  • ✅ 2 new SQLAlchemy integration tests with Athena queries
  • ✅ All existing tests pass without regression
  • ✅ Lint/format/type checks pass

Test Coverage

Converter tests (test_converter.py):

  • Single-level nesting: {header={stamp=..., seq=...}, x=..., y=...}
  • Double nesting: {outer={middle={inner=value}}}
  • Triple nesting: {level1={level2={level3=...}}}
  • Multiple nested fields: {pos={x, y}, vel={x, y}, timestamp=...}
  • Arrays with nested structs: [{header={...}, x=...}]

SQLAlchemy integration tests (test_base.py):

  • Query execution with nested ROW types
  • Array query with nested structs

Verification

# Before fix
>>> from pyathena.converter import _to_struct
>>> _to_struct("{header={stamp=2024-01-01, seq=123}, x=4.736}")
{'x': 4.736}  # header is lost!

# After fix
>>> _to_struct("{header={stamp=2024-01-01, seq=123}, x=4.736}")
{'header': {'stamp': '2024-01-01', 'seq': 123}, 'x': 4.736}  # ✓ Correct!

Fixes #627

🤖 Generated with Claude Code

This commit fixes a critical bug where nested STRUCT (ROW) types were
not being parsed correctly, causing nested fields to be lost during
data conversion.

## Problem
The `_parse_named_struct` function in `pyathena/converter.py` was using
simple comma-splitting which failed for nested structures like:
`{header={stamp=2024-01-01, seq=123}, x=4.736}`

This caused:
1. Incorrect splitting at commas inside nested braces
2. Nested fields being skipped due to brace-containing value filtering

## Solution
- Updated `_parse_named_struct` to use `_split_array_items` for
  proper brace-depth-aware splitting
- Added recursive parsing for nested struct values
- Updated docstring to document nested struct support

## Testing
Added comprehensive test cases:
- Converter tests: 7 nested struct patterns + 3 array patterns
- SQLAlchemy integration tests: Query execution with nested ROW types

All existing tests pass without regression.

Fixes #627

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@laughingman7743 laughingman7743 marked this pull request as ready for review November 17, 2025 13:18
@laughingman7743 laughingman7743 merged commit 8d4e41c into master Nov 17, 2025
5 checks passed
@laughingman7743 laughingman7743 deleted the fix/nested-struct-parsing-issue-627 branch November 17, 2025 13:34

Development

Successfully merging this pull request may close these issues.

Undesired behaviour with AthenaArray/AthenaStruct
