
HyukjinKwon (Member) commented Dec 11, 2025

Rationale for this change

The existing Spark-flavor sanitization code carries this comment:

# TODO: This will not handle prohibited characters in nested field names

The sanitization was first introduced in 4a6a6cb, but the logic to handle nested types was never implemented.

What changes are included in this PR?

This PR implements sanitization of nested field names in ParquetWriter when flavor='spark'.
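The difference between top-level and nested handling is that sanitization has to recurse into every child field, not just the schema's direct fields. The sketch below is a minimal illustration of that recursion. It uses a hypothetical `Field` dataclass in place of pyarrow's field and type classes. It also assumes the disallowed character set is space, comma, semicolon, braces, parentheses, newline, tab, and equals sign. Both are assumptions for illustration, not the code in this PR.

```python
import re
from dataclasses import dataclass, field as dc_field
from typing import List

# Assumption: characters Spark rejects in Parquet column names.
_SPARK_DISALLOWED = re.compile(r"[ ,;{}()\n\t=]")


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field: a name plus nested
    children (struct members, list item, map key/value)."""
    name: str
    children: List["Field"] = dc_field(default_factory=list)


def sanitize_name(name: str) -> str:
    # Replace each disallowed character with an underscore.
    return _SPARK_DISALLOWED.sub("_", name)


def sanitize_field(f: Field) -> Field:
    # Recurse into children so every nested level is sanitized,
    # not only the top-level field names.
    return Field(sanitize_name(f.name), [sanitize_field(c) for c in f.children])


schema = [Field("a b", [Field("c;d"), Field("ok")])]
print([sanitize_field(f) for f in schema])
```

A top-level-only pass would rename `a b` to `a_b` but leave the nested `c;d` untouched, which is the gap the TODO describes.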

Are these changes tested?

Unit tests were added, and the change was manually tested via:

pytest python/pyarrow/tests/parquet/test_basic.py -k 'sanitized_spark' -v

Are there any user-facing changes?

Yes. Nested field names are now sanitized as well when ParquetWriter is used with flavor set to 'spark'.

github-actions commented:

⚠️ GitHub issue #48455 has been automatically assigned in GitHub to PR creator.

HyukjinKwon marked this pull request as draft December 18, 2025 07:00
github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label
HyukjinKwon marked this pull request as ready for review December 23, 2025 02:01
HyukjinKwon (Member, Author) commented:

Should be ready for a look.

HyukjinKwon (Member, Author) commented:

@BryanCutler are you around :-)? This is mostly relevant to Spark, so I wondered if you could take a look.
