[pull] main from apache:main #109

Merged
pull[bot] merged 7 commits into buraksenn:main from apache:main on Apr 16, 2026

@pull pull bot commented Apr 16, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


SubhamSinghal and others added 7 commits April 16, 2026 01:45
## Which issue does this PR close?

  - Related to #6899.

  ## Rationale for this change

Queries like `SELECT * FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY pk
ORDER BY val) AS rn FROM t) WHERE rn <= K` are extremely common in
analytics ("top N per group"). The current plan sorts the **entire**
dataset in O(N log N), computes ROW_NUMBER for all rows, then filters.
With 10M rows, 1K partitions, and K=3, we sort all 10M rows but keep
only 3K.

This PR introduces a `PartitionedTopKExec` operator that replaces the
`SortExec`, maintaining a per-partition `TopK` heap (reusing
DataFusion's existing `TopK` implementation). Cost drops to O(N log K)
time and O(K × P × row_size) memory.
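
The heap-per-partition idea can be sketched in plain Rust. This is only a minimal illustration, not the `PartitionedTopKExec` code: `partitioned_top_k` is a hypothetical helper, rows are simplified to `(partition_key, order_value)` pairs, and a std `BinaryHeap` stands in for DataFusion's `TopK`.

```rust
use std::collections::{BinaryHeap, HashMap};

/// Keep the `k` smallest order values per partition key.
/// Hypothetical helper for illustration; not DataFusion's `TopK` API.
pub fn partitioned_top_k(rows: &[(u32, i64)], k: usize) -> Vec<(u32, Vec<i64>)> {
    if k == 0 {
        return Vec::new();
    }
    // One bounded max-heap per partition; the root is the worst retained value.
    let mut heaps: HashMap<u32, BinaryHeap<i64>> = HashMap::new();
    for &(pk, val) in rows {
        let heap = heaps.entry(pk).or_default();
        if heap.len() < k {
            heap.push(val);
        } else if let Some(mut worst) = heap.peek_mut() {
            // Replace the current worst value only if the new one ranks earlier.
            if val < *worst {
                *worst = val;
            }
        }
    }
    // Emit in (partition_key, order_value) order.
    let mut out: Vec<(u32, Vec<i64>)> = heaps
        .into_iter()
        .map(|(pk, heap)| (pk, heap.into_sorted_vec()))
        .collect();
    out.sort_by_key(|(pk, _)| *pk);
    out
}
```

Each input row costs at most one O(log K) heap update, and at most K values are held per partition, which is where the O(N log K) time and O(K × P) row bound come from.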

  ## What changes are included in this PR?

**New physical operator: `PartitionedTopKExec`**
(`physical-plan/src/sorts/partitioned_topk.rs`)
- Reads unsorted input, groups rows by partition key using
`RowConverter`, feeds sub-batches to a per-partition `TopK` heap
- Emits only the top-K rows per partition in sorted `(partition_keys,
order_keys)` order
- Reuses the existing `TopK` implementation for heap management, sort
key comparison, eviction, and batch compaction

**New optimizer rule: `WindowTopN`**
(`physical-optimizer/src/window_topn.rs`)

  Detects the pattern:
  ```text
  FilterExec(rn <= K)
    [optional ProjectionExec]
      BoundedWindowAggExec(ROW_NUMBER PARTITION BY ... ORDER BY ...)
        SortExec(partition_keys, order_keys)
  ```

  And replaces it with:
  ```text
  [optional ProjectionExec]
    BoundedWindowAggExec(ROW_NUMBER PARTITION BY ... ORDER BY ...)
      PartitionedTopKExec(fetch=K)
  ```

  Both `FilterExec` and `SortExec` are removed.

  Supported predicates: `rn <= K`, `rn < K`, `K >= rn`, `K > rn`.
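
  As a rough sketch of how these four predicate shapes could map to a fetch limit: `rn <= K` and `K >= rn` keep K rows, while `rn < K` and `K > rn` keep K - 1. The `CmpOp` enum and `fetch_for_predicate` function below are hypothetical names, and the sketch assumes the predicate has already been reduced to "which side is `rn`, which operator, which literal"; the actual rule works on physical expressions.

```rust
/// Comparison operators relevant to the sketch (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CmpOp {
    Lt,
    LtEq,
    Gt,
    GtEq,
    Eq,
}

/// Returns the per-partition fetch limit implied by a predicate between
/// the ROW_NUMBER column `rn` and the literal `k`, or `None` when the
/// predicate is not one of the supported shapes.
pub fn fetch_for_predicate(rn_on_left: bool, op: CmpOp, k: u64) -> Option<u64> {
    match (rn_on_left, op) {
        // rn <= K  or  K >= rn: keep the first K rows.
        (true, CmpOp::LtEq) | (false, CmpOp::GtEq) => Some(k),
        // rn < K  or  K > rn: keep the first K - 1 rows.
        (true, CmpOp::Lt) | (false, CmpOp::Gt) => k.checked_sub(1),
        // Anything else (for example rn >= K or rn = K) is left alone.
        _ => None,
    }
}
```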

The rule only fires for `ROW_NUMBER` with a `PARTITION BY` clause.
Global top-K (no `PARTITION BY`) is already handled by
  `SortExec` with `fetch`.

**Config flag:** `datafusion.optimizer.enable_window_topn` (default:
`false`)

**Benchmark results** (H2O groupby Q8, 10M rows, top-2 per partition),
reproduced with `cargo run --release --example h2o_window_topn_bench`:

  | Scenario | Enabled (ms) | Disabled (ms) | Speedup |
  |----------|-------------|--------------|---------|
  | 100 partitions (100K rows/part) | 43 | 174 | 4.0x |
  | 1K partitions (10K rows/part) | 71 | 146 | 2.1x |
  | 10K partitions (1K rows/part) | 619 | 128 | 0.2x (regression) |
  | 100K partitions (100 rows/part) | 4368 | 135 | 0.03x (regression) |

The regressions at 10K and 100K partitions are expected: per-partition
`TopK` overhead (a `RowConverter` and a `MemoryReservation` per instance)
dominates when there are many partitions with few rows each. For the
common case (moderate partition cardinality, 100 to 1K partitions in
this benchmark), the optimization provides a 2-4x speedup.

  ## Are these changes tested?

  Yes:
- **7 unit tests** (`core/tests/physical_optimizer/window_topn.rs`):
basic ROW_NUMBER, `rn < K`, flipped predicates, non-window column
filter, config disabled, no partition by, projection between filter and
window
- **5 SLT tests** (`sqllogictest/test_files/window_topn.slt`):
correctness verification, EXPLAIN plan validation, `rn < K`,
no-partition-by case, config disabled fallback

  ## Are there any user-facing changes?

No breaking API changes. The optimization is disabled by default and
transparent to users. It can be enabled via:
  ```sql
  SET datafusion.optimizer.enable_window_topn = true;
  ```

---------

Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local>
Co-authored-by: Yongting You <2010youy01@gmail.com>
## Which issue does this PR close?


`main` does not compile because of a merge race between #21479 and #21573.

This PR fixes the conflict.

## Rationale for this change


## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?

…<83 in /docs (#21607)

Updates the requirements on
[setuptools](https://github.com/pypa/setuptools) to permit the latest
version.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pypa/setuptools/blob/main/NEWS.rst">setuptools's
changelog</a>.</em></p>
<blockquote>
<h1>v82.0.1</h1>
<h2>Bugfixes</h2>
<ul>
<li>Fix the loading of <code>launcher manifest.xml</code> file. (<a
href="https://redirect.github.com/pypa/setuptools/issues/5047">#5047</a>)</li>
<li>Replaced deprecated <code>json.__version__</code> with fixture in
tests. (<a
href="https://redirect.github.com/pypa/setuptools/issues/5186">#5186</a>)</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Add advice about how to improve predictability when installing
sdists. (<a
href="https://redirect.github.com/pypa/setuptools/issues/5168">#5168</a>)</li>
</ul>
<h2>Misc</h2>
<ul>
<li><a
href="https://redirect.github.com/pypa/setuptools/issues/4941">#4941</a>,
<a
href="https://redirect.github.com/pypa/setuptools/issues/5157">#5157</a>,
<a
href="https://redirect.github.com/pypa/setuptools/issues/5169">#5169</a>,
<a
href="https://redirect.github.com/pypa/setuptools/issues/5175">#5175</a></li>
</ul>
<h1>v82.0.0</h1>
<h2>Deprecations and Removals</h2>
<ul>
<li><code>pkg_resources</code> has been removed from Setuptools. Most
common uses of <code>pkg_resources</code> have been superseded by the
<code>importlib.resources
&lt;https://docs.python.org/3/library/importlib.resources.html&gt;</code>_
and <code>importlib.metadata
&lt;https://docs.python.org/3/library/importlib.metadata.html&gt;</code>_
projects. Projects and environments relying on
<code>pkg_resources</code> for namespace packages or other behavior
should depend on older versions of <code>setuptools</code>. (<a
href="https://redirect.github.com/pypa/setuptools/issues/3085">#3085</a>)</li>
</ul>
<h1>v81.0.0</h1>
<h2>Deprecations and Removals</h2>
<ul>
<li>Removed support for the --dry-run parameter to setup.py. This one
feature by its nature threads through lots of core and ancillary
functionality, adding complexity and friction. Removal of this parameter
will help decouple the compiler functionality from distutils and thus
the eventual full integration of distutils. These changes do affect some
class and function signatures, so any derivative functionality may
require some compatibility shims to support their expected interface.
Please report any issues to the Setuptools project for investigation.
(<a
href="https://redirect.github.com/pypa/setuptools/issues/4872">#4872</a>)</li>
</ul>
<h1>v80.10.2</h1>
<h2>Bugfixes</h2>
<ul>
<li>Update vendored dependencies. (<a
href="https://redirect.github.com/pypa/setuptools/issues/5159">#5159</a>)</li>
</ul>
<p>Misc</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/pypa/setuptools/commit/5a13876673a41e3cd21d4d6e587f53d0fb4fd8e5"><code>5a13876</code></a>
Bump version: 82.0.0 → 82.0.1</li>
<li><a
href="https://github.com/pypa/setuptools/commit/51ab8f183f1c4112675d8d6ec6b004406d518ee8"><code>51ab8f1</code></a>
Avoid using (deprecated) 'json.__version__' in tests (<a
href="https://redirect.github.com/pypa/setuptools/issues/5194">#5194</a>)</li>
<li><a
href="https://github.com/pypa/setuptools/commit/f9c37b20bb0ed11203f676f9683452a4c3ace6f6"><code>f9c37b2</code></a>
Docs/CI: Fix intersphinx references (<a
href="https://redirect.github.com/pypa/setuptools/issues/5195">#5195</a>)</li>
<li><a
href="https://github.com/pypa/setuptools/commit/8173db2a4fc0f6cb28926b3dba59116b79f435c8"><code>8173db2</code></a>
Docs: Fix intersphinx references</li>
<li><a
href="https://github.com/pypa/setuptools/commit/09bafbc74923f2a3591b5b098be75d6af6ca5141"><code>09bafbc</code></a>
Fix past tense on newsfragment</li>
<li><a
href="https://github.com/pypa/setuptools/commit/461ea56c8e629819a23920f44d9298d4f041abde"><code>461ea56</code></a>
Add news fragment</li>
<li><a
href="https://github.com/pypa/setuptools/commit/c4ffe535b58235ff9f9ebe90d24a2cffb57e70ae"><code>c4ffe53</code></a>
Avoid using (deprecated) 'json.__version__' in tests</li>
<li><a
href="https://github.com/pypa/setuptools/commit/749258b1a96c7accc05ea7d842fb19fc378866fe"><code>749258b</code></a>
Cleanup <code>pkg_resources</code> dependencies and configuration (<a
href="https://redirect.github.com/pypa/setuptools/issues/5175">#5175</a>)</li>
<li><a
href="https://github.com/pypa/setuptools/commit/2019c16701667db1010c62ec11c6ef78c2e58206"><code>2019c16</code></a>
Parse <code>ext-module.define-macros</code> from
<code>pyproject.toml</code> as list of tuples (<a
href="https://redirect.github.com/pypa/setuptools/issues/5169">#5169</a>)</li>
<li><a
href="https://github.com/pypa/setuptools/commit/b809c86a37d97fcce290d5f51d4c293ab40bc685"><code>b809c86</code></a>
Sync setuptools schema with validate-pyproject (<a
href="https://redirect.github.com/pypa/setuptools/issues/5157">#5157</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/pypa/setuptools/compare/v82.0.0...v82.0.1">compare
view</a></li>
</ul>
</details>
<br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… in /docs (#21608)

Updates the requirements on [maturin](https://github.com/pyo3/maturin)
to permit the latest version.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pyo3/maturin/releases">maturin's
releases</a>.</em></p>
<blockquote>
<h2>v1.13.1</h2>
<h2>What's Changed</h2>
<ul>
<li>fix: fall back to placeholder for abi3 when found interpreters are
too old by <a
href="https://github.com/messense"><code>@​messense</code></a> in <a
href="https://redirect.github.com/PyO3/maturin/pull/3126">PyO3/maturin#3126</a></li>
</ul>
<p>See also v1.13.0 release highlight: <a
href="https://github.com/PyO3/maturin/releases/tag/v1.13.0">https://github.com/PyO3/maturin/releases/tag/v1.13.0</a></p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/PyO3/maturin/compare/v1.13.0...v1.13.1">https://github.com/PyO3/maturin/compare/v1.13.0...v1.13.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/PyO3/maturin/blob/main/Changelog.md">maturin's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>1.13.1</h2>
<ul>
<li>Fix: fall back to placeholder for abi3 when found interpreters are
too old (<a
href="https://redirect.github.com/pyo3/maturin/pull/3126">#3126</a>)</li>
</ul>
<h2>1.13.0</h2>
<ul>
<li>Refactor: unified interpreter resolution pipeline (<a
href="https://redirect.github.com/pyo3/maturin/pull/3032">#3032</a>)</li>
<li>Refactor: decompose large modules into focused submodules (<a
href="https://redirect.github.com/pyo3/maturin/pull/3052">#3052</a>)</li>
<li>Keep cargo build artifact at original path after staging (<a
href="https://redirect.github.com/pyo3/maturin/pull/3054">#3054</a>)</li>
<li>Fix <code>--strip</code> conflicting with
<code>--include-debuginfo</code> in develop (<a
href="https://redirect.github.com/pyo3/maturin/pull/3057">#3057</a>)</li>
<li>Fix abi3 wheel producing version-specific tags for CPython below
minimum (<a
href="https://redirect.github.com/pyo3/maturin/pull/3061">#3061</a>)</li>
<li>Generate-ci: use uv pip for pytest steps to fix local wheel
preference (<a
href="https://redirect.github.com/pyo3/maturin/pull/3063">#3063</a>)</li>
<li>Update reflink-copy to 0.1.29 to fix sparc Linux builds</li>
<li>Add <code>[tool.maturin.generate-ci.github]</code> config support
(<a
href="https://redirect.github.com/pyo3/maturin/pull/3066">#3066</a>)</li>
<li>Fix(sdist): handle parent workspaces and refactor sdist generation
(<a
href="https://redirect.github.com/pyo3/maturin/pull/3055">#3055</a>)</li>
<li>Test: refactor integration suite and switch mixed fixtures to cffi
(<a
href="https://redirect.github.com/pyo3/maturin/pull/3068">#3068</a>)</li>
<li>Fix <code>data</code> symlink permission handling (<a
href="https://redirect.github.com/pyo3/maturin/pull/3069">#3069</a>)</li>
<li>Fix: correct bugs in audit.rs typo and module_writer (<a
href="https://redirect.github.com/pyo3/maturin/pull/3070">#3070</a>)</li>
<li>Perf: use lazy-initialized regexes instead of per-call compilation
(<a
href="https://redirect.github.com/pyo3/maturin/pull/3071">#3071</a>)</li>
<li>Refactor: extract duplicated helpers and reduce code repetition (<a
href="https://redirect.github.com/pyo3/maturin/pull/3072">#3072</a>)</li>
<li>Refactor: split monster functions into focused methods (<a
href="https://redirect.github.com/pyo3/maturin/pull/3073">#3073</a>)</li>
<li>Refactor: improve type safety and API clarity (<a
href="https://redirect.github.com/pyo3/maturin/pull/3074">#3074</a>)</li>
<li>Refactor: cleanup anti-patterns (<a
href="https://redirect.github.com/pyo3/maturin/pull/3075">#3075</a>)</li>
<li>Refactor: decompose <code>build_context</code> into focused
submodules (<a
href="https://redirect.github.com/pyo3/maturin/pull/3076">#3076</a>)</li>
<li>Fix: skip legacy manylinux aliases not in PyPI allow-list (<a
href="https://redirect.github.com/pyo3/maturin/pull/3078">#3078</a>)</li>
<li>Fix: auto-generate <code>.def</code> file for zig + windows-gnu to
export <code>PyInit</code> symbol (<a
href="https://redirect.github.com/pyo3/maturin/pull/3079">#3079</a>)</li>
<li>Ci: upgrade run-on-arch-action to ubuntu24.04, add deadsnakes PPA
for newer Python (<a
href="https://redirect.github.com/pyo3/maturin/pull/3081">#3081</a>)</li>
<li>Fix: pass <code>-undefined dynamic_lookup</code> via
<code>CARGO_ENCODED_RUSTFLAGS</code> on macOS (<a
href="https://redirect.github.com/pyo3/maturin/pull/3083">#3083</a>)</li>
<li>Feat: add Profile-Guided Optimization (PGO) support (<a
href="https://redirect.github.com/pyo3/maturin/pull/3085">#3085</a>)</li>
<li>Respect <code>metadata_directory</code> in <code>build_wheel</code>
per PEP 517 (<a
href="https://redirect.github.com/pyo3/maturin/pull/3086">#3086</a>)</li>
<li>Update lddtree to 0.5.0</li>
<li>Fix cargo path with puccinialin for Windows (<a
href="https://redirect.github.com/pyo3/maturin/pull/3093">#3093</a>)</li>
<li>Update and pin cargo-cyclonedx to 0.5.9</li>
<li>Ci: improve GitHub Actions generation logic (<a
href="https://redirect.github.com/pyo3/maturin/pull/3097">#3097</a>)</li>
<li>Refactor: split BuildOptions and BuildContext into logical
sub-groups (<a
href="https://redirect.github.com/pyo3/maturin/pull/3098">#3098</a>)</li>
<li>Refactor: move subcommands to separate modules (<a
href="https://redirect.github.com/pyo3/maturin/pull/3099">#3099</a>)</li>
<li>Refactor: decouple build orchestration from BuildContext (<a
href="https://redirect.github.com/pyo3/maturin/pull/3100">#3100</a>)</li>
<li>Upgrade pyo3 to 0.28 (<a
href="https://redirect.github.com/pyo3/maturin/pull/3101">#3101</a>)</li>
<li>Fix: only enable include_debuginfo by default on Windows in develop
command</li>
<li>PyO3: Adds <code>--generate_stubs</code> build options (<a
href="https://redirect.github.com/pyo3/maturin/pull/3105">#3105</a>)</li>
<li>Fix: prevent panic when no interpreters match abi3 minimum version
(<a
href="https://redirect.github.com/pyo3/maturin/pull/3108">#3108</a>)</li>
<li>Refactor to store CPython ABI metadata in a struct combining two
enums (<a
href="https://redirect.github.com/pyo3/maturin/pull/3110">#3110</a>)</li>
<li>Refactor: introduce <code>WheelRepairer</code> trait (<a
href="https://redirect.github.com/pyo3/maturin/pull/3112">#3112</a>)</li>
<li>Feat: re-implement delocate for repairing macOS wheels (<a
href="https://redirect.github.com/pyo3/maturin/pull/3114">#3114</a>)</li>
<li>PyO3: Adds generate-stubs command (<a
href="https://redirect.github.com/pyo3/maturin/pull/3115">#3115</a>)</li>
<li>Feat: re-implement delvewheel for repairing Windows wheels (<a
href="https://redirect.github.com/pyo3/maturin/pull/3116">#3116</a>)</li>
<li>Add auditwheel Warn mode, default to Warn on macOS/Windows (<a
href="https://redirect.github.com/pyo3/maturin/pull/3121">#3121</a>)</li>
<li>Feat: Support large zip files (<a
href="https://redirect.github.com/pyo3/maturin/pull/3118">#3118</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/PyO3/maturin/commit/b27b7e126148f373573fb029b05367dc62a5a0e6"><code>b27b7e1</code></a>
Release v1.13.1</li>
<li><a
href="https://github.com/PyO3/maturin/commit/4a3df460277453c31115700d3f37f1ea1bca2075"><code>4a3df46</code></a>
fix: fall back to placeholder for abi3 when found interpreters are too
old (#...</li>
<li><a
href="https://github.com/PyO3/maturin/commit/e8ebb2f429f17141265837b0f7c874c75c30ca3b"><code>e8ebb2f</code></a>
Release v1.13.0 (<a
href="https://redirect.github.com/pyo3/maturin/issues/3124">#3124</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/1e5d362a0eecb62898a8dabc59005bf804fc14f9"><code>1e5d362</code></a>
feat: Support large zip files (<a
href="https://redirect.github.com/pyo3/maturin/issues/3118">#3118</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/062bea70d18612fb8f416220faf00ca4946b7e4d"><code>062bea7</code></a>
Add auditwheel Warn mode, default to Warn on macOS/Windows (<a
href="https://redirect.github.com/pyo3/maturin/issues/3121">#3121</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/70ea11202b8e62735316f535cec4f02c7ee3c3d2"><code>70ea112</code></a>
feat: re-implement delvewheel for repairing Windows wheels (<a
href="https://redirect.github.com/pyo3/maturin/issues/3116">#3116</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/83cb1851f1f51abf697736ea83b4aa7140e7206a"><code>83cb185</code></a>
PyO3: Adds generate-stubs command (<a
href="https://redirect.github.com/pyo3/maturin/issues/3115">#3115</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/ac062c379987e0884c2d1098cc803a9e2d292a1f"><code>ac062c3</code></a>
[pre-commit.ci] pre-commit autoupdate (<a
href="https://redirect.github.com/pyo3/maturin/issues/3117">#3117</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/a8393eb0304eb86479ef0b68f036abde8e0391b0"><code>a8393eb</code></a>
feat: re-implement delocate for repairing macOS wheels (<a
href="https://redirect.github.com/pyo3/maturin/issues/3114">#3114</a>)</li>
<li><a
href="https://github.com/PyO3/maturin/commit/d97bbd0a51003ad62ccfcb854ea0c6e9713c6cd2"><code>d97bbd0</code></a>
refactor: introduce <code>WheelRepairer</code> trait (<a
href="https://redirect.github.com/pyo3/maturin/issues/3112">#3112</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/pyo3/maturin/compare/v1.11.0...v1.13.1">compare
view</a></li>
</ul>
</details>
<br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close?

N/A.

## Rationale for this change

While reviewing apache/datafusion-python#1461, I noticed that an example
in the `map_extract` function documentation was wrong:

```sql
-- documented example (incorrect output)
SELECT map_extract(MAP {'x': 10, 'y': NULL, 'z': 30}, 'y');
----
[]

-- actual DataFusion output
SELECT map_extract(MAP {'x': 10, 'y': NULL, 'z': 30}, 'y');
----
[NULL]
```

## What changes are included in this PR?

- Fixed the previous example.
- Also added a new example showing a `map_extract` on an empty key.
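
The behavior the corrected example documents can be modeled with a small Rust sketch. It is a simplified stand-in rather than the DataFusion implementation: the `map_extract` function below is hypothetical, uses string keys and `Option<i64>` values (`None` playing the role of SQL `NULL`), and assumes that a key absent from the map yields an empty list.

```rust
/// Simplified model of `map_extract`: returns a one-element list holding the
/// value stored under `key` (which may itself be NULL), or an empty list when
/// the key is not in the map. Hypothetical helper for illustration only.
pub fn map_extract(entries: &[(&str, Option<i64>)], key: &str) -> Vec<Option<i64>> {
    entries
        .iter()
        .find(|entry| entry.0 == key)
        .map(|entry| vec![entry.1])
        .unwrap_or_default()
}
```

With the map `{'x': 10, 'y': NULL, 'z': 30}`, looking up `'y'` finds the key and returns its `NULL` value wrapped in a list, which is distinct from the empty list returned for a key that is not present.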

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
## Which issue does this PR close?


- Closes #21299.

## Rationale for this change

`median` could produce incorrect results for sliding window frames
because `MedianAccumulator::retract_batch` could skip values after
`swap_remove(i)`.

When an element was removed, the last element was moved into index `i`,
but the loop still incremented `i`. That meant the swapped-in value was
never checked, so not all requested values were retracted.
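
A minimal reproduction of the pattern, assuming a plain `Vec<i64>` in place of the accumulator's stored values: the `retract` function and its `advance_after_remove` flag are hypothetical and exist only to contrast the old and new loop behavior.

```rust
use std::collections::HashMap;

/// Remove one occurrence from `values` for each element of `retracted`.
/// `advance_after_remove = true` mimics the old behavior (always step past
/// index `i`); `false` re-checks index `i` after `swap_remove`.
pub fn retract(values: &mut Vec<i64>, retracted: &[i64], advance_after_remove: bool) {
    // Count how many copies of each value still need to be removed.
    let mut pending: HashMap<i64, usize> = HashMap::new();
    for v in retracted {
        *pending.entry(*v).or_insert(0) += 1;
    }

    let mut i = 0;
    while i < values.len() && !pending.is_empty() {
        let v = values[i];
        if let Some(count) = pending.get_mut(&v) {
            *count -= 1;
            let exhausted = *count == 0;
            if exhausted {
                pending.remove(&v);
            }
            // swap_remove moves the last element into slot `i`.
            values.swap_remove(i);
            if advance_after_remove {
                // Old behavior: the swapped-in element is never inspected.
                i += 1;
            }
        } else {
            i += 1;
        }
    }
}
```

Retracting `[3, 3]` from `[1, 2, 3, 3]` shows the difference: the old loop removes the first `3`, swaps the second `3` into its slot, then steps past it and leaves `[1, 2, 3]`, while the fixed loop leaves `[1, 2]`.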


## What changes are included in this PR?

- Fix `retract_batch` so the index is only advanced when no value is
removed
- Add an SLT case for `median` over a sliding `range` window


## Are these changes tested?

Yes.


## Are there any user-facing changes?

No.


Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
## Which issue does this PR close?

N/A

## Rationale for this change

Adds a new Spark function, `make_valid_utf8`:
https://spark.apache.org/docs/latest/api/sql/index.html#make_valid_utf8
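
For readers unfamiliar with the function, its general idea (valid input passes through, invalid UTF-8 byte sequences become U+FFFD) can be approximated with the Rust standard library. This is a sketch and not the implementation in this PR: the helper name is reused only for illustration, and `String::from_utf8_lossy` may group invalid bytes into replacement characters differently from Spark in some multi-byte cases.

```rust
/// Approximation of `make_valid_utf8` using only std.
/// Assumption: the replacement granularity of `from_utf8_lossy` is close
/// enough to Spark's for illustration; it is not guaranteed to be identical.
pub fn make_valid_utf8(bytes: &[u8]) -> String {
    // Valid input is returned unchanged; each invalid sequence becomes U+FFFD.
    String::from_utf8_lossy(bytes).into_owned()
}
```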

## What changes are included in this PR?

- Implementation
- SLT tests

## Are these changes tested?

Yes, tests added as part of this PR.

## Are there any user-facing changes?

No, this is a new function.

---------

Co-authored-by: Kazantsev Maksim <mn.kazantsev@gmail.com>
@pull pull bot locked and limited conversation to collaborators Apr 16, 2026
@pull pull bot added the ⤵️ pull label Apr 16, 2026
@pull pull bot merged commit bd2af68 into buraksenn:main Apr 16, 2026