Skip to content

Conversation

@samuelwnaylor
Copy link
Collaborator

@samuelwnaylor samuelwnaylor commented Apr 3, 2025

The main change is that polars does not have the same concept of named/custom indexes like pandas, therefore the TimeStamp_StartFormat and TurbineName indexes are treated as columns.

The intention of this PR is to start the transition of replacing pandas with polars through the wind-up calculations, with the ultimate longer-term goal to speed up wind-up runs.

Benchmark Results

Note

Benchmark tests can be run using poe test-benchmark

The polars implementation in this PR, specifically in the function _filter_turbine_df_by_other_turbine_dfs, shows significant performance improvements:

Implementation Mean Time Std Dev Min Time Max Time Speedup
Pandas 37.9ms ±1.1ms 36.3ms 42.0ms 1.0x (baseline)
Polars 14.0ms ±0.9ms 13.3ms 17.7ms 2.70x faster

Key findings:

  • Polars is consistently 2.7x faster than pandas
  • More stable performance with lower variability (StdDev: 0.9ms vs 1.1ms)
  • Excellent consistency (IQR: 0.38ms vs 1.30ms for pandas)
  • Tested over 53 benchmark rounds for polars, 25 for pandas
Full benchmark output ``` --------------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------------- Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- test_benchmark_polars_version 13.3168 (1.0) 17.6729 (1.0) 14.0306 (1.0) 0.9006 (1.0) 13.7265 (1.0) 0.3789 (1.0) 6;7 71.2730 (1.0) 53 1 test_benchmark_pandas_version 36.2638 (2.72) 41.9831 (2.38) 37.8548 (2.70) 1.1473 (1.27) 37.5767 (2.74) 1.2970 (3.42) 3;1 26.4167 (0.37) 25 1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ```

@samuelwnaylor samuelwnaylor marked this pull request as ready for review April 3, 2025 12:38
@samuelwnaylor samuelwnaylor requested a review from aclerc as a code owner April 3, 2025 12:38
@samuelwnaylor samuelwnaylor force-pushed the polars branch 3 times, most recently from 1b875a5 to 547f4f4 Compare April 4, 2025 09:00
Comment on lines 973 to 995
"filter_all_test_wtgs_together SMV5 set 400 rows [2.9%] to NA\n",
"filter_all_test_wtgs_together SMV5 set 3284 rows [23.4%] to NA\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, these look rather different, have you looked in to that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot. I've addressed this in 8675dad and looking at smarteole_example.ipynb cell 14 output confirms it is fixed. I've also added a test case around the function _filter_turbine_df_by_other_turbine_dfs, which uses an expected value calculated using the current main branch and Hill of Towie open data (because it's faster than the smarteole pytest.Fixture).

The error was because there are differences between pandas' isna() and Polars' is_nan():

  • Pandas isna() detects both NaN and None/NULL values. In pandas, these are often treated interchangeably as "missing data."
  • Polars is_nan() detects only NaN values (floating-point NaN). To check for null values in Polars, you need to use is_null() instead.

Copy link
Contributor

@aclerc aclerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this, I think polars could really help!

As per comment I am not sure if the logic is definitely the same as before. Could you add a test for _filter_turbine_df_by_other_turbine_dfs, ideally using the same input and results from a known case like smarteole?

@samuelwnaylor samuelwnaylor force-pushed the polars branch 2 times, most recently from 9106c17 to a04080b Compare December 2, 2025 18:51
@samuelwnaylor samuelwnaylor marked this pull request as draft December 2, 2025 18:52
@samuelwnaylor samuelwnaylor force-pushed the polars branch 3 times, most recently from 2e5c89c to 3c81936 Compare December 2, 2025 19:29
@samuelwnaylor samuelwnaylor requested a review from aclerc December 2, 2025 20:04
@samuelwnaylor samuelwnaylor changed the base branch from main to migrate-to-polars December 3, 2025 09:23
@samuelwnaylor samuelwnaylor marked this pull request as ready for review December 3, 2025 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants