Add Polars LazyFrame support #1

@klovanone

Problem

Currently, when loading from a source, the Polars eager API reads the full dataframe into memory, and each subsequent data manipulation statement modifies that in-memory dataframe (similar to how Pandas works). Polars also provides a lazy API, which defers data manipulation statements and combines them into a single optimized query that runs only when the result is collected. The Polars developers encourage using the lazy API where possible.

The benefits of using the lazy API are:

  • slightly faster execution
  • much lower memory usage (e.g. only the first few rows of a parquet file need to be loaded to get the starting timestamp, instead of the full data).

Solution

Review the loading and data manipulation statements in the Src and TimeSeriesFuser classes and determine whether they can be adapted to support both the eager and lazy Polars APIs.
