Overview
The data pipeline is being incrementally refactored to use Polars for better performance. Parts of the system still expect stream objects, which creates friction during the migration. Two conversion utilities will let both formats coexist, keeping the transition controlled and low-risk.
Assumptions
Data validation is handled in earlier phases of the pipeline.
Tech Approach
- Create a utility class that takes a Python stream object and returns a Polars DataFrame using standard Polars constructors.
- Create a second utility class that converts a Polars DataFrame back to a Python stream, preserving types and nested structures where possible.
- Ensure both utilities include simple validation and logging so that unexpected field structures can be identified early.
- Provide internal documentation explaining the expected input and output shapes for each utility.
- Relevant links for guidance:
  - Polars DataFrame documentation: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html
  - Polars conversion functions overview: https://pola-rs.github.io/polars/py-polars/html/reference/api/index.html
Acceptance Criteria / Tests
Resourcing and Dependencies
- No prerequisite tickets are required, although parallel work on pipeline refactoring may influence timelines.
- Any engineer familiar with the data pipeline and Polars can complete this ticket.
- No dependencies on external teams, although the Data Engineering team should be informed once the utilities are ready for adoption in the migration work.