@UnravelSports
Contributor

@UnravelSports UnravelSports commented Jun 2, 2025

This has been on my list for a long time...

This addition lets users call tracking_dataset.to_df(orient="row"); the default remains orient="column".

This only works for a TrackingDataset. It returns a DataFrame with the following columns:
["period_id", "timestamp", "frame_id", "ball_state", "ball_owning_team_id", "team_id", "player_id", "x", "y", "z", "d", "s"]

team_id and player_id are "ball" for the ball object.

Each key in frame.other_data gets its own column as well. Currently that would only be "visible_area", if we convert StatsBomb Freeze Frames to a TrackingDataset as discussed in Issue #474.
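To make the row-wise shape concrete, here is a minimal pure-Python sketch of how one frame could expand into one row per tracked object. It is not the code in this PR: frame_to_long_rows and the dict-based frame are hypothetical stand-ins for kloppy's Frame objects, and treating "d" and "s" as distance and speed is an assumption.

```python
from typing import Any, Dict, List

# Column order taken from the PR description above.
LONG_COLUMNS = [
    "period_id", "timestamp", "frame_id", "ball_state", "ball_owning_team_id",
    "team_id", "player_id", "x", "y", "z", "d", "s",
]


def frame_to_long_rows(frame: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: expand one frame into one row per tracked object."""
    # The first five columns are frame-level values repeated on every row.
    shared = {key: frame[key] for key in LONG_COLUMNS[:5]}
    # Every key in other_data becomes an extra column (e.g. "visible_area").
    extra = dict(frame.get("other_data", {}))

    # The ball is an ordinary row whose team_id and player_id are "ball".
    ball_x, ball_y, ball_z = frame["ball"]
    rows = [{**shared, "team_id": "ball", "player_id": "ball",
             "x": ball_x, "y": ball_y, "z": ball_z,
             "d": None, "s": None, **extra}]

    for player in frame["players"]:
        rows.append({**shared,
                     "team_id": player["team_id"],
                     "player_id": player["player_id"],
                     "x": player["x"], "y": player["y"], "z": None,
                     # Assumed meaning: d = distance, s = speed.
                     "d": player.get("d"), "s": player.get("s"),
                     **extra})
    return rows


example_frame = {
    "period_id": 1, "timestamp": 0.04, "frame_id": 10,
    "ball_state": "alive", "ball_owning_team_id": "home",
    "ball": (0.5, 0.5, 0.0),
    "players": [
        {"team_id": "home", "player_id": "h1", "x": 0.1, "y": 0.2, "s": 3.1},
        {"team_id": "away", "player_id": "a1", "x": 0.8, "y": 0.7},
    ],
    "other_data": {"visible_area": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]},
}
rows = frame_to_long_rows(example_frame)
```

A frame with two players yields three rows (ball plus players), each carrying the same frame-level values.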

This PR adds:

  • RowWiseFrameTransformer to kloppy/domain/services/transformers/attribute.py
  • to_dict_rowwise on Dataset, which raises an error if orient="row" is used on anything other than a TrackingDataset
  • to_df update to allow for orient="row" and orient="column"
  • Some tests in test_sportec

Edit: I'm contemplating whether we should change "row" to "rows" and "column" to "columns".

For future reference: because of the janky player IDs in the StatsBomb freeze frames, a StatsBomb TrackingDataset (created from freeze frames) cannot be exported with to_df(orient="columns"). That's why we need orient="rows".

@UnravelSports UnravelSports changed the title [TRANSFORMERS] to_df(orient=row) [TRANSFORMERS] to_df(orient="rows") Jun 2, 2025
@koenvo koenvo added this to the 3.18.0 milestone Jun 5, 2025
@UnravelSports UnravelSports modified the milestones: 3.18.0, 3.19.0 Oct 21, 2025
@UnravelSports
Contributor Author

As discussed on the Kloppy Dev call, I've changed orient to layout, "rows" to "long" and "columns" to "wide"

@probberechts
Contributor

Close #68

@probberechts
Contributor

I've significantly refactored this one. Main changes:

  • Removed the to_tracking_dataset implementation from this PR, since this is a different feature.
  • The core logic for the data_record transformers was in kloppy.domain.services.transformers.attribute. I don't know why. It obviously belongs in kloppy.domain.services.transformers.data_record
  • Changed the get_transformer_cls method to accept a layout parameter. This avoids a lot of duplicate code in the Dataset class.
  • Replaced DataRecordToDictTransformer with a more generic DataRecordTransformer class that allows transforming data records to any output type (not just a dict).
  • Checks like if self.dataset_type != DatasetType.TRACKING: do not belong in the Dataset class implementation. Superclasses should not contain logic that is specific to child classes.
  • Added documentation
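A rough sketch of the design described in the list above: a transformer that is generic over its output type, plus a lookup keyed on dataset type and layout, so the base Dataset class needs no type-specific branches. Only the name get_transformer_cls and the layout parameter come from this thread. The signature, the registry, and the two transformer classes are hypothetical, and the per-player column naming in the wide variant is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, List, Tuple, Type, TypeVar

T = TypeVar("T")


class RecordTransformer(ABC, Generic[T]):
    """Hypothetical base class: turns one data record into any output type T."""

    @abstractmethod
    def __call__(self, record: Dict[str, Any]) -> T:
        ...


class WideTrackingTransformer(RecordTransformer[Dict[str, Any]]):
    """One dict per frame, with one group of columns per player."""

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        row: Dict[str, Any] = {"frame_id": record["frame_id"]}
        for player in record["players"]:
            row[f"{player['player_id']}_x"] = player["x"]
            row[f"{player['player_id']}_y"] = player["y"]
        return row


class LongTrackingTransformer(RecordTransformer[List[Dict[str, Any]]]):
    """A list of dicts per frame, one per player."""

    def __call__(self, record: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [
            {"frame_id": record["frame_id"], "player_id": player["player_id"],
             "x": player["x"], "y": player["y"]}
            for player in record["players"]
        ]


# Hypothetical registry: which (dataset type, layout) pairs are supported.
_REGISTRY: Dict[Tuple[str, str], Type[RecordTransformer]] = {
    ("tracking", "wide"): WideTrackingTransformer,
    ("tracking", "long"): LongTrackingTransformer,
}


def get_transformer_cls(dataset_type: str, layout: str = "wide") -> Type[RecordTransformer]:
    """Unsupported combinations fail here, not inside the Dataset superclass."""
    try:
        return _REGISTRY[(dataset_type, layout)]
    except KeyError:
        raise ValueError(
            f"layout={layout!r} is not supported for {dataset_type!r} datasets"
        ) from None


record = {"frame_id": 1, "players": [{"player_id": "h1", "x": 0.1, "y": 0.2}]}
wide_row = get_transformer_cls("tracking", "wide")()(record)
long_rows = get_transformer_cls("tracking", "long")()(record)
```

With this shape, adding a layout for another dataset type means registering one more class, and the caller never has to check the dataset type itself.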

Does this look good, @UnravelSports?

@UnravelSports
Contributor Author

UnravelSports commented Dec 24, 2025

Thanks @probberechts!

from kloppy import skillcorner

dataset = skillcorner.load_open_data()
dataset.to_df(engine="polars", layout="long")

This throws an error because team_id is a mix of integers and a string ('ball'), while ball_owning_team_id contains only integers. There are two ways to handle this: convert all team_id values to strings, or set strict=False in the conversion, in which case the team_id column ends up with mixed types. I'm not sure which is preferred, but in general I'm not a big fan of allowing team_id as int, because it always causes me functional issues.

We can use strict=False, but I think we would then have to emit a warning?
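For illustration, the first option (casting IDs to strings before building the DataFrame) could look roughly like the sketch below. It is plain Python and independent of polars. stringify_ids and column_types are hypothetical helpers, not kloppy functions, and also casting ball_owning_team_id is an assumption made so that it still compares equal to team_id.

```python
from typing import Any, Dict, Iterable, List, Set


def column_types(rows: Iterable[Dict[str, Any]], column: str) -> Set[str]:
    """Return the names of the Python types found in one column."""
    return {type(row[column]).__name__ for row in rows}


def stringify_ids(
    rows: Iterable[Dict[str, Any]],
    columns: Iterable[str] = ("team_id", "ball_owning_team_id"),
) -> List[Dict[str, Any]]:
    """Hypothetical helper: cast ID columns to str, leaving None untouched."""
    columns = tuple(columns)
    fixed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for column in columns:
            value = new_row.get(column)
            if value is not None:
                new_row[column] = str(value)
        fixed.append(new_row)
    return fixed


# Long-layout rows where the provider uses integer team IDs.
long_rows = [
    {"team_id": 100, "ball_owning_team_id": 100, "x": 0.1},
    {"team_id": 200, "ball_owning_team_id": 100, "x": 0.8},
    {"team_id": "ball", "ball_owning_team_id": 100, "x": 0.5},
]
mixed = column_types(long_rows, "team_id")
clean_rows = stringify_ids(long_rows)
uniform = column_types(clean_rows, "team_id")
```

After the cast, every team_id is a string, so a strictly typed column can be built without relaxing the schema check.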

@koenvo any thoughts on this?
