@stephTchembeu

No description provided.

dataset.metadata.date + frame.timestamp
)
# Period
frame_data["period"] = periods.get(frame.period.id, "unknownn")
Contributor

Change "unknownn" to "MISSING_MANDATORY_PERIOD_ID"

Orientation,
BallState,
)

Contributor

We should add a check that warns users if they loaded their kloppy dataset with only_alive=True, because CDF expects all frames.

        # Requires `import warnings` at module level.
        if all(x.ball_state == BallState.ALIVE for x in tracking_dataset):
            warnings.warn(
                "All frames in 'tracking_dataset' are 'ALIVE', but the Common Data Format expects 'DEAD' frames as well. Set `only_alive=False` in your kloppy `.load_tracking()` call to include 'DEAD' frames.",
                UserWarning,
            )


# Teams and players
home_players = []
for player, coordinates in frame.players_coordinates.items():
Contributor

This duplicated code (for home and away players) to loop over players needs to be a function.
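
A minimal sketch of what such a function could look like. `Player`, `Point`, `team_id` and `collect_team_players` are hypothetical stand-ins for illustration, not kloppy's actual classes or the code in this PR; the point is only that one function serves both teams.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Player:
    """Hypothetical stand-in for a player (only the fields used here)."""
    player_id: str
    team_id: str


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a player's coordinates in one frame."""
    x: float
    y: float


def collect_team_players(
    players_coordinates: Dict[Player, Point], team_id: str
) -> List[dict]:
    """Build the per-frame player entries for a single team."""
    return [
        {"id": player.player_id, "x": point.x, "y": point.y}
        for player, point in players_coordinates.items()
        if player.team_id == team_id
    ]


# The same function replaces both the home loop and the away loop.
coordinates = {
    Player("h1", "home"): Point(1.0, 2.0),
    Player("a1", "away"): Point(-3.0, 4.0),
}
home_players = collect_team_players(coordinates, "home")
away_players = collect_team_players(coordinates, "away")
```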

}
)
except KeyError:
continue
Contributor

Which KeyError are we handling here? This should be handled without an exception.
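
One way to do this, sketched under the assumption that the KeyError comes from a dictionary lookup per player. `build_player_entries`, `player_ids` and `positions` are hypothetical names, since the thread does not say which lookup fails.

```python
from typing import Dict, List


def build_player_entries(
    player_ids: List[str], positions: Dict[str, str]
) -> List[dict]:
    """Build entries only for players that have a known position."""
    entries = []
    for player_id in player_ids:
        # dict.get returns None for a missing key instead of raising KeyError.
        position = positions.get(player_id)
        if position is None:
            # The skip is now explicit and cannot hide unrelated KeyErrors.
            continue
        entries.append({"id": player_id, "position": position})
    return entries


entries = build_player_entries(["p1", "p2"], {"p1": "GK"})
```

Compared with a broad `try`/`except KeyError`, the explicit check documents which lookup is allowed to miss.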


# teams within the tracking data.

home_players_id = []
Contributor

What is this loop for?

"formation": (
home_team.formations.at_start()
if home_team.formations.items
else self.get_starting_formation(
Contributor

This is a small mistake in the common-data-format-validator: only "id" and "players" are mandatory here. So we can remove "jersey_color" and "formation" as keys at the frame-by-frame level. If "home_team.name" is not None, we can add it too, but it's also not required.
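
A sketch of the reduced per-frame team entry described above. `build_frame_team` is a hypothetical helper, and only the keys named in this comment ("id", "players", optional "name") are used.

```python
from typing import List, Optional


def build_frame_team(
    team_id: str, players: List[dict], name: Optional[str] = None
) -> dict:
    """Build a per-frame team entry with only the mandatory keys."""
    team = {"id": team_id, "players": players}
    # "name" is optional, so it is added only when a value is available.
    if name is not None:
        team["name"] = name
    return team


named_team = build_frame_team("home", [{"id": "h1"}], name="Home FC")
unnamed_team = build_frame_team("away", [])
```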

away_team.formations.at_start()
if away_team.formations.items
else self.get_starting_formation(
[
Contributor

same

starters_ids.append(player.player_id)

for player in home_team.players:
try:
Contributor

Why are we catching KeyErrors here?


meta_away_players = []
for player in away_team.players:
try:
Contributor

Same here. Again, these two operations for home and away are identical, so we can make a function out of this.

# Add to tracking list
outputs.tracking_data.append(frame_file)

###################### build now the metadata.
Contributor

Create two separate functions in serializer.py, one for generating metadata and one for tracking data, to clean this up.
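
A rough sketch of that split. The plain-dict dataset and every field name below are placeholders chosen for illustration, not the CDF schema or the serializer in this PR.

```python
from typing import Any, Dict, List, Tuple


def build_metadata(dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Collect match-level information once per dataset."""
    return {
        "teams": {
            "home": {"id": dataset["home_id"]},
            "away": {"id": dataset["away_id"]},
        }
    }


def build_tracking_data(dataset: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Produce one record per frame."""
    return [
        {"frame_id": frame["frame_id"], "period": frame["period"]}
        for frame in dataset["frames"]
    ]


def serialize(
    dataset: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Thin entry point that delegates to the two builders."""
    return build_metadata(dataset), build_tracking_data(dataset)


example = {
    "home_id": "H",
    "away_id": "A",
    "frames": [{"frame_id": 0, "period": "first_half"}],
}
metadata, tracking = serialize(example)
```

With this shape, each builder can be tested on its own and the entry point stays short.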

@UnravelSports
Contributor

@koenvo how do we want to make this functionality ultimately available to the end user? It should be captured in some type of common_data_format.store(dataset, file_path) type operation, right?

@koenvo
Contributor

koenvo commented Oct 28, 2025

Yes, or we can start with a helper like:

dataset.to_cdf(file_path)

...


def to_cdf(self, file_path):
    # load and save?
    from common_data_format import save
    save(self, file_path)

@UnravelSports
Contributor

Oh, yeah I thought you were against doing the to_cdf option, I definitely prefer this as well.

@koenvo
Contributor

koenvo commented Oct 28, 2025

I believe the initial idea was to put the actual implementation into the to_cdf method, and that isn't the best idea.

UnravelSports [JB] and others added 4 commits October 30, 2025 14:44
This adds comprehensive write support to the open_as_file() function with
efficient memory management and streaming capabilities.

Key features:
- BufferedStream: SpooledTemporaryFile wrapper with chunked I/O (5MB memory threshold)
- Write modes: 'wb' (write), 'ab' (append) - binary only
- Adapter pattern: write_from_stream() method (opt-in for adapters)
- Compression support: .gz, .bz2, .xz files handled automatically
- Local files and S3 URIs supported via FSSpecAdapter
- Protocols for type safety: SupportsRead, SupportsWrite

Implementation details:
- read_from()/write_to() methods use shutil.copyfileobj for chunked copying
- Context manager pattern buffers writes and flushes on exit
- No breaking changes to existing read functionality
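
The buffering idea in this commit message can be illustrated with the standard library alone. This is a sketch, not the BufferedStream from the commit: `SpooledBuffer` and `CHUNK_SIZE` are hypothetical, and only the 5MB threshold and the `read_from()`/`write_to()` method names are taken from the message above.

```python
import io
import shutil
import tempfile
from typing import BinaryIO

MAX_IN_MEMORY = 5 * 1024 * 1024  # 5MB threshold, as stated in the commit message
CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the commit


class SpooledBuffer:
    """Hypothetical buffer: held in memory until max_size, then spilled to disk."""

    def __init__(self, max_size: int = MAX_IN_MEMORY) -> None:
        self._file = tempfile.SpooledTemporaryFile(max_size=max_size, mode="w+b")

    def write(self, data: bytes) -> int:
        return self._file.write(data)

    def read_from(self, source: BinaryIO) -> None:
        """Copy everything from source into the buffer, chunk by chunk."""
        shutil.copyfileobj(source, self._file, CHUNK_SIZE)

    def write_to(self, target: BinaryIO) -> None:
        """Rewind, then copy the buffered bytes to target, chunk by chunk."""
        self._file.seek(0)
        shutil.copyfileobj(self._file, target, CHUNK_SIZE)

    def close(self) -> None:
        self._file.close()


buffer = SpooledBuffer()
buffer.write(b"header\n")
buffer.read_from(io.BytesIO(b"payload"))
sink = io.BytesIO()
buffer.write_to(sink)
buffer.close()
```

Because `shutil.copyfileobj` moves data in fixed-size chunks, neither direction needs the whole payload in memory at once.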
@UnravelSports UnravelSports added this to the 3.19.0 milestone Nov 18, 2025
@UnravelSports UnravelSports mentioned this pull request Dec 17, 2025