[CDF] to_cdf #519

UnravelSports · 2025-12-17T10:24:32Z

This is a continuation of stephTchembeu#5 and #513. Due to all the misalignments it was easier to create a new PR.

Overview

This PR adds TrackingDataset.to_cdf(). It includes @koenvo's improved writing support from PR #515 and a cleaned-up version of the work done by @stephTchembeu.

Basically, we can now output kloppy tracking data to the Common Data Format (Anzer et al. 2025).

from kloppy import skillcorner

dataset = skillcorner.load_open_data(only_alive=False)

dataset.to_cdf(
    metadata_output_file='output/metadata.json',
    tracking_output_file='output/tracking.jsonl'
)

Because kloppy does not capture some values that the CDF treats as mandatory (stadium id, competition id, season id, tracking version and collection timing), the call above emits a warning for each missing value, for example:

UserWarning: Missing mandatory ID at 'competition.id'. Currently replaced with the value 'MISSING_MANDATORY_VALUE'. Please provide the correct value to 'additional_metadata' to completely adhere to the CDF specification.
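To make the placeholder behaviour concrete, here is a minimal, self-contained sketch of the pattern. The helper `fill_mandatory` is a hypothetical stand-in written for illustration, not the serializer code in this PR. It only mirrors what the warning text describes: a missing value is replaced with 'MISSING_MANDATORY_VALUE' and a UserWarning is raised. The `warnings.catch_warnings(record=True)` part is standard library and shows one way to collect such warnings in a test.

```python
import warnings

MISSING = "MISSING_MANDATORY_VALUE"


def fill_mandatory(metadata: dict, path: str) -> dict:
    """Hypothetical stand-in: put a placeholder at a dotted path and warn."""
    node = metadata
    *parents, leaf = path.split(".")
    for key in parents:
        # Walk (or create) the nested dicts leading to the leaf key.
        node = node.setdefault(key, {})
    if node.get(leaf) is None:
        warnings.warn(
            f"Missing mandatory ID at '{path}'. "
            f"Currently replaced with the value '{MISSING}'.",
            UserWarning,
        )
        node[leaf] = MISSING
    return metadata


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fill_mandatory(
        {"competition": {"name": "Test Competition"}}, "competition.id"
    )

missing_messages = [
    str(w.message) for w in caught if issubclass(w.category, UserWarning)
]
```

In a CI job, `warnings.simplefilter("error", UserWarning)` could be used instead to turn any remaining placeholder into a hard failure.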

We can resolve this by passing additional_metadata to to_cdf(). The Common Data Format Validator TypedDicts are optional here, but they help keep everything in the correct schema:

from cdf.domain import CdfMetaDataSchema, Stadium, Competition, Season, Meta, Tracking

additional_meta_data = CdfMetaDataSchema(
    competition=Competition(
        id="COMP_123",
        name="Test Competition",
        format="league_20"
    ),
    season=Season(id="SEASON_2024", name="2024/25"),
    stadium=Stadium(
        id="STADIUM_456",
        name="Test Arena",
        turf="grass",
    ),
    meta=Meta(
        tracking=Tracking(
            version="2.0.0",
            name="TestTracker",
            fps=30,
            collection_timing="live"
        )
    )
)
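Because TypedDict instances are ordinary dicts at runtime, the same structure can be written without importing the validator package. The sketch below mirrors the values above as a plain dict. Whether to_cdf() accepts a plain dict exactly like this is an assumption based on the remark that the TypedDicts are optional, and the name `additional_meta_data_plain` is only illustrative.

```python
import json

# Same keys and values as the CdfMetaDataSchema example, as a plain dict.
additional_meta_data_plain = {
    "competition": {
        "id": "COMP_123",
        "name": "Test Competition",
        "format": "league_20",
    },
    "season": {"id": "SEASON_2024", "name": "2024/25"},
    "stadium": {"id": "STADIUM_456", "name": "Test Arena", "turf": "grass"},
    "meta": {
        "tracking": {
            "version": "2.0.0",
            "name": "TestTracker",
            "fps": 30,
            "collection_timing": "live",
        }
    },
}

# Quick check that the structure is JSON-serializable before handing it over.
serialized = json.dumps(additional_meta_data_plain, indent=2)
```

The trade-off is that a plain dict gets no static type checking, so a misspelled key would only surface at validation time.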

We can then run:

from kloppy import skillcorner

dataset = skillcorner.load_open_data(only_alive=False)

dataset.to_cdf(
    metadata_output_file='output/metadata.json',
    tracking_output_file='output/tracking.jsonl',
    additional_metadata=additional_meta_data
)

This no longer throws any warnings and should output the correct files.

Note: we set only_alive=False, because not doing so will also show a warning.

Common Data Format Validator

We have new unit tests that test the writing functionality, and tests that validate the output against the CDF schema using the common-data-format-validator. This is a development dependency. Note that if the CDF changes its structure, these tests will fail on the kloppy side too. I can imagine this is not ideal, but I am not sure what to do about it. Any suggestions here are more than welcome.

Next Steps

I would like to continue with reading CDF tracking data, and with writing and reading CDF event data. Should I do this in a new PR, or shall I add everything to this one?

This adds comprehensive write support to the open_as_file() function with efficient memory management and streaming capabilities.

Key features:
- BufferedStream: SpooledTemporaryFile wrapper with chunked I/O (5MB memory threshold)
- Write modes: 'wb' (write), 'ab' (append) - binary only
- Adapter pattern: write_from_stream() method (opt-in for adapters)
- Compression support: .gz, .bz2, .xz files handled automatically
- Local files and S3 URIs supported via FSSpecAdapter
- Protocols for type safety: SupportsRead, SupportsWrite

Implementation details:
- read_from()/write_to() methods use shutil.copyfileobj for chunked copying
- Context manager pattern buffers writes and flushes on exit
- No breaking changes to existing read functionality
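The buffering pattern described above can be sketched with the standard library alone. This is not the BufferedStream implementation from PR #515. The helper `buffered_write`, the 64 KB chunk size and the .gz-only compression handling are assumptions made for illustration. What it does show is the general technique: writes go to a SpooledTemporaryFile (in memory up to 5MB, then on disk) and are copied to the target in chunks with shutil.copyfileobj when the context exits cleanly.

```python
import gzip
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

MEMORY_THRESHOLD = 5 * 1024 * 1024  # 5MB, as stated in the description
CHUNK_SIZE = 64 * 1024  # assumed chunk size for copying


@contextmanager
def buffered_write(path, mode="wb"):
    """Hypothetical helper: buffer writes, then flush to `path` on exit."""
    if mode not in ("wb", "ab"):
        raise ValueError("binary modes only: 'wb' or 'ab'")
    path = Path(path)
    # Only .gz is handled here; .bz2 and .xz would follow the same idea.
    opener = gzip.open if path.suffix == ".gz" else open
    with tempfile.SpooledTemporaryFile(max_size=MEMORY_THRESHOLD) as buffer:
        yield buffer
        # Reached only if the caller's block did not raise.
        buffer.seek(0)
        with opener(path, mode) as target:
            shutil.copyfileobj(buffer, target, CHUNK_SIZE)


# Usage: write once, append once, then read the compressed file back.
target_path = Path(tempfile.mkdtemp()) / "example.jsonl.gz"
with buffered_write(target_path) as stream:
    stream.write(b"first\n")
with buffered_write(target_path, "ab") as stream:
    stream.write(b"second\n")
with gzip.open(target_path, "rb") as handle:
    lines = handle.read().splitlines()
```

Flushing only on a clean exit means a failed serialization leaves no partially written target file, which is one reason to buffer rather than stream directly.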

Cleaned up CDF Serializer

stephTchembeu and others added 27 commits October 28, 2025 09:42

squash feta/to_cdf

e252c91

squash

1e2d6e6

cleaned up CDF Serializer

8eaafbb

WIP: add write support

fbf20c6

Follow output pattern for SportsCode

3e1073a

merge pr-515

e249156

working writer

f419105

cdf improve write

7993ec8

io

9e00f8e

improved (now complete) skillcorner position mapping

6b904b2

remove setup.py

87883a2

add import error

c731a76

improved tests, 2 providers

3100eb5

skillcorner additional test files

b72f03e

improved meta data

c4055e7

remove error, add warning

5ffa0d0

fix test

766e2f0

Merge pull request PySport#5 from UnravelSports/pr-513

769ab73

Cleaned up CDF Serializer

Merge branch 'master' into new/to_cdf

9ba3f97

failing tests

75da5d6

add common-data-format-validator as dev dependency

f9b7e84

start

31f0c26

ruff

7d2ee8d

uv

a954f49

uv

29ab3e1

fix?

e1bb431

UnravelSports changed the title ~~[test]~~ [CDF] to_cdf Dec 17, 2025

UnravelSports requested review from probberechts and removed request for probberechts December 17, 2025 11:58

UnravelSports requested review from koenvo and probberechts December 17, 2025 11:58
