update #66

Merged
calvinyeungck merged 2 commits into master from calvin
Feb 16, 2026

Conversation

@calvinyeungck
Member

This pull request introduces support for processing soccer event data from the "bepro" data provider, along with several related enhancements and minor fixes. The most significant changes are the addition of a new UIED_bepro processing function, integration of "bepro" into the preprocessing pipeline, and updates to test and output file naming.

Bepro Data Provider Integration:

  • Added a new function UIED_bepro to soccer_processing.py for processing and feature engineering of Bepro event data, including possession tracking, action grouping, goal detection, and various time/location-based features.
  • Integrated "bepro" as a supported provider in Soccer_event_data.preprocessing_single_df, calling the new UIED_bepro function when appropriate.
  • Updated an internal test call in process_single_match to use the "bepro" provider for soccer event data.

Testing and Output Improvements:

  • Added a test block in soccer_processing.py to process a Bepro CSV file and save the preprocessed output for verification.
  • Changed the output file name in a test for soccertrack data from test_load_function_sync.csv to load.csv for clarity and consistency.

Feature and Data Handling Adjustments:

  • Updated feature extraction in get_additional_features to use the filtered_event_types column instead of event_types, likely improving event type accuracy.

Versioning:

  • Bumped the package version from 0.1.41 to 0.1.44 in pyproject.toml.

@calvinyeungck merged commit 5159ef9 into master on Feb 16, 2026
3 checks passed
@github-actions

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.10

@github-actions

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.8

@github-actions

Init Test Results 📝

  • Status: success
  • OS: Linux
  • Python: 3.9

Copilot AI left a comment

Pull request overview

Adds a new preprocessing path for SoccerTrackv2 / BePro event data and wires it into the existing soccer event preprocessing pipeline.

Changes:

  • Added UIED_bepro event preprocessing/feature-engineering routine.
  • Enabled "bepro" as a supported provider in Soccer_event_data.preprocessing_single_df.
  • Adjusted SoccerTrack feature extraction to use filtered_event_types and tweaked local output filenames in __main__ blocks; bumped package version.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File | Description
pyproject.toml | Bumps package version to 0.1.44.
preprocessing/sports/event_data/soccer/soccer_processing.py | Introduces UIED_bepro and adds a __main__ example producing a preprocessed CSV.
preprocessing/sports/event_data/soccer/soccer_load_data.py | Changes SoccerTrack additional-feature extraction to read filtered_event_types; renames an output CSV in __main__.
preprocessing/sports/event_data/soccer/soccer_event_class.py | Adds "bepro" to preprocessing provider routing and updates the __main__ example to use it.

  soccer_track_tracking_path="/data_pool_1/soccertrackv2/2024-03-18/Tracking/tracking.xml"
  soccer_track_meta_path="/data_pool_1/soccertrackv2/2024-03-18/Tracking/meta.xml"
- df_soccertrack=Soccer_event_data('soccertrack',soccer_track_event_path,
+ df_soccertrack=Soccer_event_data('bepro',soccer_track_event_path,
Copilot AI Feb 16, 2026

In the __main__ example, the variable is named df_soccertrack but the provider passed is 'bepro'. Renaming the variable (or updating the provider name) would avoid confusion when running/debugging this script.

Suggested change
- df_soccertrack=Soccer_event_data('bepro',soccer_track_event_path,
+ df_soccertrack=Soccer_event_data('soccertrack',soccer_track_event_path,

Comment on lines +1830 to +1837
#round numerical columns to 4 decimal places (period, minute, second, X, Y)
df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4, "start_x": 4, "start_y": 4, "deltaX": 4, "deltaY": 4, "distance": 4, "dist2goal": 4, "angle2goal": 4})

df['team'] = df.team_id
df['Period'] = df.period
df['Minute'] = df.seconds // 60
df['Second'] = df.seconds % 60

Copilot AI Feb 16, 2026

UIED_bepro calls df.round({..."Period", "Minute", "Second"...}) before those columns are created (they're assigned just below). pandas ignores keys in the decimals dict that are not columns of the frame, so the call does not raise, but Period, Minute, and Second are never rounded. Create Period/Minute/Second first (or round the existing period/seconds columns), then round.

Suggested change
- #round numerical columns to 4 decimal places (period, minute, second, X, Y)
- df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4, "start_x": 4, "start_y": 4, "deltaX": 4, "deltaY": 4, "distance": 4, "dist2goal": 4, "angle2goal": 4})
- df['team'] = df.team_id
- df['Period'] = df.period
- df['Minute'] = df.seconds // 60
- df['Second'] = df.seconds % 60
+ df['team'] = df.team_id
+ df['Period'] = df.period
+ df['Minute'] = df.seconds // 60
+ df['Second'] = df.seconds % 60
+ #round numerical columns to 4 decimal places (period, minute, second, X, Y)
+ df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4, "start_x": 4, "start_y": 4, "deltaX": 4, "deltaY": 4, "distance": 4, "dist2goal": 4, "angle2goal": 4})
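
The ordering issue can be reproduced on a toy frame. This is a minimal sketch, not the code from UIED_bepro: only the column names come from the diff, and the values are made up. It relies on the documented behaviour that `DataFrame.round` skips dict keys that are not columns.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column names come from the diff.
df = pd.DataFrame(
    {"period": [1, 1], "seconds": [61.123456, 125.987654], "team_id": [10, 20]}
)

# Rounding first: "Minute" and "Second" are not columns yet, so pandas skips them.
early = df.round({"Minute": 4, "Second": 4, "seconds": 4})
minute_created_by_round = "Minute" in early.columns  # stays False

# Derive the columns first, then round so every listed column is covered.
df["team"] = df["team_id"]
df["Period"] = df["period"]
df["Minute"] = df["seconds"] // 60
df["Second"] = df["seconds"] % 60
df = df.round({"Period": 4, "Minute": 4, "Second": 4, "seconds": 4})
```

With this order, `Second` is rounded together with the other numeric columns instead of keeping the full floating-point remainder.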

  event_type_list = []
  for i in range(len(event_df)):
-     event_i = event_df.iloc[i].event_types
+     event_i = event_df.iloc[i].filtered_event_types
Copilot AI Feb 16, 2026

load_soccertrack.get_additional_features now reads event_df.iloc[i].filtered_event_types, but this column is not created anywhere in load_soccertrack (the event CSV is read as-is). If the input files only contain the previously used event_types column, this will raise an AttributeError. Consider falling back to event_types when filtered_event_types is missing, or ensure filtered_event_types is generated earlier in the loader.
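
One way to implement the suggested fallback is to resolve the column name once, before the loop. The sketch below is an assumption about how that could look: `pick_event_column` is a hypothetical helper that does not exist in the package, and the sample frames use made-up event labels.

```python
import pandas as pd


def pick_event_column(event_df: pd.DataFrame) -> str:
    """Hypothetical helper: prefer filtered_event_types, fall back to event_types."""
    for name in ("filtered_event_types", "event_types"):
        if name in event_df.columns:
            return name
    raise KeyError("expected 'filtered_event_types' or 'event_types' in the event CSV")


# Made-up inputs: an older file without the new column, and a newer one with it.
old_style = pd.DataFrame({"event_types": ["pass", "shot"]})
new_style = pd.DataFrame(
    {"event_types": ["pass", "shot"], "filtered_event_types": ["pass", "shot_on"]}
)

old_events = old_style[pick_event_column(old_style)].tolist()
new_events = new_style[pick_event_column(new_style)].tolist()
```

Resolving the name up front also turns a missing column into one explicit error message rather than an AttributeError raised from inside the row loop.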

Comment on lines +1536 to +1542
# Create 'action' column by concatenating 'event_type' and 'event_type_2'
df["success"]=df["event_type"].apply(
    lambda x: 0 if ("Failed" in str(x) or "Missed" in str(x) or "OnTarget" in str(x) or "shot" in str(x)) else 1
)
df["action"] = df["event_type"] = (
    df["event_type"]
    .astype(str)
Copilot AI Feb 16, 2026

The comment says the action column is created by concatenating event_type and event_type_2, but the implementation instead strips suffixes from event_type and assigns the result to both action and event_type. This mismatch is confusing for future maintenance; either update the comment/docstring to describe the actual behavior or implement the concatenation as described (and avoid overwriting the original event_type if it’s still needed).
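
If the suffix stripping is the intended behaviour, the derived label could be written to `action` alone so the raw `event_type` survives. The sketch below is illustrative only: the event labels and the `_Succeeded`/`_Failed` suffix pattern are assumptions, since the PR excerpt does not show the real Bepro vocabulary or the stripping rule.

```python
import pandas as pd

# Hypothetical event labels; the real Bepro vocabulary is not shown in the PR.
df = pd.DataFrame({"event_type": ["Pass_Succeeded", "Pass_Failed", "Tackle_Succeeded"]})

# Compute success from the raw label, using the same substrings as the quoted lambda.
failed = df["event_type"].astype(str).str.contains("Failed|Missed|OnTarget|shot")
df["success"] = (~failed).astype(int)

# Strip the (assumed) outcome suffix into a separate column; event_type is untouched.
df["action"] = (
    df["event_type"].astype(str).str.replace(r"_(Succeeded|Failed)$", "", regex=True)
)
```

The code comment above the block would then describe suffix stripping rather than concatenation.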

Comment on lines +1574 to +1575
if df.action.iloc[i] not in all_cation:
    print(f"Warning: action {df.action.iloc[i]} was not found in the all action list")
Copilot AI Feb 16, 2026

all_cation is a list but is used for membership checks inside a per-row loop (if df.action.iloc[i] not in all_cation:). Converting it to a set once (and using that for membership tests) will significantly reduce overhead on large matches.
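
A rough sketch of the set-based check follows, going one step further by replacing the per-row loop with a vectorised `isin` call. The action vocabulary here is made up, and `all_action` is a stand-in name for the list in UIED_bepro.

```python
import pandas as pd

# Hypothetical action vocabulary; the real list lives in UIED_bepro.
all_action = ["pass", "shot", "dribble"]
all_action_set = set(all_action)  # built once; average O(1) membership tests

df = pd.DataFrame({"action": ["pass", "tackle", "shot", "tackle"]})

# Vectorised membership test; each unknown action is reported once.
unknown = df.loc[~df["action"].isin(all_action_set), "action"].unique().tolist()
for action in unknown:
    print(f"Warning: action {action} was not found in the all action list")
```

For a short vocabulary the gain from the set alone is probably small; dropping the repeated `df.action.iloc[i]` lookups is likely the larger saving, and the warning is no longer printed once per offending row.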
