New scoring metrics for "cross_validation_across_time" #2
sianna338 wants to merge 11 commits into skjerns:main from
Conversation
Summary of Changes
Hello @sianna338, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the cross_validation_across_time function by introducing a flexible mechanism for evaluating classifier performance. Instead of being limited to accuracy, users can now specify a wide range of scoring metrics, either from scikit-learn or custom-defined, providing greater analytical depth and adaptability for various machine learning tasks.
Highlights
- Enhanced Scoring Metrics: The cross_validation_across_time function now supports various scoring algorithms for classifier performance evaluation.
- Flexible Metric Specification: Users can specify metrics using a string keyword referring to sklearn.metrics functions (e.g., "average_precision_score") or by providing a custom user-defined scoring function.
- New Parameters: New parameters metric, metric_kwargs, and proba have been added to control the scoring process, allowing for more granular control over how metrics are applied and what kind of input they receive.
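To make the new interface concrete, here is a rough usage sketch. Only the `metric`, `metric_kwargs`, and `proba` parameters come from this pull request; the import path, the remaining argument names, and the example data shapes are assumptions made for illustration, not the library's confirmed API.

```python
# Hypothetical usage sketch: only metric, metric_kwargs and proba are from this PR;
# the import path and the other argument names are assumed for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from decoding import cross_validation_across_time  # assumed module path

data_x = np.random.randn(60, 30, 101)   # trials x channels x timepoints (made-up shape)
data_y = np.random.randint(0, 2, 60)    # binary labels
clf = LogisticRegression()

# default behaviour: per-timepoint accuracy
acc = cross_validation_across_time(data_x, data_y, clf=clf)

# any sklearn.metrics function by name, with extra keyword arguments
auc = cross_validation_across_time(
    data_x, data_y, clf=clf,
    metric="roc_auc_score",
    metric_kwargs={"average": "macro"},
    proba=True,   # hand class probabilities to the metric
)
```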
Code Review
This pull request introduces the ability to use custom scoring metrics in cross_validation_across_time. The changes are a good step forward, but I've found a few critical issues in the implementation. The new logic for handling custom metrics has several bugs related to using the correct data for calculation and handling the new proba flag. Additionally, using a mutable dictionary as a default argument for metric_kwargs is unsafe. My review includes suggestions to fix these issues to ensure the new feature is robust and correct.
decoding.py
Outdated
```diff
  preds = np.argmax(probas, -1)

- accuracy = (preds == test_y[:, None]).mean(axis=0)

+ if metric == "accuracy":
+     accuracy = (preds == test_y[:, None]).mean(axis=0)
+ else:
+     # resolve metric function
+     if isinstance(metric, str):
+         if not hasattr(sk_metrics, metric):
+             raise ValueError(f"sklearn.metrics has no function named '{metric}'")
+         func = getattr(sk_metrics, metric)
+     elif callable(metric):
+         func = metric
+     else:
+         raise TypeError("metric must be 'accuracy', a sklearn.metrics name (str), or a callable.")
+
+     sig = inspect.signature(func)
+     # add any extra parameters that are not preds and data_y
+     if metric_kwargs:
+         kwargs = {}
+         for k, v in metric_kwargs.items():
+             if k in sig.parameters:
+                 kwargs[k] = v
+
+     # need to loop over timepoints
+     accuracy = np.zeros(time_max)
+     for t in list(range(0, time_max)):
+         accuracy[t] = func(data_y, all_probas[:,t], **kwargs)
```
This block for calculating metrics contains several critical issues:
- When `proba=False`, `probas` contains predictions, but `np.argmax` is still called, which is incorrect.
- The custom metric calculation is incorrect: it uses the full `data_y` and `all_probas` instead of the current fold's `test_y` and `probas`.
- There's a potential `NameError` for `kwargs` if `metric_kwargs` is not provided.

Additionally, the unchanged line `all_probas[idxs_test] = probas` (line 274) will fail with `proba=False` due to a shape mismatch and needs to be fixed separately.
Here is a suggested refactoring for this block to address the issues within the changed lines:
```python
if proba:
    preds = np.argmax(probas, -1)
else:
    preds = probas

if metric == "accuracy":
    accuracy = (preds == test_y[:, None]).mean(axis=0)
else:
    # resolve metric function
    if isinstance(metric, str):
        if not hasattr(sk_metrics, metric):
            raise ValueError(f"sklearn.metrics has no function named '{metric}'")
        func = getattr(sk_metrics, metric)
    elif callable(metric):
        func = metric
    else:
        raise TypeError("metric must be 'accuracy', a sklearn.metrics name (str), or a callable.")

    # add any extra parameters that are not preds and data_y
    kwargs = {}
    if metric_kwargs:
        sig = inspect.signature(func)
        for k, v in metric_kwargs.items():
            if k in sig.parameters:
                kwargs[k] = v

    # need to loop over timepoints
    accuracy = np.zeros(time_max)
    for t in range(time_max):
        accuracy[t] = func(test_y, probas[:,t], **kwargs)
```
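For readers unfamiliar with the resolution step in the sketch above, here is a self-contained illustration of the same idea: a string is looked up in `sklearn.metrics`, and entries of `metric_kwargs` that the resolved function does not accept are dropped via `inspect.signature`. The helper name `resolve_metric` is made up for this example and is not part of the PR.

```python
# Standalone sketch of the metric-resolution logic: a string is looked up in
# sklearn.metrics, and only kwargs accepted by the resolved function are kept.
import inspect
import numpy as np
import sklearn.metrics as sk_metrics

def resolve_metric(metric, metric_kwargs=None):
    if isinstance(metric, str):
        if not hasattr(sk_metrics, metric):
            raise ValueError(f"sklearn.metrics has no function named '{metric}'")
        func = getattr(sk_metrics, metric)
    elif callable(metric):
        func = metric
    else:
        raise TypeError("metric must be a sklearn.metrics name (str) or a callable.")
    sig = inspect.signature(func)
    kwargs = {k: v for k, v in (metric_kwargs or {}).items() if k in sig.parameters}
    return func, kwargs

y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.8, 0.6, 0.3, 0.9])
func, kwargs = resolve_metric("roc_auc_score", {"average": "macro", "unknown_arg": 1})
print(func(y_true, y_prob, **kwargs))  # unknown_arg is silently dropped
```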
decoding.py
Outdated
```diff
  tmin=-0.1, tmax=0.5, sfreq=100,
- return_probas=False,
+ return_probas=False, metric='accuracy',
+ metric_kwargs={}, proba=True,
```
Using a mutable object like a dictionary as a default argument is discouraged. If the dictionary is modified, the change will persist across calls. It's safer to default to None and create a new dictionary inside the function if needed. You would then need to add if metric_kwargs is None: metric_kwargs = {} at the beginning of the function body.
```diff
- metric_kwargs={}, proba=True,
+ metric_kwargs=None, proba=True,
```
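As a standalone illustration of the pitfall described in the comment above (the function names here are invented for the example):

```python
# Minimal illustration of the mutable-default pitfall (hypothetical functions).
def score_with_default(value, kwargs={}):   # one dict is shared across all calls
    kwargs.setdefault("seen", 0)
    kwargs["seen"] += 1
    return kwargs["seen"]

print(score_with_default(1))  # 1
print(score_with_default(2))  # 2 -- the same dict persisted between calls

def score_with_none(value, kwargs=None):    # the safer pattern suggested above
    if kwargs is None:
        kwargs = {}
    kwargs.setdefault("seen", 0)
    kwargs["seen"] += 1
    return kwargs["seen"]

print(score_with_none(1))  # 1
print(score_with_none(2))  # 1 -- a fresh dict on every call
```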
add error message when given metric_kwargs don't match the input arguments required by the scoring function
Co-authored-by: Simon Kern <14980558+skjerns@users.noreply.github.com>
input to function, default scoring function (top_1_accuracy)
@sianna338 let me know when I should review again, and please comment and/or mark the conversations above as "resolved", so I know that they have been taken care of
…ion for the classifier needs
/gemini review
/gemini summary |
Code Review
This pull request introduces a flexible scoring mechanism to the cross_validation_across_time function, which is a great enhancement. However, the current implementation contains several critical bugs in the scoring logic within the cross-validation loop, which will lead to incorrect results. Specifically, the scoring is performed on the entire dataset instead of the hold-out test set for each fold. There are also some API design issues, such as a now-unused parameter, and minor code quality issues like duplicate imports. I've provided detailed comments and suggestions to address these critical issues and improve the overall code quality.
```python
if not needs_probas:
    preds_idx = np.argmax(all_results, axis=2)
    preds_lbl = labels[preds_idx]

# compute scoring metric
score = np.zeros(time_max)
for t in list(range(0, time_max)):
    if needs_probas:
        score[t] = func(data_y, all_results[:,t], **metric_kwargs)
    else:
        score[t] = func(data_y, preds_lbl[:,t], **metric_kwargs)
```
This block of code contains several critical bugs that will lead to incorrect results or runtime errors:
- UnboundVariable: `preds_lbl` is defined within an `if not needs_probas:` block but is used later in a separate `else` block. If `needs_probas` is `True`, this will raise an `UnboundLocalError`.
- Incorrect Data Scope for Predictions: Predictions are generated using `np.argmax(all_results, ...)`. The `all_results` array contains data from all folds processed so far, not just the current test fold. Predictions should be based on `results_swp`, which holds the probabilities for the current fold.
- Incorrect Scoring Data: The scoring function `func` is called with `data_y` (all labels) and `all_results`. Scoring in cross-validation must be performed on the hold-out test set for the current fold, using `data_y[idxs_test]` and the corresponding predictions/probabilities from `results_swp`.
The suggested change restructures the logic to fix these issues by correctly scoping data to the current fold.
```diff
- if not needs_probas:
-     preds_idx = np.argmax(all_results, axis=2)
-     preds_lbl = labels[preds_idx]
- # compute scoring metric
- score = np.zeros(time_max)
- for t in list(range(0, time_max)):
-     if needs_probas:
-         score[t] = func(data_y, all_results[:,t], **metric_kwargs)
-     else:
-         score[t] = func(data_y, preds_lbl[:,t], **metric_kwargs)
+ test_y = data_y[idxs_test]
+ score = np.zeros(time_max)
+ if needs_probas:
+     # Score using probabilities for each time point
+     for t in range(time_max):
+         score[t] = func(test_y, results_swp[:, t], **metric_kwargs)
+ else:
+     # Convert probabilities to label predictions once
+     preds_idx = np.argmax(results_swp, axis=2)
+     preds_lbl = labels[preds_idx]
+     # Score using predictions for each time point
+     for t in range(time_max):
+         score[t] = func(test_y, preds_lbl[:, t], **metric_kwargs)
```
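A self-contained sketch of the principle behind this suggestion, independent of the repository's code: each cross-validation fold is scored only against its own hold-out labels.

```python
# Standalone sketch of fold-scoped scoring (not the repository's implementation):
# every fold is scored against its own hold-out labels, never the full label vector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

scores = []
for idxs_train, idxs_test in StratifiedKFold(n_splits=5).split(X, y):
    clf = LogisticRegression().fit(X[idxs_train], y[idxs_train])
    preds = clf.predict(X[idxs_test])
    # score against the current fold's hold-out labels only
    scores.append(accuracy_score(y[idxs_test], preds))

print(np.mean(scores))
```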
Activity
The function cross_validation_across_time can now use different scoring algorithms to determine classifier performance.
The parameter "metric" is added which accepts either a string keyword, refering to one of the scoring functions from sklearn.metrics, or alternatively a user-defined scoring function. Default is "accuracy".