feat: initial implementation for rapidata #581
base: main
Changes from all commits
3b350ba
f3134b8
8a081cc
a3ba5b1
f2b4a98
a772e7c
fdae70d
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,53 @@ | ||||||
| # Copyright 2025 - Pruna AI GmbH. All rights reserved. | ||||||
| # | ||||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||||||
| # you may not use this file except in compliance with the License. | ||||||
| # You may obtain a copy of the License at | ||||||
| # | ||||||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||||||
| # | ||||||
| # Unless required by applicable law or agreed to in writing, software | ||||||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||
| # See the License for the specific language governing permissions and | ||||||
| # limitations under the License. | ||||||
|
|
||||||
|
|
||||||
| from abc import ABC, abstractmethod | ||||||
| from typing import Any | ||||||
|
|
||||||
|
|
||||||
| class AsyncEvaluationMixin(ABC): | ||||||
| """ | ||||||
| Mixin for metrics that submit to external evaluation services and retrieve results asynchronously. | ||||||
|
|
||||||
| Subclasses implement create_async_request() to set up an evaluation | ||||||
| (e.g., create a leaderboard) and retrieve_async_results() to retrieve | ||||||
| outcomes (e.g., standings from human evaluators). | ||||||
| """ | ||||||
|
|
||||||
| @abstractmethod | ||||||
| def create_async_request(self, *args, **kwargs) -> Any: | ||||||
| """ | ||||||
| Create/configure an evaluation request on the external service. | ||||||
|
|
||||||
| Parameters | ||||||
| ---------- | ||||||
| *args : | ||||||
| Variable length argument list. | ||||||
| **kwargs : | ||||||
| Arbitrary keyword arguments. | ||||||
| """ | ||||||
|
|
||||||
| @abstractmethod | ||||||
| def retrieve_async_results(self, *args, **kwargs) -> Any: | ||||||
| """ | ||||||
| Retrieve results from the external service. | ||||||
|
|
||||||
| Parameters | ||||||
| ---------- | ||||||
| *args : | ||||||
| Variable length argument list. | ||||||
| **kwargs : | ||||||
| Arbitrary keyword arguments. | ||||||
| """ | ||||||
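As a rough illustration of how a metric might implement the two abstract methods above, here is a minimal sketch. It redefines a stripped-down copy of the mixin so that it runs standalone, and `FakeLeaderboardMetric`, its `name` parameter, and the in-memory `_service` dict are hypothetical stand-ins for a real external service, not part of this PR.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class AsyncEvaluationMixin(ABC):
    """Stripped-down local copy of the mixin from the diff, for a standalone sketch."""

    @abstractmethod
    def create_async_request(self, *args, **kwargs) -> Any: ...

    @abstractmethod
    def retrieve_async_results(self, *args, **kwargs) -> Any: ...


class FakeLeaderboardMetric(AsyncEvaluationMixin):
    """Hypothetical metric; a dict stands in for the external service."""

    def __init__(self) -> None:
        # Maps a leaderboard name to its standings (model name -> score).
        self._service: dict[str, dict[str, float]] = {}

    def create_async_request(self, name: str) -> str:
        # Register an empty leaderboard and return its identifier.
        self._service[name] = {}
        return name

    def retrieve_async_results(self, name: str) -> dict[str, float] | None:
        # Return None while no standings have arrived yet.
        standings = self._service.get(name)
        return standings or None
```

Because both methods are marked abstract, instantiating the mixin directly, or a subclass that omits one of them, raises `TypeError`.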
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,62 @@ | ||||||||||||
| # Copyright 2025 - Pruna AI GmbH. All rights reserved. | ||||||||||||
| # | ||||||||||||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||||||||||||
| # you may not use this file except in compliance with the License. | ||||||||||||
| # You may obtain a copy of the License at | ||||||||||||
| # | ||||||||||||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||||||||||||
| # | ||||||||||||
| # Unless required by applicable law or agreed to in writing, software | ||||||||||||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||||||||||||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||||||
| # See the License for the specific language governing permissions and | ||||||||||||
| # limitations under the License. | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| from abc import ABC | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| class EvaluationContextMixin(ABC): | ||||||||||||
| """ | ||||||||||||
| Mixin for metrics that evaluate multiple models sequentially. | ||||||||||||
|
|
||||||||||||
| Provides a current_context property that tracks which model is being | ||||||||||||
| evaluated. Setting a new context triggers on_context_change(), which | ||||||||||||
| subclasses can override to reset state between models. | ||||||||||||
| """ | ||||||||||||
|
|
||||||||||||
| _current_context: str | None = None | ||||||||||||
|
|
||||||||||||
| @property | ||||||||||||
| def current_context(self) -> str | None: | ||||||||||||
| """ | ||||||||||||
| Return the current context. | ||||||||||||
|
|
||||||||||||
| Returns | ||||||||||||
| ------- | ||||||||||||
| str | None | ||||||||||||
| The current context. | ||||||||||||
| """ | ||||||||||||
| return self._current_context | ||||||||||||
|
|
||||||||||||
| @current_context.setter | ||||||||||||
| def current_context(self, value: str | None) -> None: | ||||||||||||
| """ | ||||||||||||
| Set the current context. | ||||||||||||
|
|
||||||||||||
| Parameters | ||||||||||||
| ---------- | ||||||||||||
| value : str | None | ||||||||||||
| The new context. | ||||||||||||
| """ | ||||||||||||
| self._current_context = value | ||||||||||||
| self.on_context_change() | ||||||||||||
|
Comment on lines +52 to +53
Member
What do you think about checking whether the current_context value actually changed and only triggering on_context_change() then?
|
|
||||||||||||
| def on_context_change(self) -> None: | ||||||||||||
| """Hook called when the context changes. Override to reset state.""" | ||||||||||||
| pass | ||||||||||||
|
|
||||||||||||
| def _require_context(self) -> None: | ||||||||||||
| """Raise if no context has been set.""" | ||||||||||||
| if self._current_context is None: | ||||||||||||
| raise ValueError("No context set. Set current_context first.") | ||||||||||||
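The review comment on the setter asks whether the hook should fire only when the value actually changes. One possible shape for that, as a sketch rather than the PR's implementation: the mixin below is a trimmed local copy with an added equality check, and `CountingMetric` with its `samples` and `resets` attributes is a hypothetical subclass used only to make the behaviour observable.

```python
from __future__ import annotations

from abc import ABC


class EvaluationContextMixin(ABC):
    """Trimmed local copy of the mixin from the diff, with a change check added."""

    _current_context: str | None = None

    @property
    def current_context(self) -> str | None:
        return self._current_context

    @current_context.setter
    def current_context(self, value: str | None) -> None:
        # Variation on the diff: skip the hook when the value is unchanged.
        if value == self._current_context:
            return
        self._current_context = value
        self.on_context_change()

    def on_context_change(self) -> None:
        """Hook called when the context changes. Override to reset state."""


class CountingMetric(EvaluationContextMixin):
    """Hypothetical subclass that resets per-model state on each change."""

    def __init__(self) -> None:
        self.samples: list[str] = []
        self.resets = 0

    def on_context_change(self) -> None:
        # Drop state accumulated for the previous model.
        self.samples.clear()
        self.resets += 1


metric = CountingMetric()
metric.current_context = "model_a"  # triggers the hook
metric.current_context = "model_a"  # unchanged, hook is skipped
metric.current_context = "model_b"  # triggers the hook again
```

One consequence of this variant is that assigning `None` while the context is still `None` does not call the hook, which differs from the setter in the diff.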
Alright, so we already start with the extra separation, nice!
@begumcig I know that we can also do something like
evaluation = [ rapidata, vbench ]. It could be nice to already start structuring it like this, right?
Hmm, I don't think vbench and rapidata have a lot of shared dependencies, so it doesn't really make sense to me to group them together.
I assumed it could be a shared dependency group for all evaluation metrics. You can add extras to extras, but perhaps you'd like to keep them separate?