Conversation
tamilatiurina
commented
Feb 4, 2026
- Added the dependencies needed by the models to pyproject.toml
- Added models/model_abstract and models/model_xgboost; the XGBoost implementation follows the methods of the abstract base model class
- Added tests/models/test_models_xgboost to test the train, finetune, and predict methods of the XGBoost model
Pull request overview
Adds an XGBoost-backed model implementation under alphapulse.models, wires up a new optional dependency extra (models), and introduces tests intended to validate train/finetune/predict behavior.
Changes:
- Add `ModelAbstract` and `ModelXgboost` implementations under `src/alphapulse/models/`.
- Add a new `models` optional dependency group (and lockfile updates) including `xgboost`.
- Add pytest coverage for XGBoost training/finetuning/prediction.
Reviewed changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `pyproject.toml` | Adds `models` optional dependencies for model-related packages. |
| `uv.lock` | Locks `models` extra deps, including xgboost and transitive packages. |
| `src/alphapulse/models/model_abstract.py` | Introduces a base abstract model interface. |
| `src/alphapulse/models/model_xgboost.py` | Adds XGBoost model wrapper implementing the abstract interface. |
| `src/alphapulse/models/__init__.py` | Exposes model classes from the alphapulse.models package. |
| `tests/models/test_models_xgboost.py` | Adds tests for training, finetuning, and prediction output. |
```python
ROOT = Path(__file__).parent.parent.parent
TRAIN_DATA_PATH = ROOT / "data" / "v5.2" / "train.parquet"
FEATURES_JSON_PATH = ROOT / "data" / "v5.2" / "features.json"
TEST_DATA_PATH = ROOT / "data" / "v5.2" / "live.parquet"


@pytest.fixture
def test_data() -> tuple[pd.DataFrame, list[str]]:
    """Load Numerai data"""
    with open(FEATURES_JSON_PATH, encoding="utf-8") as f:
        feature_metadata = json.load(f)
    feature_cols = feature_metadata["feature_sets"]["small"]
    target_cols = feature_metadata["targets"]
    train = pd.read_parquet(
        TRAIN_DATA_PATH, columns=["era"] + feature_cols + target_cols
    )
    return train, feature_cols
```
These tests depend on local Numerai dataset files under data/v5.2/*, but the repository doesn’t include a data/ directory. As written, the test suite will fail on a clean checkout/CI. Consider replacing this with a small synthetic DataFrame fixture (or add minimal test fixtures to tests/ and load them from there).
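To illustrate, a synthetic fixture along these lines would let the suite run on a clean checkout with no `data/` directory. The column names, row count, and era layout below are hypothetical stand-ins, not taken from the repository:

```python
import numpy as np
import pandas as pd
import pytest

# Hypothetical feature set standing in for the Numerai "small" feature list.
FEATURE_COLS = [f"feature_{i}" for i in range(5)]


def make_synthetic_data(n_rows: int = 100) -> tuple[pd.DataFrame, list[str]]:
    """Build a Numerai-shaped frame with era, feature, and target columns."""
    rng = np.random.default_rng(seed=0)
    frame = pd.DataFrame(
        rng.uniform(0.0, 1.0, size=(n_rows, len(FEATURE_COLS))),
        columns=FEATURE_COLS,
    )
    frame["era"] = [f"era{i % 4:04d}" for i in range(n_rows)]
    frame["target"] = rng.uniform(0.0, 1.0, size=n_rows)
    return frame, FEATURE_COLS


@pytest.fixture
def test_data() -> tuple[pd.DataFrame, list[str]]:
    """Drop-in replacement for the parquet-loading fixture."""
    return make_synthetic_data()
```

The fixture keeps the same `(DataFrame, feature_cols)` return shape, so the existing tests would not need to change.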
```python
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1,
    "alpha": 0,
```
xgb_params does not set an objective, but the test asserts predictions are in [0, 1]. With XGBoost defaults (regression), predictions are not guaranteed to be bounded, so this assertion can be flaky/incorrect. Either set an objective that guarantees bounds (e.g., logistic) or relax the assertion to properties that always hold (shape, finite values, etc.).
Suggested change:

```python
    "alpha": 0,
    "objective": "binary:logistic",
```
```python
    raise NotImplementedError("Train method needs to be overriden")

    @abstractmethod
    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Finetune the trained model"""
        raise NotImplementedError("Finetune method needs to be overriden")

    @abstractmethod
    def predict(self, *_args: Any, **_kwargs: Any) -> pd.Series:
        """Predict the result of the trained model"""
        raise NotImplementedError("Predict method needs to be overriden")
```
Typos in error text: “overriden” should be “overridden” in these NotImplementedError messages.
Suggested change:

```python
    raise NotImplementedError("Train method needs to be overridden")

    @abstractmethod
    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Finetune the trained model"""
        raise NotImplementedError("Finetune method needs to be overridden")

    @abstractmethod
    def predict(self, *_args: Any, **_kwargs: Any) -> pd.Series:
        """Predict the result of the trained model"""
        raise NotImplementedError("Predict method needs to be overridden")
```
```toml
models = [
    "xgboost",
    "pandas",
    "numpy",
]
```
The standard dev workflow (uv sync --extra dev / uv run mypy / uv run pytest) will not install the new models extra, but the code and tests unconditionally import xgboost. To keep CI/local dev green, either add xgboost to the dev extra (or core deps), or make the xgboost import/tests conditional on the extra being installed.
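One possible guard (a sketch of the suggestion, not the project's actual code): detect the optional dependency before importing it, and let the test module skip itself when the extra is absent.

```python
import importlib.util


def has_models_extra() -> bool:
    """True when the optional xgboost dependency is importable
    (i.e. the 'models' extra was installed)."""
    return importlib.util.find_spec("xgboost") is not None


# In tests/models/test_models_xgboost.py the same idea is a one-liner:
# xgb = pytest.importorskip("xgboost")  # skips the whole module when missing
```

`pytest.importorskip` marks the tests as skipped rather than errored, so `uv run pytest` stays green in the default dev environment.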
```python
        **kwargs: Any,
    ) -> xgb.Booster:
        if self.model is None:
            raise RuntimeError("Train initial model")
```
The runtime error message here is ambiguous (“Train initial model”). Consider making it actionable, e.g., explicitly telling callers to call train() before finetune(), and ideally include the class/method name in the message for easier debugging.
Suggested change:

```python
        raise RuntimeError(
            "ModelXgboost.finetune() requires an initial model. Call "
            "ModelXgboost.train() before finetune()."
        )
```
```python
import pandas as pd
import xgboost as xgb


class ModelAbstract(ABC):
    """Abstract class for all models"""

    @abstractmethod
    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Initial training the model"""
```
ModelAbstract imports xgboost and returns xgb.Booster, which makes the abstract base class (and any import of alphapulse.models) require the optional models extra. This will break uv run mypy / uv run pytest in the default dev environment where xgboost isn’t installed. Consider removing the hard dependency from the abstract layer (e.g., return Any/a protocol, or use a TYPE_CHECKING import and a forward reference) so the package can be imported/type-checked without the optional extra.
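A sketch of the `TYPE_CHECKING` variant (one option among those suggested, not the PR's code; it relies on `from __future__ import annotations` so the `xgb.Booster` annotation is never evaluated at runtime):

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by type checkers; importing this module at runtime
    # no longer requires the optional xgboost package.
    import xgboost as xgb


class ModelAbstract(ABC):
    """Abstract class for all models, importable without the models extra."""

    @abstractmethod
    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Initial training of the model"""
        raise NotImplementedError("Train method needs to be overridden")
```

mypy still sees the precise `xgb.Booster` return type, while `uv run pytest` in an environment without xgboost can import the module freely.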
```python
from .model_abstract import ModelAbstract
from .model_xgboost import ModelXgboost

__all__ = ["ModelAbstract", "ModelXgboost"]
```
This package __init__ eagerly imports ModelXgboost, which will raise ModuleNotFoundError for users who install the base package without the models extra (because xgboost is optional). Consider making alphapulse.models safe to import without xgboost (lazy/conditional import, or avoid exporting ModelXgboost at package import time).
```toml
    "pandas",
    "numpy",
]
```
The new models extra includes pandas, but pandas is already a core dependency. Keeping duplicates makes dependency intent unclear; consider limiting this extra to only what’s actually optional (likely just xgboost, and possibly numpy if not already required elsewhere).
Suggested change:

```toml
    "numpy",
]
```
```python
    """Abstract class for all models"""

    @abstractmethod
    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
```
Overridden method signature does not match the call, which passes an argument named 'params'. The overriding method ModelXgboost.train matches the call.
Overridden method signature does not match the call, which passes an argument named 'num_boost_round'. The overriding method ModelXgboost.train matches the call.
Suggested change:

```python
    def train(
        self,
        *args: Any,
        params: Any = None,
        num_boost_round: Any = None,
        **kwargs: Any,
    ) -> xgb.Booster:
```
```python
    raise NotImplementedError("Train method needs to be overriden")

    @abstractmethod
    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
```
Overridden method signature does not match the call, which passes an argument named 'params'. The overriding method ModelXgboost.finetune matches the call.
Overridden method signature does not match the call, which passes an argument named 'num_boost_round'. The overriding method ModelXgboost.finetune matches the call.
Suggested change:

```python
    def finetune(
        self,
        params: Any = None,
        num_boost_round: int | None = None,
        *_args: Any,
        **_kwargs: Any,
    ) -> xgb.Booster:
```