6 changes: 6 additions & 0 deletions pyproject.toml
@@ -62,6 +62,12 @@ eda = [
"pyarrow"
]

models = [
"xgboost",
"pandas",
"numpy",
]
Comment on lines +65 to +69
Copilot AI Feb 8, 2026
The standard dev workflow (uv sync --extra dev / uv run mypy / uv run pytest) will not install the new models extra, but the code and tests unconditionally import xgboost. To keep CI/local dev green, either add xgboost to the dev extra (or core deps), or make the xgboost import/tests conditional on the extra being installed.

Copilot uses AI. Check for mistakes.
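One sketch of the conditional route the comment describes: gate on whether the optional dependency is importable at all. `importlib.util.find_spec` is stdlib; the flag name `HAS_XGBOOST` is our own, not something in this repo.

```python
from importlib import util

# True only when the optional "models" extra (and thus xgboost) is
# installed; tests can key pytest.mark.skipif off this flag.
HAS_XGBOOST = util.find_spec("xgboost") is not None

if HAS_XGBOOST:
    import xgboost as xgb  # safe: only runs when the extra is present
```

Inside the test modules themselves, `pytest.importorskip("xgboost")` at the top of the file achieves the same effect with less ceremony.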



Comment on lines +67 to 72
Copilot AI Feb 8, 2026

The new models extra includes pandas, but pandas is already a core dependency. Keeping duplicates makes dependency intent unclear; consider limiting this extra to only what’s actually optional (likely just xgboost, and possibly numpy if not already required elsewhere).

Suggested change
-    "pandas",
     "numpy",
 ]

[project.urls]
4 changes: 4 additions & 0 deletions src/alphapulse/models/__init__.py
@@ -0,0 +1,4 @@
from .model_abstract import ModelAbstract
from .model_xgboost import ModelXgboost

__all__ = ["ModelAbstract", "ModelXgboost"]
Comment on lines +1 to +4
Copilot AI Feb 8, 2026

This package __init__ eagerly imports ModelXgboost, which will raise ModuleNotFoundError for users who install the base package without the models extra (because xgboost is optional). Consider making alphapulse.models safe to import without xgboost (lazy/conditional import, or avoid exporting ModelXgboost at package import time).

24 changes: 24 additions & 0 deletions src/alphapulse/models/model_abstract.py
@@ -0,0 +1,24 @@
from abc import ABC, abstractmethod
from typing import Any

import pandas as pd
import xgboost as xgb


class ModelAbstract(ABC):
"""Abstract class for all models"""

@abstractmethod
def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
Copilot AI Feb 8, 2026

Overridden method signature does not match the call sites, which pass arguments named 'params' and 'num_boost_round'. The overriding method ModelXgboost.train matches those calls.

Suggested change
-    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
+    def train(
+        self,
+        *args: Any,
+        params: Any = None,
+        num_boost_round: Any = None,
+        **kwargs: Any,
+    ) -> xgb.Booster:

"""Initial training the model"""
Comment on lines +4 to +13
Copilot AI Feb 8, 2026

ModelAbstract imports xgboost and returns xgb.Booster, which makes the abstract base class (and any import of alphapulse.models) require the optional models extra. This will break uv run mypy / uv run pytest in the default dev environment where xgboost isn’t installed. Consider removing the hard dependency from the abstract layer (e.g., return Any/a protocol, or use a TYPE_CHECKING import and a forward reference) so the package can be imported/type-checked without the optional extra.

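A sketch of the TYPE_CHECKING route for model_abstract.py: with `from __future__ import annotations`, the `xgb.Booster` annotation is never evaluated at runtime, so the module imports cleanly without xgboost installed.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by type checkers; never executed at runtime.
    import xgboost as xgb


class ModelAbstract(ABC):
    """Abstract class for all models, importable without xgboost."""

    @abstractmethod
    def train(self, *args: Any, **kwargs: Any) -> xgb.Booster:
        """Initial training of the model."""
        raise NotImplementedError("train() must be overridden")
```

mypy still sees the precise `xgb.Booster` return type, while the default dev environment (no `models` extra) can import and collect tests against the package.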
raise NotImplementedError("Train method needs to be overriden")

@abstractmethod
def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
Copilot AI Feb 8, 2026
Overridden method signature does not match call, where it is passed an argument named 'params'. Overriding method method ModelXgboost.finetune matches the call.
Overridden method signature does not match call, where it is passed an argument named 'num_boost_round'. Overriding method method ModelXgboost.finetune matches the call.

Suggested change
-    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
+    def finetune(
+        self,
+        params: Any = None,
+        num_boost_round: int | None = None,
+        *_args: Any,
+        **_kwargs: Any,
+    ) -> xgb.Booster:

"""Finetune the trained model"""
raise NotImplementedError("Finetune method needs to be overriden")

@abstractmethod
def predict(self, *_args: Any, **_kwargs: Any) -> pd.Series:
"""Predict the result of the trained model"""
raise NotImplementedError("Predict method needs to be overriden")
Comment on lines +14 to +24
Copilot AI Feb 8, 2026

Typos in error text: “overriden” should be “overridden” in these NotImplementedError messages.

Suggested change
-        raise NotImplementedError("Train method needs to be overriden")
+        raise NotImplementedError("Train method needs to be overridden")
-        raise NotImplementedError("Finetune method needs to be overriden")
+        raise NotImplementedError("Finetune method needs to be overridden")
-        raise NotImplementedError("Predict method needs to be overriden")
+        raise NotImplementedError("Predict method needs to be overridden")

57 changes: 57 additions & 0 deletions src/alphapulse/models/model_xgboost.py
@@ -0,0 +1,57 @@
from collections.abc import Mapping
from typing import Any

import pandas as pd
import xgboost as xgb

from .model_abstract import ModelAbstract


class ModelXgboost(ModelAbstract):
def __init__(self) -> None:
self.model: xgb.Booster | None = None

def train(
self,
X: pd.DataFrame,
y: pd.Series,
params: Mapping[str, Any],
num_boost_round: int = 10,
**kwargs: Any,
) -> xgb.Booster:
dtrain = xgb.DMatrix(X, label=y)

self.model = xgb.train(
params=params, dtrain=dtrain, num_boost_round=num_boost_round, **kwargs
)
return self.model

def finetune(
self,
X: pd.DataFrame,
y: pd.Series,
params: Mapping[str, Any],
num_boost_round: int = 10,
**kwargs: Any,
) -> xgb.Booster:
if self.model is None:
raise RuntimeError("Train initial model")
Copilot AI Feb 8, 2026
The runtime error message here is ambiguous (“Train initial model”). Consider making it actionable, e.g., explicitly telling callers to call train() before finetune(), and ideally include the class/method name in the message for easier debugging.

Suggested change
-            raise RuntimeError("Train initial model")
+            raise RuntimeError(
+                "ModelXgboost.finetune() requires an initial model. Call "
+                "ModelXgboost.train() before finetune()."
+            )


dtrain = xgb.DMatrix(X, label=y)
self.model = xgb.train(
params=params,
dtrain=dtrain,
num_boost_round=num_boost_round,
xgb_model=self.model,
**kwargs,
)
return self.model

def predict(self, X: pd.DataFrame, **kwargs: Any) -> pd.Series:
if self.model is None:
raise RuntimeError("Train a model first")
Comment on lines +51 to +52
Copilot AI Feb 8, 2026

Similarly, this error message would be more actionable if it told callers exactly what to do (e.g., call train() before predict()) and/or included context (model name/method).


dtest = xgb.DMatrix(X)

preds = self.model.predict(dtest, **kwargs)
return pd.Series(preds, index=X.index, name="prediction")
Empty file added tests/models/__init__.py
108 changes: 108 additions & 0 deletions tests/models/test_models_xgboost.py
@@ -0,0 +1,108 @@
import json
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd
import pytest

from alphapulse.models.model_xgboost import ModelXgboost

ROOT = Path(__file__).parent.parent.parent
TRAIN_DATA_PATH = ROOT / "data" / "v5.2" / "train.parquet"
FEATURES_JSON_PATH = ROOT / "data" / "v5.2" / "features.json"
TEST_DATA_PATH = ROOT / "data" / "v5.2" / "live.parquet"


@pytest.fixture
def test_data() -> tuple[pd.DataFrame, list[str]]:
"""Load Numerai data"""
with open(FEATURES_JSON_PATH, encoding="utf-8") as f:
feature_metadata = json.load(f)
feature_cols = feature_metadata["feature_sets"]["small"]
target_cols = feature_metadata["targets"]
train = pd.read_parquet(
TRAIN_DATA_PATH, columns=["era"] + feature_cols + target_cols
)
return train, feature_cols
Comment on lines +11 to +27
Copilot AI Feb 8, 2026
These tests depend on local Numerai dataset files under data/v5.2/*, but the repository doesn’t include a data/ directory. As written, the test suite will fail on a clean checkout/CI. Consider replacing this with a small synthetic DataFrame fixture (or add minimal test fixtures to tests/ and load them from there).

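A sketch of the synthetic-fixture route: the function name, column names, and sizes below are illustrative; only the era/feature/target layout mirrors the real fixture.

```python
import numpy as np
import pandas as pd


def make_synthetic_data(
    n_rows: int = 200, n_features: int = 5, seed: int = 0
) -> tuple[pd.DataFrame, list[str]]:
    """Build a tiny frame with the Numerai-style layout: an 'era'
    column, feature columns, and a {0, 1} 'target' column."""
    rng = np.random.default_rng(seed)
    feature_cols = [f"feature_{i}" for i in range(n_features)]
    frame = pd.DataFrame(
        rng.random((n_rows, n_features)), columns=feature_cols
    )
    frame["era"] = (np.arange(n_rows) // 20).astype(str)  # 20 rows per era
    frame["target"] = rng.integers(0, 2, size=n_rows)
    return frame, feature_cols
```

Wrapped in a `@pytest.fixture`, this could replace the parquet-backed `test_data` fixture without touching the test assertions, and keeps the suite green on a clean checkout.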


@pytest.fixture
def xgb_params() -> dict[str, Any]:
return {
"learning_rate": 0.1,
"max_depth": 6,
"min_child_weight": 1,
"gamma": 0,
"subsample": 0.8,
"colsample_bytree": 0.8,
"lambda": 1,
"alpha": 0,
Copilot AI Feb 8, 2026
xgb_params does not set an objective, but the test asserts predictions are in [0, 1]. With XGBoost defaults (regression), predictions are not guaranteed to be bounded, so this assertion can be flaky/incorrect. Either set an objective that guarantees bounds (e.g., logistic) or relax the assertion to properties that always hold (shape, finite values, etc.).

Suggested change
         "alpha": 0,
+        "objective": "binary:logistic",
}


def test_train_creates_model(
test_data: tuple[pd.DataFrame, list[str]], xgb_params: dict[str, Any]
) -> None:
"""Checks if model was created"""
train, feature_cols = test_data

model = ModelXgboost()
booster = model.train(
train[feature_cols],
train["target"],
params=xgb_params,
num_boost_round=10,
)

assert booster is not None
assert model.model is booster


def test_finetune_updates_model(
test_data: tuple[pd.DataFrame, list[str]], xgb_params: dict[str, Any]
) -> None:
"""Check if finetuning actually changes the model"""
train, feature_cols = test_data

model = ModelXgboost()

booster_before = model.train(
train[feature_cols],
train["target"],
params=xgb_params,
num_boost_round=5,
)

booster_after = model.finetune(
train[feature_cols],
train["target"],
params=xgb_params,
num_boost_round=5,
)

assert booster_after is not None
assert booster_after is not booster_before


def test_predict_output_shape_and_range(
test_data: tuple[pd.DataFrame, list[str]], xgb_params: dict[str, Any]
) -> None:
"""Checks if the number of predictions is equal to the number of test samples
and if each prediction is in [0,1]
"""
train, feature_cols = test_data
test = pd.read_parquet(TEST_DATA_PATH, columns=feature_cols)
model = ModelXgboost()
model.train(
train[feature_cols],
train["target"],
params=xgb_params,
num_boost_round=10,
)

preds = model.predict(test)

assert preds.shape[0] == test.shape[0]
assert np.all(preds >= 0.0)
assert np.all(preds <= 1.0)
37 changes: 36 additions & 1 deletion uv.lock
