
feat: Add XGBoost model implementation#23

Open
tamilatiurina wants to merge 1 commit intomainfrom
feat/xgboost-logic

Conversation

@tamilatiurina (Collaborator)

  1. Added the dependencies the models need to pyproject.toml
  2. Added models/model_abstract and models/model_xgboost, where the XGBoost implementation follows the base abstract class's methods
  3. Added tests/models/test_models_xgboost covering the train, finetune, and predict methods of the XGBoost model

Copilot AI (Contributor) left a comment

Pull request overview

Adds an XGBoost-backed model implementation under alphapulse.models, wires up a new optional dependency extra (models), and introduces tests intended to validate train/finetune/predict behavior.

Changes:

  • Add ModelAbstract and ModelXgboost implementations under src/alphapulse/models/.
  • Add a new models optional dependency group (and lockfile updates) including xgboost.
  • Add pytest coverage for XGBoost training/finetuning/prediction.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 11 comments.

Summary per file:

  • pyproject.toml: Adds models optional dependencies for model-related packages.
  • uv.lock: Locks models extra deps, including xgboost and transitive packages.
  • src/alphapulse/models/model_abstract.py: Introduces a base abstract model interface.
  • src/alphapulse/models/model_xgboost.py: Adds an XGBoost model wrapper implementing the abstract interface.
  • src/alphapulse/models/__init__.py: Exposes model classes from the alphapulse.models package.
  • tests/models/test_models_xgboost.py: Adds tests for training, finetuning, and prediction output.


Comment on lines +11 to +27
ROOT = Path(__file__).parent.parent.parent
TRAIN_DATA_PATH = ROOT / "data" / "v5.2" / "train.parquet"
FEATURES_JSON_PATH = ROOT / "data" / "v5.2" / "features.json"
TEST_DATA_PATH = ROOT / "data" / "v5.2" / "live.parquet"


@pytest.fixture
def test_data() -> tuple[pd.DataFrame, list[str]]:
    """Load Numerai data"""
    with open(FEATURES_JSON_PATH, encoding="utf-8") as f:
        feature_metadata = json.load(f)
    feature_cols = feature_metadata["feature_sets"]["small"]
    target_cols = feature_metadata["targets"]
    train = pd.read_parquet(
        TRAIN_DATA_PATH, columns=["era"] + feature_cols + target_cols
    )
    return train, feature_cols
Copilot AI Feb 8, 2026

These tests depend on local Numerai dataset files under data/v5.2/*, but the repository doesn’t include a data/ directory. As written, the test suite will fail on a clean checkout/CI. Consider replacing this with a small synthetic DataFrame fixture (or add minimal test fixtures to tests/ and load them from there).
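A minimal synthetic builder along the lines the comment suggests; the column names, era layout, and feature count are illustrative assumptions, not the real Numerai schema:

```python
import numpy as np
import pandas as pd


def make_synthetic_data(
    n_eras: int = 10, rows_per_era: int = 10, n_features: int = 5
) -> tuple[pd.DataFrame, list[str]]:
    """Build a small in-memory frame shaped like the Numerai parquet files."""
    rng = np.random.default_rng(seed=0)
    feature_cols = [f"feature_{i}" for i in range(n_features)]
    n_rows = n_eras * rows_per_era
    # Integer-binned features, as in the Numerai datasets.
    frame = pd.DataFrame(
        rng.integers(0, 5, size=(n_rows, n_features)), columns=feature_cols
    )
    frame["era"] = np.repeat([f"era{i:02d}" for i in range(n_eras)], rows_per_era)
    frame["target"] = rng.uniform(0.0, 1.0, size=n_rows)
    return frame, feature_cols
```

A pytest fixture could then simply return `make_synthetic_data()`, removing the dependency on local `data/v5.2/*` files.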

    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1,
    "alpha": 0,
Copilot AI Feb 8, 2026

xgb_params does not set an objective, but the test asserts predictions are in [0, 1]. With XGBoost defaults (regression), predictions are not guaranteed to be bounded, so this assertion can be flaky/incorrect. Either set an objective that guarantees bounds (e.g., logistic) or relax the assertion to properties that always hold (shape, finite values, etc.).

Suggested change
"alpha": 0,
"alpha": 0,
"objective": "binary:logistic",
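If the objective is instead left at the regression default, the test could assert only properties that hold for any output; a sketch (the `check_predictions` helper name is hypothetical, not from the PR):

```python
import numpy as np


def check_predictions(preds: np.ndarray, n_rows: int) -> None:
    """Assertions that hold regardless of the XGBoost objective."""
    assert preds.shape == (n_rows,)   # one prediction per input row
    assert np.isfinite(preds).all()   # no NaN or infinite values
    # Only assert a [0, 1] range when the objective bounds outputs,
    # e.g. "binary:logistic" or "reg:logistic".
```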

Comment on lines +14 to +24
raise NotImplementedError("Train method needs to be overriden")

@abstractmethod
def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
"""Finetune the trained model"""
raise NotImplementedError("Finetune method needs to be overriden")

@abstractmethod
def predict(self, *_args: Any, **_kwargs: Any) -> pd.Series:
"""Predict the result of the trained model"""
raise NotImplementedError("Predict method needs to be overriden")
Copilot AI Feb 8, 2026

Typos in error text: “overriden” should be “overridden” in these NotImplementedError messages.

Suggested change

    raise NotImplementedError("Train method needs to be overriden")

    @abstractmethod
    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Finetune the trained model"""
        raise NotImplementedError("Finetune method needs to be overriden")

    @abstractmethod
    def predict(self, *_args: Any, **_kwargs: Any) -> pd.Series:
        """Predict the result of the trained model"""
        raise NotImplementedError("Predict method needs to be overriden")

    raise NotImplementedError("Train method needs to be overridden")

    @abstractmethod
    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Finetune the trained model"""
        raise NotImplementedError("Finetune method needs to be overridden")

    @abstractmethod
    def predict(self, *_args: Any, **_kwargs: Any) -> pd.Series:
        """Predict the result of the trained model"""
        raise NotImplementedError("Predict method needs to be overridden")

Comment on lines +65 to +69
models = [
"xgboost",
"pandas",
"numpy",
]
Copilot AI Feb 8, 2026

The standard dev workflow (uv sync --extra dev / uv run mypy / uv run pytest) will not install the new models extra, but the code and tests unconditionally import xgboost. To keep CI/local dev green, either add xgboost to the dev extra (or core deps), or make the xgboost import/tests conditional on the extra being installed.
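One way to keep the default dev environment green, sketched under the assumption that the tests keep importing xgboost: detect whether the optional dependency is importable, and skip the module otherwise.

```python
import importlib.util


def has_models_extra() -> bool:
    """True when the optional xgboost dependency is importable."""
    return importlib.util.find_spec("xgboost") is not None


# At the top of tests/models/test_models_xgboost.py one could then write:
#   import pytest
#   if not has_models_extra():
#       pytest.skip("requires the 'models' extra", allow_module_level=True)
```

`pytest.importorskip("xgboost")` achieves the same effect in a single call.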

    **kwargs: Any,
) -> xgb.Booster:
    if self.model is None:
        raise RuntimeError("Train initial model")
Copilot AI Feb 8, 2026

The runtime error message here is ambiguous (“Train initial model”). Consider making it actionable, e.g., explicitly telling callers to call train() before finetune(), and ideally include the class/method name in the message for easier debugging.

Suggested change

        raise RuntimeError("Train initial model")

        raise RuntimeError(
            "ModelXgboost.finetune() requires an initial model. Call "
            "ModelXgboost.train() before finetune()."
        )

Comment on lines +4 to +13
import pandas as pd
import xgboost as xgb


class ModelAbstract(ABC):
    """Abstract class for all models"""

    @abstractmethod
    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Initial training the model"""
Copilot AI Feb 8, 2026

ModelAbstract imports xgboost and returns xgb.Booster, which makes the abstract base class (and any import of alphapulse.models) require the optional models extra. This will break uv run mypy / uv run pytest in the default dev environment where xgboost isn’t installed. Consider removing the hard dependency from the abstract layer (e.g., return Any/a protocol, or use a TYPE_CHECKING import and a forward reference) so the package can be imported/type-checked without the optional extra.
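A sketch of the `TYPE_CHECKING` approach the comment describes; only `train` is shown, and this is one possible shape rather than the PR's code:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by type checkers, so importing this module does not
    # require the optional `models` extra at runtime.
    import xgboost as xgb


class ModelAbstract(ABC):
    """Abstract class for all models (finetune/predict are analogous)."""

    @abstractmethod
    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
        """Initial training of the model"""
        raise NotImplementedError("Train method needs to be overridden")
```

With `from __future__ import annotations`, the `xgb.Booster` annotation is never evaluated at runtime, so the module imports cleanly without xgboost installed.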

Comment on lines +1 to +4
from .model_abstract import ModelAbstract
from .model_xgboost import ModelXgboost

__all__ = ["ModelAbstract", "ModelXgboost"]
Copilot AI Feb 8, 2026

This package __init__ eagerly imports ModelXgboost, which will raise ModuleNotFoundError for users who install the base package without the models extra (because xgboost is optional). Consider making alphapulse.models safe to import without xgboost (lazy/conditional import, or avoid exporting ModelXgboost at package import time).

Comment on lines +67 to 72
"pandas",
"numpy",
]



Copilot AI Feb 8, 2026

The new models extra includes pandas, but pandas is already a core dependency. Keeping duplicates makes dependency intent unclear; consider limiting this extra to only what’s actually optional (likely just xgboost, and possibly numpy if not already required elsewhere).

Suggested change

    "pandas",
    "numpy",
]

    "numpy",
]

    """Abstract class for all models"""

    @abstractmethod
    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
Copilot AI Feb 8, 2026

Overridden method signature does not match calls that pass arguments named 'params' and 'num_boost_round'. The overriding method ModelXgboost.train matches those calls.

Suggested change

    def train(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:

    def train(
        self,
        *args: Any,
        params: Any = None,
        num_boost_round: Any = None,
        **kwargs: Any,
    ) -> xgb.Booster:

        raise NotImplementedError("Train method needs to be overriden")

    @abstractmethod
    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:
Copilot AI Feb 8, 2026

Overridden method signature does not match calls that pass arguments named 'params' and 'num_boost_round'. The overriding method ModelXgboost.finetune matches those calls.

Suggested change

    def finetune(self, *_args: Any, **_kwargs: Any) -> xgb.Booster:

    def finetune(
        self,
        params: Any = None,
        num_boost_round: int | None = None,
        *_args: Any,
        **_kwargs: Any,
    ) -> xgb.Booster:

