example in the documentation is bad practice #30
Open
Description
The example in the documentation is bad practice: the output of the linear model is constant (it underfits).
https://justcause.readthedocs.io/en/latest/
>>> from justcause.data.sets import load_ihdp
>>> from justcause.learners import SLearner
>>> from justcause.learners.propensity import estimate_propensities
>>> from justcause.metrics import pehe_score, mean_absolute
>>> from justcause.evaluation import calc_scores
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd
>>> replications = load_ihdp(select_rep=[0, 1, 2])
>>> slearner = SLearner(LinearRegression())
>>> metrics = [pehe_score, mean_absolute]
>>> scores = []
>>> for rep in replications:
...     train, test = train_test_split(rep, train_size=0.8)
...     p = estimate_propensities(train.np.X, train.np.t)
...     slearner.fit(train.np.X, train.np.t, train.np.y, weights=1/p)
...     pred_ite = slearner.predict_ite(test.np.X, test.np.t, test.np.y)
...     scores.append(calc_scores(test.np.ite, pred_ite, metrics))
...
>>> pd.DataFrame(scores)
   pehe_score  mean_absolute
0    0.998388       0.149710
1    0.790441       0.119423
2    0.894113       0.151275
Looking at pred_ite, its standard deviation is almost zero: the model predicts essentially the same treatment effect for every unit, so its predictive power for individual effects is practically zero.
Thus, the example should include an evaluation relative to a dummy model (e.g. a constant prediction).
>>> pred_ite.std()
1.130466570252318e-15
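To illustrate the kind of relative evaluation meant here, below is a minimal sketch that does not use justcause at all. It assumes synthetic data, a hand-rolled linear S-learner built with NumPy least squares, and a hypothetical `pehe` helper defined as the root mean squared error between true and predicted effects. All names, the data-generating process, and the in-sample evaluation are assumptions made for illustration only.

```python
import numpy as np


def pehe(true_ite, pred_ite):
    """Hypothetical helper: RMSE between true and predicted individual effects."""
    diff = np.asarray(true_ite) - np.asarray(pred_ite)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data (assumption): the true effect varies with the first covariate.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
true_ite = 1.0 + 2.0 * X[:, 0]
y = X @ np.array([0.5, -1.0, 0.3]) + t * true_ite + rng.normal(scale=0.1, size=n)

# Linear S-learner: regress y on [1, X, t], with t as one additive feature.
design = np.column_stack([np.ones(n), X, t])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict(X, t):
    """Predict outcomes from the fitted linear model for a given treatment vector."""
    return np.column_stack([np.ones(len(X)), X, t]) @ coef


# Predicted effect = prediction under treatment minus prediction under control.
# With no interaction terms this equals the coefficient of t for every unit.
pred_ite = predict(X, np.ones(n)) - predict(X, np.zeros(n))

# Dummy baseline: a constant effect equal to the raw difference in group means.
baseline_ite = np.full(n, y[t == 1].mean() - y[t == 0].mean())

scores = {
    "slearner_pehe": pehe(true_ite, pred_ite),
    "constant_pehe": pehe(true_ite, baseline_ite),
    "pred_ite_std": float(pred_ite.std()),
}
print(scores)
```

Under these assumptions the two PEHE values come out close to each other and the spread of the predicted effects is at floating-point noise level, which is the symptom described above. Reporting the dummy score next to the learner's score makes that visible straight away.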