Some suggestions #1
base: main
Conversation
GaetandeCast
left a comment
For the overfitted model, I feel the narrative is "feature importance computed on an overfitted model is unreliable; however, it is good enough to identify irrelevant features and trim them down to get a good model".
Yes, I will try to make that clearer.
Is it supported, in theory or in practice, that RFECV with PFI is good for feature selection?
The minimal axiom of Reyero Lobo et al. supports it in the sense that it makes sense to eliminate the features with zero permutation importance one by one. This does not cover the fact that RFECV can remove features with non-zero importance, so long as they don't degrade the performance. However, in practice this is fine, since it should lead to a better model because of the cross-validation.
I'll mention this in the part that justifies RFECV + permutation importance.
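For reference, a PFI-driven backward elimination can be sketched with a small manual loop, since scikit-learn's `RFECV` ranks features via `coef_`/`feature_importances_` (or `importance_getter`) rather than permutation importance out of the box. Everything below (data, model, 5% stopping threshold) is made up for illustration, not the blog post's setup:

```python
# Sketch: backward elimination driven by permutation feature importance (PFI).
# scikit-learn's RFECV ranks features via coef_ / feature_importances_, so a
# PFI-based variant needs a manual loop. Data, model and threshold are made up.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))
# The target uses only features 0 and 1; features 2-4 are irrelevant.
y = X[:, 0] + (X[:, 0] + X[:, 1]) ** 2 + 0.1 * rng.normal(size=300)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

remaining = list(range(X.shape[1]))
while len(remaining) > 1:
    model = RandomForestRegressor(random_state=0).fit(X_train[:, remaining], y_train)
    pfi = permutation_importance(
        model, X_val[:, remaining], y_val, n_repeats=10, random_state=0
    )
    worst = int(np.argmin(pfi.importances_mean))
    # Stop once the least important remaining feature is non-negligible
    # (here: above 5% of the largest importance).
    if pfi.importances_mean[worst] > 0.05 * pfi.importances_mean.max():
        break
    remaining.pop(worst)

print("kept features:", remaining)
```

This drops the feature with the smallest PFI one at a time, which matches the "eliminate zero-importance features one by one" reading of the minimal axiom.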
linear_regressor = LassoCV(random_state=rng)
linear_regressor.fit(X_train, y_train)
# maybe a dataframe feature | coef will be better looking ?
Yes, this looks nicer, for instance:
print("Coefficients of the linear model:")
print(
    pd.DataFrame(
        {f"x{idx}": f"{linear_regressor.coef_[idx]:.3f}" for idx in range(X.shape[1])},
        index=["Coefficient"],
    )
)
And for the second model:
print("Coefficients of the linear model:")
print(
    pd.DataFrame(
        {
            f"{feature_names[idx]}": f"{linear_regressor.coef_[idx]:.3f}"
            for idx in np.argsort(linear_regressor.coef_)[::-1]
        },
        index=["Coefficient"],
    )
)
Oh, I was thinking of something even simpler, like:
pd.options.display.precision = 3
pd.DataFrame({"feature": feature_names, "coef": linear_regressor.coef_})
If you want ordering by coef:
pd.DataFrame({"feature": feature_names, "coef": linear_regressor.coef_}).sort_values(by="coef")
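Putting the suggestion together, here is a self-contained sketch; the data, the fitted `LassoCV` model, and the feature names are stand-ins for the ones in the post:

```python
# Self-contained sketch of the suggested coefficient table; the data,
# the fitted LassoCV model, and the feature names are illustrative stand-ins.
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
# x1 has the largest true coefficient, x2 is irrelevant.
y = X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=200)
feature_names = ["x0", "x1", "x2"]

linear_regressor = LassoCV(random_state=0).fit(X, y)

pd.options.display.precision = 3
coef_table = pd.DataFrame(
    {"feature": feature_names, "coef": linear_regressor.coef_}
).sort_values(by="coef", ascending=False)
print(coef_table)
```

`ascending=False` puts the largest coefficient first; drop it to get the ascending order of the snippet above.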
# if $X_2$ has a low impact on the target or if the model is overfitting on it.
# interaction with $X_0$. We can now say that $X_1$ is important for the underlying process. Some features involving
# $X_2$ are receiving low but non zero coefficients in the second model. In our synthetic case, we know
# that the target $Y = X_0 + (X_0+X_1)^2 + \text{noise}$ does not depend on $X_2$, so these small nonzero coefficients
I like this clarification.
I use + \mathcal{N}(0, \sigma^2) later for the noise so we should pick one to be consistent.
I think + \text{noise} might be better since we don't need to introduce \sigma in this case (which I did not do).
# Validation (`RFECV`) provides a good way to trim down irrelevant features.
# [Justify that permutation importance is sensible by citing Reyero Lobo et al. ?]
# [Yes ! Maybe explain that j irrelevant means X_j \perp Y | X_{-j}, and that PFI
# (in the optimal setting) is able to detect such irrelevant features]
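As a toy check of that claim, one can verify that a feature with $X_j \perp Y \mid X_{-j}$ gets a permutation importance near zero on held-out data; the data and model below are illustrative, not the post's exact setup:

```python
# Toy check: if X_j is independent of Y given the other features, a well-fit
# model should give feature j a permutation importance near zero on held-out
# data. The data-generating process and model here are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.normal(size=(500, 3))
# Y depends on X_0 and X_1 only, so X_2 is independent of Y given the rest.
y = X[:, 0] + (X[:, 0] + X[:, 1]) ** 2 + 0.1 * rng.normal(size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pfi = permutation_importance(model, X_val, y_val, n_repeats=30, random_state=0)

# Expect large importances for X_0 and X_1, near zero for X_2.
for name, imp in zip(["X_0", "X_1", "X_2"], pfi.importances_mean):
    print(f"{name}: {imp:.3f}")
```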
Ok, I'll think of something.
# %% [markdown]
# [I feel the summary/recap is a bit dry, maybe give more details?]
Yeah, I'll improve it; it's kind of a placeholder for now.
Here is a pass of feedback:
Yes please do so.
Ah yes, I was also a bit confused by this "half training" argument :) I think in your case it is self-evident from the data-generating process that the first linear model is misspecified and the second is well-specified, so I would just remove that part. If you want to claim that you are "close" to the Bayes optimal model, maybe you can compare the MSE to the noise variance (MSE >= noise variance, with equality for the Bayes optimal regressor).
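That sanity check could look something like the sketch below. The data-generating process mirrors the post's $Y = X_0 + (X_0+X_1)^2 + \text{noise}$, but the sigma value and the sample sizes are made up, and the "well-specified" model simply regresses on the true basis functions:

```python
# Sketch of the suggested check: test MSE >= noise variance, with (approximate)
# equality for the Bayes optimal regressor. Sigma and sample sizes are made up.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
sigma = 0.5
X = rng.normal(size=(2000, 2))
y = X[:, 0] + (X[:, 0] + X[:, 1]) ** 2 + sigma * rng.normal(size=2000)
X_train, X_test, y_train, y_test = X[:1000], X[1000:], y[:1000], y[1000:]


def basis(X):
    # Features of the well-specified model: the true basis functions.
    return np.column_stack([X[:, 0], (X[:, 0] + X[:, 1]) ** 2])


model = LinearRegression().fit(basis(X_train), y_train)
mse = mean_squared_error(y_test, model.predict(basis(X_test)))
# For a model close to Bayes optimal, the test MSE should sit just above sigma^2.
print(f"test MSE = {mse:.3f}  vs  noise variance = {sigma ** 2:.3f}")
```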
Hi @GaetandeCast
The blogpost is neat and easy to follow, I think. Here are a few suggestions in the Python file.