Conversation

@antoinebaker commented Oct 2, 2025

Hi @GaetandeCast

The blog post is neat and easy to follow, I think. Here are a few suggestions in the Python file.

For the overfitted model, I feel the narrative is "feature importance computed on an overfitted model is unreliable; however, it's good enough to identify irrelevant features and trim them down to get a good model".

Is it supported, in theory or in practice, that RFECV with PFI is good for feature selection?

@GaetandeCast (Owner) left a comment

For the overfitted model, I feel the narrative is "feature importance computed on an overfitted model is unreliable; however, it's good enough to identify irrelevant features and trim them down to get a good model".

Yes, I will try to make that clearer.

Is it supported, in theory or in practice, that RFECV with PFI is good for feature selection?

The minimal axiom of Reyero Lobo et al. supports it, in the sense that it makes sense to eliminate the features with zero permutation importance one by one. It does not cover the fact that RFECV can also remove features with non-zero importance, as long as they don't degrade the performance. In practice, however, this is fine, since the cross-validation should still lead to a better model.

I'll mention this in the part that justifies RFECV + permutation importance.
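
For reference, a minimal sketch of one way to plug permutation importance into `RFECV` (not necessarily what the post does): scikit-learn's `RFECV` reads `feature_importances_` (or `coef_`) from the fitted estimator, so a small wrapper can expose permutation importances computed on an internal validation split. The wrapper name and the synthetic data below are illustrative, not taken from the post.

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.feature_selection import RFECV
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


class PermutationImportanceRegressor(BaseEstimator, RegressorMixin):
    # Illustrative wrapper: fit a base regressor, then expose permutation
    # importances (computed on a held-out split) as feature_importances_,
    # which RFECV's default importance_getter picks up.
    def __init__(self, estimator, n_repeats=10, random_state=None):
        self.estimator = estimator
        self.n_repeats = n_repeats
        self.random_state = random_state

    def fit(self, X, y):
        X_fit, X_val, y_fit, y_val = train_test_split(
            X, y, random_state=self.random_state
        )
        self.estimator_ = clone(self.estimator).fit(X_fit, y_fit)
        result = permutation_importance(
            self.estimator_, X_val, y_val,
            n_repeats=self.n_repeats, random_state=self.random_state,
        )
        self.feature_importances_ = result.importances_mean
        return self

    def predict(self, X):
        return self.estimator_.predict(X)


# Synthetic data in the spirit of the post (an assumption): x2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = X[:, 0] + (X[:, 0] + X[:, 1]) ** 2 + rng.normal(0, 0.1, size=600)

base = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
selector = RFECV(PermutationImportanceRegressor(base, random_state=0), step=1, cv=5)
selector.fit(X, y)
print("kept features:", selector.support_)  # x2 should typically be dropped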


linear_regressor = LassoCV(random_state=rng)
linear_regressor.fit(X_train, y_train)
# maybe a dataframe (feature | coef) would look better?
@GaetandeCast (Owner):

Yes, this looks nicer, for instance:

print("Coefficients of the linear model:")
print(
    pd.DataFrame(
        {f"x{idx}": f"{linear_regressor.coef_[idx]:.3f}" for idx in range(X.shape[1])},
        index=["Coefficient"],
    )
)

And for the second model:

print("Coefficients of the linear model:")
print(
    pd.DataFrame(
        {
            f"{feature_names[idx]}": f"{linear_regressor.coef_[idx]:.3f}"
            for idx in np.argsort(linear_regressor.coef_)[::-1]
        },
        index=["Coefficient"],
    )
)

@antoinebaker (Author):

Oh, I was thinking of something even simpler, like:

pd.options.display.precision = 3
pd.DataFrame({"feature": feature_names, "coef": linear_regressor.coef_})

@antoinebaker (Author):

If you want ordering by coef:

pd.DataFrame({"feature": feature_names, "coef": linear_regressor.coef_}).sort_values(by="coef")

# if $X_2$ has a low impact on the target or if the model is overfitting on it.
# interaction with $X_0$. We can now say that $X_1$ is important for the underlying process. Some features involving
# $X_2$ are receiving low but nonzero coefficients in the second model. In our synthetic case, we know
# that the target $Y = X_0 + (X_0+X_1)^2 + \text{noise}$ does not depend on $X_2$, so these small nonzero coefficients
@GaetandeCast (Owner):

I like this clarification.
I use `+ \mathcal{N}(0, \sigma^2)` later for the noise, so we should pick one to be consistent.
I think `+ \text{noise}` might be better, since we don't need to introduce `\sigma` in this case (which I did not do).
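
For concreteness, the two candidate notations, written out for the target quoted above (purely to illustrate the consistency point):

Y = X_0 + (X_0 + X_1)^2 + \text{noise}
\quad \text{versus} \quad
Y = X_0 + (X_0 + X_1)^2 + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)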

# Validation (`RFECV`) provides a good way to trim down irrelevant features.
# [Justify that permutation importance is sensible by citing Reyero Lobo et al.?]
# [Yes! Maybe explain that j irrelevant means X_j \perp Y | X_{-j}, and that PFI
# (in the optimal setting) is able to detect such irrelevant features]
@GaetandeCast (Owner):

Ok, I'll think of something.
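
One way to phrase the bracketed note above, stated roughly: feature j is irrelevant when

X_j \perp\!\!\!\perp Y \mid X_{-j},

and, for the Bayes-optimal predictor, the permutation importance of such a feature is zero, so eliminating the features whose permutation importance is (close to) zero is a sound step.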

)

# %% [markdown]
# [I feel the summary/recap is a bit dry, maybe give more details?]
@GaetandeCast (Owner):

Yeah, I'll improve it; it's kind of a placeholder for now.

@ogrisel commented Oct 3, 2025

Here is a pass of feedback:

  • Mis-specified => Misspecified
    https://en.wiktionary.org/wiki/misspecified

  • "Misspecified model" as a section header => "A first misspecified model" or "Misspecified models" or "Dealing with misspecification"

  • the empty model => the null model / constant predictor

  • rng.normal takes sigma (the standard deviation) as its second argument, not sigma**2.

  • Please split long code cells into subcells with independent outputs. For instance, for the first cell: one cell about the data generating process, then one cell to fit the lasso model and display the non-zero coefs, and one cell to list the feature names with zero coef.

  • Instead of printing a list comprehension, use a for loop with one print statement per iteration.

  • Make the feature names consistent in the print statements: "Feature 0" vs "x1", "x2"

  • Same comment w.r.t. the second cell: split it into separate cells, each with its own output, and interleave the analysis as you go instead of writing the analysis of the second cell before the code.

  • You should state that we can interpret the magnitude of the coefficients of the linear model as a relative importance measure because all the features have the same scale. Actually this is not true anymore once you use PolynomialFeatures: the cross-features do not have the same variance. So we might want to either insert a standard scaler after the PolynomialFeatures step, or multiply the coef values by the standard deviations of the features to get importance values (see the first sketch after this list).

  • Please join the feature names and (signed) feature importances (the scaled coefs) in a pandas dataframe and use horizontal bar plots instead of printing the values.

  • "and does not drop when we train on half the data, we know that the model is well specified"

    The fact that the score does not drop when training on half the data does not guarantee that we are well specified: instead it tells us that we are not overfitting, that is, we have trained on enough training data points. We can only believe that we are well specified if we have chosen an expressive enough model class (and hyperparameter set) given what we know about the structure of the data generating process.

  • Similar comments about the second code block: it's too long, and the conclusions should be interleaved between logically separated sub code cells.

  • [Justify that permutation importance is sensible by citing Reyero Lobo et al.?]

    Yes, please do so.

  • "This score does not drop significantly when we re-train on only half the data, indicating that the model is close to Bayes-optimal. "

    It does drop from 0.989 to 0.980 so it's a bit of a stretch to assert that the final model is "close" to Bayes optimal. Maybe we could be reach 0.999 if we doubled the training size again, or not (e.g. stay below 0.985 what ever the number of data points). We cannot say just from those values. Maybe you can try to tweak the training set size to see if we can get closer to Bayes optimal after feature selection while still being overfitting before feature selection?

  • Could you please merge the two bar plots (train and test PI) into one (with test in blue and train in orange, for instance) so that we can compare the relative sizes of the PIs? (See the second sketch after this list.)
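
A rough sketch of the coefficient-scaling and bar-plot suggestions above (multiplying each coefficient by the standard deviation of its feature, then plotting horizontally); the data-generating process and names below are illustrative, not taken from the post:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (assumption): three standard-normal features, x2 irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Note: the second argument of rng.normal is the standard deviation, not the variance.
y = X[:, 0] + (X[:, 0] + X[:, 1]) ** 2 + rng.normal(0, 0.1, size=1000)

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
lasso = LassoCV(random_state=0).fit(X_poly, y)

# The raw coefficients are not comparable because the polynomial features have
# different variances; rescale each coefficient by its feature's standard deviation.
importances = pd.DataFrame(
    {
        "feature": poly.get_feature_names_out(["x0", "x1", "x2"]),
        "scaled_coef": lasso.coef_ * X_poly.std(axis=0),
    }
).sort_values(by="scaled_coef")

importances.plot.barh(x="feature", y="scaled_coef", legend=False)
plt.xlabel("coef * feature std (signed importance)")
plt.tight_layout()
plt.show()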
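
And a sketch of the merged train/test permutation-importance bar plot, reusing the same illustrative setup:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] + (X[:, 0] + X[:, 1]) ** 2 + rng.normal(0, 0.1, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2), LassoCV(random_state=0))
model.fit(X_train, y_train)

# Permutation importances on both splits, merged into one frame so that the
# train and test bars for each feature sit next to each other.
pi_train = permutation_importance(model, X_train, y_train, n_repeats=10, random_state=0)
pi_test = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
pi = pd.DataFrame(
    {"test": pi_test.importances_mean, "train": pi_train.importances_mean},
    index=["x0", "x1", "x2"],
)

pi.plot.barh(color=["tab:blue", "tab:orange"])  # test in blue, train in orange
plt.xlabel("mean decrease in R^2 when the feature is permuted")
plt.tight_layout()
plt.show()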

@antoinebaker (Author):

The fact that the score does not drop when training on half the data does not guarantee that we are well specified: instead it tells us that we are not overfitting, that is, we have trained on enough training data points. We can only believe that we are well specified if we have chosen an expressive enough model class (and hyperparameter set) given what we know about the structure of the data generating process.

Ah yes, I was also a bit confused by this "half training" argument :) I think in your case it is self-evident from the data generating process that the first linear model is misspecified and the second is well specified, so I would just remove that part.

If you want to claim that you are "close" to the Bayes-optimal model, maybe you can compare the MSE to the noise variance (MSE >= noise variance, with equality for the Bayes-optimal regressor).
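
A sketch of that check, reusing the fitted `model`, the held-out split and the noise standard deviation (0.1) from the sketches above:

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, model.predict(X_test))
print(f"test MSE = {mse:.4f}  vs  noise variance = {0.1 ** 2:.4f}")
# The Bayes-optimal regressor attains an MSE equal to the noise variance, so a
# test MSE close to sigma**2 supports the "close to Bayes-optimal" claim.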
