A number of times we have accidentally compared the magnitudes of coefficients in the yaml files that represent MNLDiscreteChoiceModel instances. This is a mistake because the columns are on very different scales: 0.001 is a large coefficient for nonres_sqft but a small one for frac_developed. There is also the "magic 3s" problem: the code puts a hard cutoff on coefficients at -3 and 3. That is a great default for normalized variables, i.e. ones with std ~= 1 and mean ~= 0, but far too small or too large for other columns. If coefficients were made comparable, we could also consider adding L1 or L2 regularization.
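To illustrate the scale problem on synthetic data (the column names below are borrowed from this issue for illustration only, and I'm using a plain least-squares fit rather than the actual MNL estimation): the raw coefficient on a large-scale column looks tiny next to the coefficient on a 0-to-1 column, even when the large-scale column actually matters more. Standardizing the columns first reverses the apparent ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
nonres_sqft = rng.uniform(0, 100_000, n)   # large-scale column
frac_developed = rng.uniform(0, 1, n)      # small-scale column
# True model: nonres_sqft contributes more variance than frac_developed.
y = 0.0001 * nonres_sqft + 2.0 * frac_developed + rng.normal(0, 0.1, n)

# Fit on raw columns: the nonres_sqft coefficient (~0.0001) looks
# negligible next to the frac_developed coefficient (~2.0).
X = np.column_stack([nonres_sqft, frac_developed])
coef_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fit on standardized columns (plus an intercept, since centering
# removes the mean): now the magnitudes are directly comparable, and
# nonres_sqft correctly comes out as the stronger predictor.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
coef_std, *_ = np.linalg.lstsq(np.column_stack([Xs, np.ones(n)]), y, rcond=None)
```

Note that the raw nonres_sqft coefficient would also sail under a +/-3 cutoff no matter how strong the effect is, while a raw coefficient on a tiny-scale column could easily exceed it.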
My proposal: when fitting a model, subtract the mean and divide by the std of each column. In the yaml file, store the training mean, the training std, and the coefficients of the transformed columns. Then, when predicting with the model, transform the input with the stored mean and std. Use of the models would be unchanged, but the stored coefficients would be comparable with each other.
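A minimal sketch of the proposed fit/predict round trip, again using a least-squares fit as a stand-in for the real MNL estimation (the function names and the dict layout are hypothetical, not the existing urbansim API; the dict just mirrors what would be written to the yaml file):

```python
import numpy as np

def fit_standardized(X, y):
    """Fit on standardized columns; keep the training mean/std with the model."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    # Intercept column is needed because centering removes the mean of y.
    coef, *_ = np.linalg.lstsq(np.column_stack([Xs, np.ones(len(y))]), y, rcond=None)
    # These four fields are what would be persisted in the yaml file.
    return {"mean": mean, "std": std, "coef": coef[:-1], "intercept": coef[-1]}

def predict_standardized(model, X):
    """Transform new data with the *stored* training mean/std, then predict."""
    Xs = (X - model["mean"]) / model["std"]
    return Xs @ model["coef"] + model["intercept"]
```

Because prediction uses the stored training mean/std rather than the statistics of the new data, predictions are identical to fitting on raw columns; only the stored representation changes.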
Thoughts?