Normalize input for Choice Models #208

@Eh2406

A number of times we have accidentally compared the magnitudes of coefficients in the yaml files that represent MNLDiscreteChoiceModel instances. This is of course a mistake: 0.001 is a large coefficient for nonres_sqft and a small coefficient for frac_developed. There is also the "magic 3s" problem: the code puts a hard cutoff on coefficients at -3 and 3. This is a great default for normalized variables, i.e. ones with std ~=1 and mean ~=0, but way too small/big for other columns. If coefficients are made comparable, then we can also consider adding L1 or L2 regularization.

My proposal: when fitting a model, subtract the mean and divide by the std for each column. In the yaml file, store the training mean, training std, and the coefficients of the transformed columns. Then, when predicting with a model, transform with the stored mean and std. Use of the models will be unchanged, but the stored coefficients will be comparable with each other.
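A minimal sketch of the fit/predict flow being proposed, assuming numpy arrays. The names `fit_standardized`, `predict_utilities`, and the `fit_coefs` callback are hypothetical, standing in for the actual MNL fitting code, and the returned dict stands in for what would be serialized to yaml:

```python
import numpy as np

def fit_standardized(X, fit_coefs):
    """Standardize each column, fit on the transformed data, and keep the
    training mean/std so prediction can apply the same transform.
    `fit_coefs` is a stand-in for the actual MNL fitting routine."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Xz = (X - mean) / std   # each column now has mean ~=0, std ~=1
    coefs = fit_coefs(Xz)   # coefficients are now on a comparable scale
    # This dict is what would be stored in the yaml file.
    return {"mean": mean, "std": std, "coefficients": coefs}

def predict_utilities(model, X):
    """Apply the stored training transform before using the coefficients."""
    Xz = (X - model["mean"]) / model["std"]
    return Xz @ model["coefficients"]
```

Note that prediction reuses the *training* mean and std rather than recomputing them from the prediction data, so fitted coefficients stay valid even if the prediction sample has a different distribution.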

Thoughts?
