A number of times we have accidentally compared the magnitudes of coefficients in the yaml files that represent MNLDiscreteChoiceModel instances. This is a mistake because the columns are on very different scales: 0.001 is a large coefficient for nonres_sqft but a small one for frac_developed. There is also the "magic 3s" problem: the code puts a hard cutoff on coefficients at -3 and 3. That is a great default for normalized variables, i.e. ones with std ~= 1 and mean ~= 0, but far too small or too large for other columns. If coefficients were made comparable, we could also consider adding L1 or L2 regularization.
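To illustrate the scale problem on synthetic data (the column names below are borrowed from this issue for illustration only, and I'm using a plain least-squares fit rather than the actual MNL estimation): the raw coefficient on a large-scale column looks tiny next to the coefficient on a 0-to-1 column, even when the large-scale column actually matters more. Standardizing the columns first reverses the apparent ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
nonres_sqft = rng.uniform(0, 100_000, n)   # large-scale column
frac_developed = rng.uniform(0, 1, n)      # small-scale column
# True model: nonres_sqft contributes more variance than frac_developed.
y = 0.0001 * nonres_sqft + 2.0 * frac_developed + rng.normal(0, 0.1, n)

# Fit on raw columns: the nonres_sqft coefficient (~0.0001) looks
# negligible next to the frac_developed coefficient (~2.0).
X = np.column_stack([nonres_sqft, frac_developed])
coef_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fit on standardized columns (plus an intercept, since centering
# removes the mean): now the magnitudes are directly comparable, and
# nonres_sqft correctly comes out as the stronger predictor.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
coef_std, *_ = np.linalg.lstsq(np.column_stack([Xs, np.ones(n)]), y, rcond=None)
```

Note that the raw nonres_sqft coefficient would also sail under a +/-3 cutoff no matter how strong the effect is, while a raw coefficient on a tiny-scale column could easily exceed it.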
My proposal: when fitting a model, subtract the mean and divide by the std of each column. In the yaml file, store the training mean, the training std, and the coefficients of the transformed columns. Then, when predicting with the model, transform the input with the stored mean and std. Use of the models would be unchanged, but the stored coefficients would be comparable with each other.
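A minimal sketch of the proposed fit/predict round trip, again using a least-squares fit as a stand-in for the real MNL estimation (the function names and the dict layout are hypothetical, not the existing urbansim API; the dict just mirrors what would be written to the yaml file):

```python
import numpy as np

def fit_standardized(X, y):
    """Fit on standardized columns; keep the training mean/std with the model."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    # Intercept column is needed because centering removes the mean of y.
    coef, *_ = np.linalg.lstsq(np.column_stack([Xs, np.ones(len(y))]), y, rcond=None)
    # These four fields are what would be persisted in the yaml file.
    return {"mean": mean, "std": std, "coef": coef[:-1], "intercept": coef[-1]}

def predict_standardized(model, X):
    """Transform new data with the *stored* training mean/std, then predict."""
    Xs = (X - model["mean"]) / model["std"]
    return Xs @ model["coef"] + model["intercept"]
```

Because prediction uses the stored training mean/std rather than the statistics of the new data, predictions are identical to fitting on raw columns; only the stored representation changes.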
Thoughts?