When a dataset has a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question: do I leave those vectors as they are and scale only the numerical attributes through standardization/normalization, or should I scale the one-hot vectors along with the numerical attributes?
1 Answer
Once converted to numerical form, one-hot-encoded columns are treated by a model just like any other numerical data. So if you are normalising the other columns for any reason, there is a clear precedent for normalising the {0,1} values as well.
The effect of doing so will depend on the model class and the type of normalisation you apply, but I have noticed some (small) improvements when training neural networks on one-hot-encoded categorical data scaled to mean 0, std 1.
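To make "scaling to mean 0, std 1" concrete for one-hot columns, here is a minimal sketch in plain Python. The `colours` data and the helper names `one_hot` and `standardise_columns` are made up for illustration; in practice you would probably use your library's encoder and scaler instead.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Return (categories, rows); each row is a list of 0.0/1.0 indicators."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


def standardise_columns(rows):
    """Scale every column to mean 0 and population standard deviation 1."""
    scaled_columns = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        # A constant column has sigma == 0; map it to all zeros.
        scaled_columns.append(
            [(x - mu) / sigma if sigma > 0 else 0.0 for x in col]
        )
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical categorical attribute.
colours = ["red", "green", "red", "blue", "red", "green"]

categories, encoded = one_hot(colours)
scaled = standardise_columns(encoded)

for label, before, after in zip(colours, encoded, scaled):
    print(label, before, [round(x, 3) for x in after])
```

A one-hot column in which a fraction p of the rows are 1 has mean p and standard deviation sqrt(p(1 - p)), so after standardisation a rare category produces a larger positive value than a common one. That is one way in which scaling changes what the model sees.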
It may make a difference too for model classes based on distance metrics.
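As a rough illustration of the distance-metric point, the sketch below compares Euclidean distances under three preparations of the same made-up table: no scaling, scaling only the numeric column, and scaling everything. The data, column names, and helpers are assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev


def standardise(col):
    """Scale one column to mean 0 and population standard deviation 1."""
    mu, sigma = mean(col), pstdev(col)
    return [(x - mu) / sigma for x in col]


def rows(*cols):
    """Turn a set of columns into a list of row vectors."""
    return [list(r) for r in zip(*cols)]


# Hypothetical data: one numeric attribute plus a one-hot pair
# for a two-level category (three rows of "a", one row of "b").
income = [20_000.0, 30_000.0, 40_000.0, 50_000.0]
is_a = [1.0, 1.0, 1.0, 0.0]
is_b = [0.0, 0.0, 0.0, 1.0]

raw = rows(income, is_a, is_b)
numeric_only = rows(standardise(income), is_a, is_b)
everything = rows(standardise(income), standardise(is_a), standardise(is_b))


def category_gap(table):
    """Distance between rows 2 and 3 using only the one-hot columns."""
    return math.dist(table[2][1:], table[3][1:])


for name, table in [("raw", raw),
                    ("numeric only", numeric_only),
                    ("everything", everything)]:
    print(name,
          "full distance:", round(math.dist(table[2], table[3]), 3),
          "category part:", round(category_gap(table), 3))
```

With unscaled data the numeric column dominates the distance almost entirely. Once the one-hot columns are standardised too, the contribution of a category mismatch depends on how frequent each category is, so a nearest-neighbour style model can rank neighbours differently under the two preparations.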
Unfortunately, as with most choices of this kind, you often have to try both approaches and keep the one that gives the best metric.
Neil Slater