Given data with an ordinal variable, says "house quality" with values ex (excellent), gd (good), fa (fair) and bd (bad), we obviously cannot just throw data into sklearn's LabelEncoder as the resulting labels can be in wrong order, e.g. {bd: 3, gd: 2, fa: 1, ex:0}. Instead, we need to manually specify an order, right? However, if we do not have domain knowledge, how can we specify the order? Also a manual way is usually prone to error. Thus, I am curious if there is any encoder which can auto detect the correct order in an ordinal variable?
Asked
Active
Viewed 219 times
3
Victor Luu
- 241
- 1
- 6
1 Answers
1
Yup, it should be target encoding. By calculating the target mean for each category, you are ordering the categories based on the target, I don't think of a best way to order them. Of course, this is only valid in a supervised learning setting. If not, I can't think of a way of automatically ordering categories.
See this question and this blogpost to dig deeper in target encoding.
David Masip
- 6,136
- 2
- 28
- 62