I have a nominal variable (car model) with very high cardinality (~8500 labels) and I would like to analyse its relationship with a binary target variable. While I can create logical groups and compare the distribution of the target variable across those groups, can anyone suggest any better techniques or visualization tools for this type of analysis?
1 Answer
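You can calculate the mean of the target for each category of the variable and compare the values. Since the target is binary, that mean is simply the share of positive cases in each category. In pandas this can be done easily: df.groupby('categorical_feature')['target'].mean()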
Then you can plot a histogram of these per-category means to compare them. Also, seaborn has catplot, which with kind='bar' does the same as above in bar plot format, showing the mean of the target variable for each category.
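To make this concrete, here is a minimal sketch of the approach on made-up data. The column names (car_model, target), the synthetic values and the min_rows threshold are all my own assumptions for illustration, not something from the question. It also keeps the row count per category, because with thousands of labels many categories will have only a handful of rows and their means will be noisy.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names and sizes are hypothetical.
rng = np.random.default_rng(0)
n_rows = 5000
model_ids = rng.integers(0, 200, size=n_rows)  # 200 fake car models
df = pd.DataFrame({
    "car_model": [f"model_{i}" for i in model_ids],
    "target": rng.integers(0, 2, size=n_rows),  # binary target
})

# Mean of the target (share of positives) and row count per category.
stats = (
    df.groupby("car_model")["target"]
      .agg(target_mean="mean", n="count")
      .sort_values("target_mean", ascending=False)
)

# Keep only categories with enough rows for the mean to be informative.
min_rows = 20  # arbitrary threshold, chosen only for this example
reliable = stats[stats["n"] >= min_rows]

# Histogram of the per-category means: number of categories per bin on [0, 1].
counts, bin_edges = np.histogram(
    reliable["target_mean"], bins=10, range=(0.0, 1.0)
)

print(reliable.head())
print(counts)
```

If matplotlib is installed, reliable["target_mean"].plot.hist(bins=10) should draw the same histogram. For the seaborn version, something like sns.catplot(data=df, x="car_model", y="target", kind="bar") gives one bar per category, so with ~8500 labels you would probably want to pass only a subset (for example the top and bottom categories from stats).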
Victor Oliveira