Highest Voted 'one-hot-encoding' Questions - Data Science Stack Exchange

11 votes, 3 answers

What is the difference between one-hot and dummy encoding?

I am trying to understand the reason behind encoding (one-hot encoding and dummy encoding) and how one-hot and dummy encoding differ from each other.

encoding one-hot-encoding

asked Jul 22 '21 at 06:53

user121028
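
A minimal pure-Python sketch of the distinction as it is commonly drawn (terminology varies between sources). One-hot encoding creates one 0/1 column per category (k columns). Dummy encoding drops one reference category, leaving k - 1 columns, and the reference is represented by all zeros. The helper names `one_hot` and `dummy` are hypothetical. In pandas, `pd.get_dummies(..., drop_first=True)` produces the dummy form.

```python
def one_hot(values):
    """Full one-hot encoding: one 0/1 column per category (k columns)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def dummy(values):
    """Dummy encoding: drop the first (reference) category, leaving k - 1 columns."""
    categories, rows = one_hot(values)
    return categories[1:], [row[1:] for row in rows]


colors = ["red", "green", "blue", "green"]
print(one_hot(colors))
# (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]])
print(dummy(colors))
# (['green', 'red'], [[0, 1], [1, 0], [0, 0], [1, 0]])  "blue" is the all-zero reference
```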

8 votes, 5 answers

How do I encode the categorical columns if there are more than 15 unique values?

I'm trying to use this data to make a data analysis report using regression. Since regression only accepts numerical inputs, I need to encode the categorical data. However, most of these columns have more than 15 unique values, such as country. Do I…

regression categorical-data encoding one-hot-encoding categorical-encoding

asked Dec 24 '20 at 20:11

Cinemato

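
One common option is to keep the most frequent levels and pool the rest into an "Other" bucket before one-hot encoding. This assumes that the rare levels can reasonably be pooled. The sketch below shows it in plain Python. `lump_rare` and the default cutoff of 15 are illustrative choices, not a standard. Target encoding and feature hashing are alternatives for high-cardinality columns.

```python
from collections import Counter


def lump_rare(values, top_n=15, other="Other"):
    """Keep the top_n most frequent categories and map everything else to `other`."""
    keep = {cat for cat, _ in Counter(values).most_common(top_n)}
    return [v if v in keep else other for v in values]


countries = ["US", "US", "DE", "FR", "US", "DE", "JP", "BR"]
print(lump_rare(countries, top_n=2))
# ['US', 'US', 'DE', 'Other', 'US', 'DE', 'Other', 'Other']
```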

7 votes, 2 answers

Possible harm in standardizing one-hot encoded features

While there may not be any added value in standardizing one-hot encoded features prior to applying linear models, is there any harm in doing so (i.e., affecting model performance)? Standardizing definition: applying (x - mean) / std to make the…

linear-regression pipelines one-hot-encoding collinearity

asked Aug 13 '20 at 14:27

thereandhere1

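
This sketch shows what standardizing does to a single one-hot column. It assumes the population standard deviation, which is what scikit-learn's `StandardScaler` uses. For a category with prevalence p, the 1s become sqrt((1 - p)/p) and the 0s become -sqrt(p/(1 - p)). The column stays two-valued, but rare categories get large values. For an unpenalized linear model with an intercept, this is only a reparametrization. With L1/L2 regularization, it changes how strongly each dummy is effectively penalized.

```python
import math


def standardize(column):
    """(x - mean) / std, using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]


rare = [1] + [0] * 99  # a category present in 1% of rows
z = standardize(rare)
print(z[0], z[1])  # sqrt(99) and -sqrt(1/99), about 9.95 and -0.10
```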

6 votes, 1 answer

On gradient boosting and types of encodings

I am having a look at this material and I have found the following statement: For this class of models [Gradient Boosting Machine algorithms] [...] it is both safe and significantly more computationally efficient to use an arbitrary integer encoding…

encoding boosting categorical-encoding one-hot-encoding

asked Jul 21 '20 at 16:22

carlo_sguera

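
The following minimal sketch makes "arbitrary integer encoding" concrete. The helper name is hypothetical. Each category gets an integer code in order of first appearance. The codes imply an order that means nothing, but a tree can still isolate any single code with at most two threshold splits. scikit-learn's `OrdinalEncoder` does the same job, though it assigns codes in sorted order.

```python
def integer_encode(values):
    """Map each category to an arbitrary integer code (order of first appearance).

    The codes carry no meaning; a tree split such as `code <= 1` just groups
    categories, and further splits can separate any individual code.
    """
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes, [codes[v] for v in values]


codes, encoded = integer_encode(["dept_1002", "dept_1001", "dept_1218", "dept_1002"])
print(codes)    # {'dept_1002': 0, 'dept_1001': 1, 'dept_1218': 2}
print(encoded)  # [0, 1, 2, 0]
```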

6 votes, 2 answers

How to handle categorical variables with Random Forest using Scikit Learn?

One of the variables/features is the department id, which is like 1001, 1002, ..., 1218, etc. The ids are nominal, not ordinal, i.e., they are just ids, department 1002 is by no means higher than department 1001. I feed the feature to random forest…

random-forest one-hot-encoding categorical-encoding

asked Mar 14 '22 at 18:46

Fred Chang


5 votes, 1 answer

Should I do one-hot encoding before feature selection, and how should I perform feature selection on a dataset with both categorical and numerical data?

A newbie here. I am currently self-learning data science. I am working on a dataset that has both categorical and numerical (continuous and discrete) features (26 columns, 30244 rows). Target is numerical (1, 2, 3). I have several questions. I…

feature-selection encoding one-hot-encoding

asked May 30 '21 at 18:08

leahnanno


5 votes, 1 answer

When to one-hot encode categorical data when following CRISP-DM

I have a dataset that contains 15 categorical features (2 and 3 level factors which are non-ordinal) and 3 continuous numeric features. Seeing as most machine learning algorithms require numerical data as input features, and actually automatically…

data-mining preprocessing one-hot-encoding

asked Jul 31 '20 at 06:00

kjtheron


5 votes, 1 answer

One Hot Encoding for any kind of dataset

How can I one-hot encode an unknown dataset by iterating over it, checking each column's dtype and number of unique values, and encoding accordingly? Also, how do I keep track of the new one-hot encoded data with…

data python-3.x one-hot-encoding

asked Jul 10 '20 at 01:58

Devansh Mishra

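
Here is a pure-Python sketch of the loop the question describes. It uses a list of dicts as a stand-in for a DataFrame. A column is treated as categorical when all its values are strings and it has at most `max_unique` levels. The threshold of 10 is an arbitrary assumption. The returned mapping records which new columns came from which original column. `auto_one_hot` is a hypothetical helper. In pandas, the analogous checks would use `df.select_dtypes` and `Series.nunique`, followed by `pd.get_dummies(df, columns=...)`.

```python
def auto_one_hot(rows, max_unique=10):
    """One-hot encode every all-string column with at most `max_unique` levels.

    `rows` is a list of dicts, one per record (a stand-in for a DataFrame).
    Returns the encoded rows plus {original column: [new column names]}.
    """
    columns = list(rows[0])
    levels = {}
    for col in columns:
        values = [r[col] for r in rows]
        # "dtype check": treat all-string columns as categorical
        if all(isinstance(v, str) for v in values) and len(set(values)) <= max_unique:
            levels[col] = sorted(set(values))

    encoded = []
    for r in rows:
        out = {}
        for col in columns:
            if col in levels:
                for lvl in levels[col]:
                    out[f"{col}_{lvl}"] = int(r[col] == lvl)
            else:
                out[col] = r[col]  # non-categorical columns pass through unchanged
        encoded.append(out)

    mapping = {col: [f"{col}_{lvl}" for lvl in lvls] for col, lvls in levels.items()}
    return encoded, mapping


data = [
    {"city": "Paris", "size": 3, "color": "red"},
    {"city": "Rome", "size": 5, "color": "blue"},
    {"city": "Paris", "size": 2, "color": "red"},
]
encoded, mapping = auto_one_hot(data)
print(mapping)     # {'city': ['city_Paris', 'city_Rome'], 'color': ['color_blue', 'color_red']}
print(encoded[0])  # {'city_Paris': 1, 'city_Rome': 0, 'size': 3, 'color_blue': 0, 'color_red': 1}
```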

5 votes, 1 answer

Difference between tf.keras.backend.one_hot and keras.utils.to_categorical

I'm working on a classification project and need to do one hot encoding on my data set. I'm just wondering what is the difference between tf.keras.backend.one_hot and keras.utils.to_categorical, and is one of them preferred over the other?

python keras tensorflow one-hot-encoding

asked Mar 15 '20 at 12:28

kimchilover123


5 votes, 3 answers

Autoencoder general questions and poor loss

I'm trying to get a simple autoencoder working on the iris dataset to explore autoencoders at a basic level. However, I'm running into an issue where the model's loss is extremely high (>20). Can someone help me understand if this model looks normal…

python machine-learning-model autoencoder one-hot-encoding

asked Jun 25 '22 at 19:32

user37649


3 votes, 2 answers

How to handle non-ordinal features like Gender, Language, Region, etc.? Ordinal encoding or one-hot encoding?

I see that, while preparing a dataset, data scientists usually convert non-ordinal features like Gender or Language using LabelEncoder/OrdinalEncoder. Ideally, shouldn't they use one-hot encoding? Won't introducing…

data feature-engineering preprocessing encoding one-hot-encoding

asked May 31 '21 at 17:23

Nitin Shravan

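
This small sketch illustrates the concern raised in the question. Ordinal codes impose spacing between levels that a distance-based or linear model will take literally. One-hot vectors keep every pair of distinct levels equally far apart. The language list and the helper names are illustrative.

```python
import math

languages = ["English", "French", "Spanish"]

# Ordinal encoding: an arbitrary integer per level (here, the list position).
ordinal = {lang: i for i, lang in enumerate(languages)}

# One-hot encoding: one 0/1 indicator per level.
one_hot = {lang: [int(lang == other) for other in languages] for lang in languages}


def gap_ordinal(a, b):
    return abs(ordinal[a] - ordinal[b])


def gap_one_hot(a, b):
    return math.dist(one_hot[a], one_hot[b])  # Euclidean distance (Python 3.8+)


print(gap_ordinal("English", "French"), gap_ordinal("English", "Spanish"))  # 1 2
# Equal for every pair of distinct languages: sqrt(2).
print(gap_one_hot("English", "French"), gap_one_hot("English", "Spanish"))
```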

3 votes, 1 answer

One hot encoding of target variable containing classes 1 to 9 not including zero

While predicting a solution for a sudoku puzzle using a CNN, the model should predict a value from 1 to 9 for each of the 81 (9×9) cells in the puzzle. Hence the target shape is (81, 9). Using keras.to_categorical to convert the target variable…

neural-network keras cnn one-hot-encoding categorical-encoding

asked Dec 30 '20 at 18:55

Sathish Kumar SG

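
`keras.utils.to_categorical` infers the number of classes as `max(labels) + 1` when `num_classes` is not given. Labels 1 to 9 therefore yield 10 columns, and column 0 is never used. A common fix is to shift the labels down by one before encoding and add one back after taking the argmax. The sketch below mimics that inference rule in plain Python so it runs without Keras. `to_one_hot` is a hypothetical stand-in, not the Keras function.

```python
def to_one_hot(labels, num_classes=None):
    """Pure-Python stand-in for Keras-style one-hot encoding of integer labels.

    Like keras.utils.to_categorical, it uses num_classes = max(labels) + 1
    when num_classes is not given.
    """
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[int(i == label) for i in range(num_classes)] for label in labels]


digits = [5, 1, 9]                 # sudoku cell values, 1..9
print(len(to_one_hot(digits)[0]))  # 10: column 0 is never used

shifted = to_one_hot([d - 1 for d in digits])  # encode 0..8 instead
print(len(shifted[0]))                         # 9

# Decode back to 1..9 by adding 1 to the index of the hot column (the argmax).
decoded = [row.index(1) + 1 for row in shifted]
print(decoded)  # [5, 1, 9]
```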

3 votes, 2 answers

Treating missing data in categorical features

I have a dataset in which one of the categorical columns has a considerable number of missing values. The interesting thing about this column is that it has values only for a particular category in "another" column. For example: column 1 …

machine-learning feature-selection data-cleaning feature-engineering one-hot-encoding

asked Aug 21 '20 at 08:35

Bharathi

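
The following sketch shows one way to handle the situation described. The column names (`vehicle`, `engine`) are hypothetical. When a value can only exist for one category of another column, the structurally absent cells are labelled "not_applicable" and any remaining gaps "missing". Both labels then become ordinary categories before one-hot encoding. This is one option among several, not a prescribed fix.

```python
def fill_structural_missing(rows, col, parent, parent_value):
    """Label missing values in `col` explicitly (returns new dicts; input untouched).

    Rows whose `parent` column is not `parent_value` can never have a value in
    `col`, so they get "not_applicable"; other gaps get "missing".
    """
    out = []
    for r in rows:
        r = dict(r)
        if r[col] is None:
            r[col] = "missing" if r[parent] == parent_value else "not_applicable"
        out.append(r)
    return out


rows = [
    {"vehicle": "car", "engine": "diesel"},
    {"vehicle": "car", "engine": None},
    {"vehicle": "bike", "engine": None},
]
result = fill_structural_missing(rows, "engine", "vehicle", "car")
print([r["engine"] for r in result])  # ['diesel', 'missing', 'not_applicable']
```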

3 votes, 1 answer

Encoding and cross-validation

Recently I've been thinking about the proper use of encoding within a cross-validation scheme. The customarily advised way of encoding features is: split the data into train and test (hold-out) sets; fit the encoder (either LabelEncoder or…

cross-validation one-hot-encoding categorical-encoding

asked Aug 17 '20 at 16:28

jakes

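
This plain-Python sketch illustrates the "fit inside each fold" principle. The category vocabulary is learned from the training indices only. Categories that appear only in the test fold map to all zeros, which matches scikit-learn's `OneHotEncoder(handle_unknown='ignore')`. `fit_categories`, `transform`, and `k_fold_indices` are hypothetical helpers. With scikit-learn, putting the encoder in a `Pipeline` that is passed to `cross_val_score` gives the same per-fold fitting.

```python
def fit_categories(values):
    """'Fit' step: learn the category vocabulary from training data only."""
    return sorted(set(values))


def transform(values, categories):
    """Encode with a fitted vocabulary; unseen categories become all zeros."""
    return [[int(v == c) for c in categories] for v in values]


def k_fold_indices(n, k):
    """Contiguous (unshuffled) K-fold split of range(n), yielding (train, test)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


colors = ["red", "green", "red", "blue", "green", "red"]
for train_idx, test_idx in k_fold_indices(len(colors), k=3):
    cats = fit_categories([colors[i] for i in train_idx])  # fit inside the fold
    X_test = transform([colors[i] for i in test_idx], cats)
    print(cats, X_test)
```

In the second fold, "blue" appears only in the test part, so it is encoded as all zeros rather than leaking into the fitted vocabulary.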

3 votes, 1 answer

Dropping one category for regularized linear models

While reviewing scikit-learn's OneHotEncoder documentation (attached below), I noticed that when applying regularization (e.g., lasso, ridge) it is not recommended to drop the first category. While I understand why dropping the first category…

scikit-learn regression linear-regression regularization one-hot-encoding

asked Aug 14 '20 at 16:06

thereandhere1

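
This sketch shows why the first category is usually dropped for unpenalized models. The full set of one-hot columns sums to 1 in every row, which duplicates the intercept column, so the coefficients are not unique. With a ridge (L2) penalty the solution is unique even with all k columns. Under regularization, keeping every column also avoids singling out an arbitrary reference category whose effect gets absorbed into the (unpenalized) intercept. `one_hot_rows` is a hypothetical helper.

```python
def one_hot_rows(values, categories, drop_first=False):
    """One 0/1 column per category, optionally dropping the first as reference."""
    cats = categories[1:] if drop_first else categories
    return [[int(v == c) for c in cats] for v in values]


sizes = ["S", "M", "L", "M"]
cats = ["S", "M", "L"]

full = one_hot_rows(sizes, cats)
print([sum(r) for r in full])  # [1, 1, 1, 1]: identical to an intercept column

reduced = one_hot_rows(sizes, cats, drop_first=True)
print(reduced)  # [[0, 0], [1, 0], [0, 1], [1, 0]]: "S" becomes the all-zero reference
```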

Questions tagged [one-hot-encoding]