Questions tagged [feature-engineering]

The process of using domain knowledge of the data to create features that improve the performance of machine learning algorithms.

648 questions
177 votes · 4 answers

When to use One Hot Encoding vs LabelEncoder vs DictVectorizer?

I have been building models with categorical data for a while now, and in this situation I basically default to using scikit-learn's LabelEncoder to transform the data prior to building a model. I understand the difference between OHE,…
asked by anthr (1,893 rep)
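A minimal sketch of how the three encoders differ, assuming scikit-learn ≥ 1.2 (where `sparse_output` replaced the older `sparse` argument on OneHotEncoder):

```python
# Sketch: the same categorical column through three scikit-learn encoders.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.feature_extraction import DictVectorizer

colors = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# LabelEncoder: meant for *targets*; maps each class to an arbitrary integer,
# which imposes a fake ordering if used on input features.
print(LabelEncoder().fit_transform(colors["color"]))   # [2 1 0 1]

# OneHotEncoder: one binary column per category, no ordering implied.
ohe = OneHotEncoder(sparse_output=False)  # needs scikit-learn >= 1.2
print(ohe.fit_transform(colors[["color"]]))

# DictVectorizer: one-hot for string values, pass-through for numbers;
# convenient when rows arrive as dicts.
rows = colors.to_dict(orient="records")
print(DictVectorizer(sparse=False).fit_transform(rows))
```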
48 votes · 6 answers

Encoding features like month and hour as categorical or numeric?

Is it better to encode features like month and hour as factor or numeric in a machine learning model? On the one hand, I feel numeric encoding might be reasonable, because time is a forward-progressing process (the fifth month is followed by the…
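One middle ground worth knowing here is a cyclical sine/cosine encoding, which keeps month 12 adjacent to month 1; a minimal NumPy sketch:

```python
# Sketch: encode a cyclic feature (month 1-12) as a sine/cosine pair so that
# December and January end up close together in feature space.
import numpy as np

months = np.array([1, 2, 6, 12])
month_sin = np.sin(2 * np.pi * (months - 1) / 12)
month_cos = np.cos(2 * np.pi * (months - 1) / 12)
# hour of day would use / 24 instead of / 12
```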
33 votes · 1 answer

Ways to deal with longitude/latitude feature

I am working on a fictional dataset with 25 features. Two of the features are the latitude and longitude of a place, and the others are pH values, elevation, windSpeed, etc., with varying ranges. I can perform normalization on the other features but how do I…
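One standard trick for coordinates is to project them onto the unit sphere, which yields three bounded features and handles the longitude wrap-around at ±180°; a minimal sketch with a hypothetical point:

```python
# Sketch: map latitude/longitude (degrees) onto the unit sphere, giving three
# bounded features (x, y, z) where nearby places get nearby values even
# across the 180-degree longitude wrap-around.
import numpy as np

lat, lon = np.radians(40.7), np.radians(-74.0)  # hypothetical point
x = np.cos(lat) * np.cos(lon)
y = np.cos(lat) * np.sin(lon)
z = np.sin(lat)
```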
31 votes · 1 answer

Should one-hot vectors be scaled with numerical attributes?

In the case of having a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as is and scale the numerical attributes through…
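A minimal sketch of the usual arrangement, scaling only the numeric columns while the one-hot columns come out of a separate branch; column names are hypothetical:

```python
# Sketch: scale only the numeric columns and leave the 0/1 indicator columns
# untouched, using a ColumnTransformer.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "income"]   # hypothetical column names
categorical_cols = ["city"]

pre = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
# pre.fit_transform(df) yields scaled numerics next to unscaled indicators
```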
30 votes · 3 answers

Why do we convert skewed data into a normal distribution?

I was going through a solution of the Housing prices competition on Kaggle (Human Analog's kernel on House Prices: Advanced Regression Techniques) and came across this part: # Transform the skewed numeric features by taking log(feature + 1). # This…
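A minimal sketch of that transformation, selecting columns by skewness and applying log1p; the 0.75 threshold and toy data are illustrative:

```python
# Sketch: apply log(1 + x) to numeric columns whose skewness exceeds a
# threshold, as in the quoted kernel comment.
import numpy as np
import pandas as pd
from scipy.stats import skew

df = pd.DataFrame({"LotArea": [8450, 9600, 11250, 215000]})  # toy data
numeric = df.select_dtypes(include=[np.number])
skewness = numeric.apply(lambda s: skew(s.dropna()))
skewed_cols = skewness[skewness > 0.75].index
df[skewed_cols] = np.log1p(df[skewed_cols])
```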
27 votes · 3 answers

Encoding categorical variables using likelihood estimation

I am trying to understand how I can encode categorical variables using likelihood estimation, but have had little success so far. Any suggestions would be greatly appreciated.
asked by small dwarf (271 rep)
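A minimal sketch of one common variant, target-mean ("likelihood") encoding with additive smoothing; the helper and its `m` parameter are hypothetical, and in practice the mapping must be fit on training data only:

```python
# Sketch: replace each category with a smoothed mean of the target,
# shrinking rare categories toward the global mean to limit overfitting.
import pandas as pd

def target_encode(train, col, target, m=10):
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return train[col].map(smoothed).fillna(global_mean)

df = pd.DataFrame({"city": ["a", "a", "b", "b", "b"], "y": [1, 0, 1, 1, 0]})
df["city_enc"] = target_encode(df, "city", "y")
```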
25 votes · 4 answers

Is feature engineering still useful when using XGBoost?

I was reading the material related to XGBoost. It seems that this method does not require any variable scaling, since it is tree-based and trees can capture complex non-linear patterns and interactions. And it can handle both numerical and…
asked by KevinKim (635 rep)
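Scaling is indeed moot for trees, but their splits are axis-aligned, so a relationship like a ratio between two columns is still worth adding by hand; a toy sketch with hypothetical column names:

```python
# Sketch: a tree splits on one feature at a time, so a ratio such as
# debt / income is hard to approximate from the raw columns but trivial
# once added explicitly.
import pandas as pd

df = pd.DataFrame({"debt": [100, 50, 300], "income": [200, 200, 300]})
df["debt_to_income"] = df["debt"] / df["income"]
```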
22 votes · 3 answers

How to perform feature engineering on unknown features?

I am participating in a Kaggle competition. The dataset has around 100 features, all unknown (in terms of what they actually represent). Basically they are just numbers. People are performing a lot of feature engineering on these features. I…
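With anonymous columns the usual recourse is mechanical: row-wise aggregates and pairwise interactions, kept or dropped by validation score; a minimal sketch:

```python
# Sketch: blind feature engineering on anonymous numeric columns -
# row-wise statistics plus pairwise interaction terms.
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((5, 4)), columns=[f"f{i}" for i in range(4)])
row_mean, row_std = X.mean(axis=1), X.std(axis=1)
X["row_mean"], X["row_std"] = row_mean, row_std
interactions = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False).fit_transform(X)
```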
20 votes · 1 answer

What is the difference between one-hot encoding and leave-one-out encoding?

I am reading a presentation and it recommends not using leave-one-out encoding, but it is okay with one-hot encoding. I thought they were the same. Can anyone describe what the differences between them are?
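They are quite different; a minimal sketch of leave-one-out encoding, which replaces each category with the mean target of the other rows in that category (one-hot, by contrast, just creates indicator columns):

```python
# Sketch: leave-one-out target encoding - each row's own label is excluded
# from the category mean so it never leaks into its own feature value.
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "a", "b", "b"], "y": [1, 0, 1, 1, 0]})
sums = df.groupby("cat")["y"].transform("sum")
counts = df.groupby("cat")["y"].transform("count")
df["cat_loo"] = (sums - df["y"]) / (counts - 1)
```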
18 votes · 2 answers

List of feature engineering techniques

Is there any resource with a list of feature engineering techniques? A mapping from data type and model to feature engineering technique would be a gold mine.
15 votes · 3 answers

Why does frequency encoding work?

Frequency encoding is a widely used technique in Kaggle competitions, and often proves to be a very reasonable way of dealing with categorical features of high cardinality. I really don't understand why it works. Does it work in very…
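A minimal sketch of the technique itself: each category is replaced by its relative frequency, collapsing arbitrary cardinality into one numeric column:

```python
# Sketch: frequency encoding of a high-cardinality categorical column.
import pandas as pd

s = pd.Series(["nyc", "nyc", "sf", "la", "nyc", "sf"])
freq = s.value_counts(normalize=True)
encoded = s.map(freq)   # nyc -> 0.5, sf -> 0.333..., la -> 0.166...
```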
12 votes · 1 answer

What feature engineering is necessary with tree based algorithms?

I understand data hygiene, which is probably the most basic feature engineering. That is making sure all your data is properly loaded, making sure N/As are treated as a special value rather than a number between -1 and 1, and tagging your…
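A minimal sketch of that N/A point for tree models: a sentinel outside the observed range lets a single split isolate missingness; the -999 value is arbitrary:

```python
# Sketch: for trees, fill N/A with a sentinel far from the data's range and
# optionally keep an explicit missingness indicator.
import numpy as np
import pandas as pd

s = pd.Series([0.2, np.nan, -0.7, 0.9])
missing_flag = s.isna().astype(int)   # optional indicator column
s_filled = s.fillna(-999)             # sentinel outside the [-1, 1] range
```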
11 votes · 2 answers

Dismissing features based on correlation with target variable

Is it valid to dismiss features based on their Pearson correlation values with the target variable in a classification problem? Say, for instance, I have a dataset with the following format, where the target variable takes 1 or 0: >>> dt.head() ID …
asked by MedAli (275 rep)
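A cautionary sketch: Pearson correlation only sees linear marginal relationships, so a feature can correlate near zero with the target and still determine it completely:

```python
# Sketch: a feature with ~0 Pearson correlation that fully determines the
# target - dropping it on correlation alone would be a mistake.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = (np.abs(x) > 1).astype(int)          # target depends on |x|, not x
print(pd.Series(x).corr(pd.Series(y)))   # near 0
```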
10 votes · 1 answer

Should I rescale tfidf features?

I have a dataset which contains both text and numeric features. I have encoded the text ones using the TfidfVectorizer from sklearn. I would now like to apply logistic regression to the resulting dataframe. My issue is that the numeric features…
asked by ignoring_gravity (793 rep)
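A minimal sketch of one common setup: tf-idf rows are already L2-normalized by default, so standardizing only the numeric branch is often enough; column names are hypothetical:

```python
# Sketch: tf-idf on the text column, StandardScaler on the numeric columns,
# feeding a logistic regression.
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pre = ColumnTransformer([
    ("text", TfidfVectorizer(), "description"),   # single text column
    ("num", StandardScaler(), ["price", "views"]),
])
model = make_pipeline(pre, LogisticRegression(max_iter=1000))
# model.fit(df, y)
```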
10 votes · 4 answers

Is this a good practice of feature engineering?

I have a practical question about feature engineering... say I want to predict house prices using logistic regression with a bunch of features, including zip code. Then by checking the feature importance, I realize zip is a pretty good…
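If zip code does earn its importance, one hedge against its cardinality is feature hashing rather than a full one-hot expansion; a minimal sketch:

```python
# Sketch: hash a high-cardinality categorical like zip code into a fixed
# number of columns, avoiding thousands of one-hot levels.
from sklearn.feature_extraction import FeatureHasher

zips = [{"zip": "10001"}, {"zip": "94103"}, {"zip": "10001"}]
X = FeatureHasher(n_features=32, input_type="dict").transform(zips)
```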