Highest Voted 'data-science-model' Questions - Data Science Stack Exchange

31 votes, 1 answer

Should one-hot vectors be scaled with numerical attributes?

When I have a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as they are and scale the numerical attributes through…

feature-engineering feature-scaling data-science-model

asked May 14 '18 at 17:54 by Suresh Kasipandy (588 reputation)
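
A minimal sketch of the setup the question describes: standardize the numerical attributes and leave the one-hot vectors as 0/1. It uses only Python's `statistics` module; the helper `encode_rows`, the column names, and the toy rows are hypothetical. It shows the "leave one-hot as-is" option only and does not settle whether that is always right for every model.

```python
from statistics import mean, pstdev

def encode_rows(rows, categorical_key, numeric_keys):
    """One-hot encode one categorical column and z-score the numeric columns.

    Only the numeric columns are scaled; the one-hot indicators stay 0/1.
    For brevity the statistics come from all rows here; in practice they
    would be computed on the training split only.
    """
    categories = sorted({row[categorical_key] for row in rows})
    stats = {}
    for key in numeric_keys:
        values = [row[key] for row in rows]
        sd = pstdev(values)
        # Guard against constant columns (standard deviation of 0).
        stats[key] = (mean(values), sd if sd > 0 else 1.0)
    encoded = []
    for row in rows:
        scaled = [(row[k] - stats[k][0]) / stats[k][1] for k in numeric_keys]
        one_hot = [1 if row[categorical_key] == c else 0 for c in categories]
        encoded.append(scaled + one_hot)
    return categories, encoded

# Hypothetical toy data: one categorical and two numerical attributes.
rows = [
    {"city": "Paris", "age": 20, "income": 30_000},
    {"city": "Lyon", "age": 40, "income": 50_000},
    {"city": "Paris", "age": 60, "income": 70_000},
]
categories, X = encode_rows(rows, "city", ["age", "income"])
print(categories)  # ['Lyon', 'Paris']
for vector in X:
    print(vector)
```

In scikit-learn the same split is usually expressed with a `ColumnTransformer` that routes numerical columns to `StandardScaler` and categorical columns to `OneHotEncoder`.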

18 votes, 3 answers

Feature Scaling both training and test data

It is stated that, for feature normalization, the test set must use scaling identical to the training set, and the following point is given: "Do not scale the training and test sets using different scalers: this could lead to random skew in the…

machine-learning data-science-model

asked Oct 19 '18 at 17:26 by aspiring1 (377 reputation)
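
A small sketch of the rule quoted above: learn the scaling parameters from the training set once, then apply those same parameters to the test set. The class `TrainFittedScaler` and the numbers are hypothetical; the fit/transform split mirrors the pattern used by scikit-learn's `StandardScaler`.

```python
from statistics import mean, pstdev

class TrainFittedScaler:
    """Hypothetical minimal z-score scaler: statistics are learned from training data only."""

    def fit(self, values):
        self.mu = mean(values)
        sd = pstdev(values)
        self.sigma = sd if sd > 0 else 1.0  # avoid dividing by zero on constant data
        return self

    def transform(self, values):
        # The same mu and sigma are applied to every split.
        return [(v - self.mu) / self.sigma for v in values]

train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]

scaler = TrainFittedScaler().fit(train)   # fit on the training set only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)      # reuse the training parameters
print(train_scaled)
print(test_scaled)
```

Fitting a second scaler on `test` instead would map 20.0 to -1.0 rather than 0.0, so the same raw value would land at different scaled positions in the two sets, which is the skew the quoted advice warns about.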

16 votes, 5 answers

When to remove correlated variables

Can somebody please suggest the correct stage to remove correlated variables: before feature engineering or after it?

machine-learning feature-selection data-science-model

asked Aug 03 '18 at 05:01 by bp89 (163 reputation)
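
Whichever stage is chosen, the filter itself can be a simple correlation threshold applied to the features available at that point. This is a sketch under assumptions: the 0.9 threshold, the greedy keep-first order, and the feature names are hypothetical choices, and the result depends on column order.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep features in order, skipping any whose |r| with an already-kept feature exceeds threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical features: height in cm and in inches are almost perfectly correlated.
features = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "shoe_size": [38, 37, 42, 40],
}
print(drop_correlated(features))  # ['height_cm', 'shoe_size']
```

With pandas, `DataFrame.corr()` produces the full correlation matrix that the same threshold logic can be applied to.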

10 votes, 3 answers

Should you use random state or random seed in machine learning models?

I'm starting to study machine learning. In all the examples I saw, the person who created the ML model used a random state or a random seed to remove the randomness from the process. But, in real life, when you're trying to apply a machine learning model…

machine-learning cross-validation data-science-model

asked Jul 22 '20 at 02:43 by Caldass_ (187 reputation)
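
To make concrete what a seed controls, here is a sketch of a shuffled train/test split driven by a local `random.Random(seed)` generator. The function `train_test_split_sketch` is hypothetical (scikit-learn's `train_test_split` exposes the same idea through its `random_state` parameter): the same seed reproduces the same split, while changing the seed gives a different but equally valid split.

```python
import random

def train_test_split_sketch(items, test_fraction=0.25, seed=None):
    """Shuffle a copy of items and split it; a fixed seed makes the split repeatable."""
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))
a = train_test_split_sketch(data, seed=42)
b = train_test_split_sketch(data, seed=42)
print(a == b)  # True: same seed, identical split
```

Fixing the seed keeps experiments reproducible; repeating the run over several seeds is one way to see how much a result depends on that particular draw.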

9 votes, 2 answers

image_dataset_from_directory VS flow_from_directory

What is the main difference between flow_from_directory and image_dataset_from_directory in Keras? Which one should I use?

machine-learning deep-learning keras tensorflow data-science-model

asked Jul 28 '20 at 07:38 by Bala venkatesh (391 reputation)

9 votes, 2 answers

How to use the SHAP Kernel Explainer with Pipeline models?

I have a pandas DataFrame X. I would like to find the prediction explanation of a particular model. My model is given below: pipeline = Pipeline(steps= [ ('imputer', imputer_function()), ('classifier', RandomForestClassifier() …

machine-learning machine-learning-model data-science-model ipython

asked May 23 '19 at 14:57 by Nayana Madhu (436 reputation)

9 votes, 2 answers

Is there any consensus on choosing an appropriate ML approach?

I am studying data science at the moment and we are taught a dizzying variety of basic regression/classification techniques (linear, logistic, trees, splines, ANN, SVM, MARS, and so on), along with a variety of extra tools (bootstrapping,…

machine-learning neural-network deep-learning classification data-science-model

asked Sep 09 '18 at 06:23 by Brendan Hill (155 reputation)

8 votes, 1 answer

What is the difference between Trax and TensorFlow?

What is the main difference between Trax and TensorFlow? Both of them are deep learning libraries developed by Google. https://github.com/google/trax https://github.com/tensorflow/tensorflow

tensorflow data-science-model

asked Jan 12 '21 at 10:09 by Bala venkatesh (391 reputation)

8 votes, 1 answer

How to Combat Data Drift

I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…

classification feature-engineering data-science-model mlops data-drift

asked Sep 14 '22 at 11:16 by scott lucas (83 reputation)
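
Combating drift usually starts with measuring it per feature for each new monthly batch. One commonly used measure (swapped in here, not taken from the question) is the Population Stability Index (PSI) between a reference batch and the new batch. The `psi` helper, the `eps` floor, and the occupation values below are hypothetical.

```python
from collections import Counter
from math import log

def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two samples of one categorical feature.

    Proportions of 0 are floored at eps so the log term stays finite.
    """
    b_counts, c_counts = Counter(baseline), Counter(current)
    total = 0.0
    for cat in set(b_counts) | set(c_counts):
        b = max(b_counts[cat] / len(baseline), eps)
        c = max(c_counts[cat] / len(current), eps)
        total += (c - b) * log(c / b)
    return total

# Hypothetical occupation column: last month's batch vs. this month's batch.
last_month = ["nurse"] * 50 + ["teacher"] * 50
this_month = ["nurse"] * 80 + ["teacher"] * 20
print(round(psi(last_month, this_month), 4))  # 0.4159
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; these are conventions rather than guarantees, and what to do once drift is flagged (for example, retraining on more recent batches) is a separate decision.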

7 votes, 2 answers

Correct interpretation of summary_plot shap graph

While going through various resources online to understand SHAP plots, I ended up slightly confused. Below is my interpretation of the overall plot given in the examples: a SHAP value of 0 for a feature corresponds to the average prediction using all…

python data-science-model lightgbm

asked Jan 03 '20 at 12:39 by Sanchez_P (101 reputation)

7 votes, 2 answers

How do real-world machine learning production systems run?

Dear Machine Learning/AI Community, I am a budding machine learning practitioner who has worked on open online datasets and some POCs built locally for my project. I have built some models and converted them into pickle objects in order to avoid…

machine-learning python deep-learning data-science-model

asked Jun 22 '18 at 06:40 by Manikant Kella (73 reputation)

7 votes, 5 answers

How to handle missing values if imputation doesn't make sense

I have a column/feature in my dataset, "years_married", showing the years a person has been married. Since not every person is married, there are NaN fields. It does not make sense to fillna(0) "years_married", since 0 would mean the person just married. A…

data-science-model missing-data data-imputation

asked Mar 02 '23 at 16:41 by methus (131 reputation)
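
One common pattern for values that are "not applicable" rather than unknown is to add an explicit 0/1 missingness flag next to the filled column, so a model can distinguish "never married" from "married 0 years". This is a sketch under assumptions: the helper `add_missing_indicator`, the flag name `years_married_missing`, and the fill value 0.0 are hypothetical choices.

```python
import math

def add_missing_indicator(rows, key, fill_value=0.0):
    """Return copies of rows with a 0/1 missingness flag and a filled value for key.

    The flag lets a model tell 'not applicable' apart from a real value equal to fill_value.
    """
    out = []
    for row in rows:
        value = row.get(key)
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        new_row = dict(row)
        new_row[f"{key}_missing"] = 1 if missing else 0
        new_row[key] = fill_value if missing else value
        out.append(new_row)
    return out

rows = [{"years_married": 5.0}, {"years_married": float("nan")}, {"years_married": 0.0}]
for row in add_missing_indicator(rows, "years_married"):
    print(row)
```

scikit-learn offers the same idea through `SimpleImputer(add_indicator=True)` and `MissingIndicator`; some gradient boosting implementations (LightGBM, scikit-learn's `HistGradientBoostingClassifier`) can also take NaN directly.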

7 votes, 2 answers

Should the model training pipeline run in the dev, staging and production environments?

I know it's a best practice to ship our code from dev to staging to production, with tests and validations at each level that help us deploy to the production environment confidently. But for ML models, data scientists will first…

machine-learning python training data-science-model deployment

asked Feb 21 '23 at 06:59 by shaik moeed (171 reputation)

6 votes, 2 answers

How do I decide if I need to go for Normalization and not Standardization or vice-versa?

While designing an ML model, how do I decide whether to use normalization rather than standardization, or vice versa? What factors is this decision based on?

machine-learning python data-science-model

asked Apr 19 '19 at 09:54 by Ajith Madhav (63 reputation)
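
To make the two options concrete, here is a sketch of both transforms on the same hypothetical data: min-max normalization maps values to [0, 1] via (x - min) / (max - min), and standardization maps them to z-scores via (x - mean) / std. The function names and the sample values are illustrative; the example shows one visible difference, that a single large value compresses the other min-max outputs toward 0.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def standardize(values):
    """Z-score: (x - mean) / std, giving mean 0 and unit population std."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

values = [1.0, 2.0, 3.0, 100.0]  # hypothetical feature with one large value
print([round(v, 3) for v in min_max_normalize(values)])  # [0.0, 0.01, 0.02, 1.0]
print([round(v, 3) for v in standardize(values)])
```

Normalization guarantees a fixed, bounded range; standardization centers the feature at 0 with unit variance but leaves it unbounded. Which property matters depends on the model and the data.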

6 votes, 1 answer

Differences between big data, data warehousing, business intelligence and data science?

I know they are four different areas, but I would like to know the main differences between these disciplines, how they relate to each other (and whether some of them depend on others), and what the specific objective of each one is.

bigdata data-science-model

asked Oct 01 '18 at 17:21 by Jorge Herrera (61 reputation)
