Questions about the organization of elements of data, and the standardization of their relations.
Questions tagged [data-science-model]
660 questions
31
votes
1 answer
Should one-hot vectors be scaled with numerical attributes?
When I have a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as is and scale the numerical attributes through…
Suresh Kasipandy
18
votes
3 answers
Feature Scaling both training and test data
It is stated that for feature normalization:
The test set must use identical scaling to the training set.
And the point is given that:
Do not scale the training and test sets using different scalars: this could lead to random skew in the…
aspiring1
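As a rough illustration of the point quoted in the question above (the test set reusing the training set's scaling), here is a minimal sketch in plain Python. The helper names `fit_standardizer` and `standardize` and the sample values are hypothetical and chosen only for this example; they do not come from the question, and a library scaler would normally be used instead.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against a constant feature, which would otherwise divide by zero.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously learned parameters to any split (train or test)."""
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data for a single numerical feature.
train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]

# Parameters are estimated once, on the training split.
mu, sigma = fit_standardizer(train)

# The same parameters are reused for both splits.
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)
```

Because the test values are transformed with the training mean and standard deviation, both splits live on the same scale; refitting on the test split would shift and stretch it differently.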
16
votes
5 answers
When to remove correlated variables
Can somebody please suggest the correct stage at which to remove correlated variables: before or after feature engineering?
bp89
10
votes
3 answers
Should you use random state or random seed in machine learning models?
I'm starting to study machine learning. In all the examples I have seen, the person who created the ML model used a random state or a random seed to fix the randomness of the process. But in real life, when you're trying to apply a machine learning model…
Caldass_
9
votes
2 answers
image_dataset_from_directory vs flow_from_directory
What is the main difference between flow_from_directory and image_dataset_from_directory in Keras?
Which one should I use?
Bala venkatesh
9
votes
2 answers
How to Use SHAP Kernel Explainer with Pipeline models?
I have a pandas DataFrame X. I would like to find the prediction explanation of a particular model.
My model is given below:
pipeline = Pipeline(steps= [
('imputer', imputer_function()),
('classifier', RandomForestClassifier()
…
Nayana Madhu
9
votes
2 answers
Is there any consensus on choosing an appropriate ML approach?
I am studying data science at the moment and we are taught a dizzying variety of basic regression/classification techniques (linear, logistic, trees, splines, ANN, SVM, MARS, and so on), along with a variety of extra tools (bootstrapping,…
Brendan Hill
8
votes
1 answer
What is the difference between Trax and TensorFlow?
What is the main difference between Trax and TensorFlow?
Both are deep learning libraries developed by Google.
https://github.com/google/trax
https://github.com/tensorflow/tensorflow
Bala venkatesh
8
votes
1 answer
How to Combat Data Drift
I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…
scott lucas
7
votes
2 answers
Correct interpretation of summary_plot shap graph
While going through the various resources online to understand the SHAP plots, I ended up slightly confused. Below is my interpretation of the overall plot given in the examples:
A SHAP value of 0 for a feature corresponds to the average prediction using all…
Sanchez_P
7
votes
2 answers
How do real-world machine learning production systems run?
Dear Machine Learning/AI Community,
I am a budding, aspiring machine learning practitioner who has worked on open online datasets and some POCs built locally for my project. I have built some models and converted them into pickle objects in order to avoid…
Manikant Kella
7
votes
5 answers
How to handle missing values if imputation doesn't make sense
I have a column/feature in my dataset, "years_married", showing the number of years a person has been married.
Since not every person is married, there are NaN fields.
It does not make sense to fillna(0) "years_married", since 0 would mean the person just married. A…
methus
7
votes
2 answers
Should the model training pipeline run in dev, staging, and production environments?
I know it's best practice to ship our code from dev to staging to production with different levels of tests and validations that help us deploy confidently to the production environment.
But for ML models, data scientists will first…
shaik moeed
6
votes
2 answers
How do I decide whether to use normalization rather than standardization, or vice versa?
While designing an ML model, how do I decide whether to use normalization rather than standardization, or vice versa? What factors drive this decision?
Ajith Madhav
6
votes
1 answer
Differences between big data, data warehousing, business intelligence and data science?
I know they are four different areas, but I would like to know the main differences between these disciplines, how they relate to or depend on each other, and the specific objective of each one.
Jorge Herrera