Highest Voted Questions - Data Science Stack Exchange

10 votes

4 answers

Is this a good practice of feature engineering?

I have a practical question about feature engineering. Say I want to predict house prices using logistic regression and use a bunch of features, including zip code. Then, by checking the feature importance, I realize zip is a pretty good…

logistic-regression feature-engineering feature-construction

asked Jun 13 '18 at 22:07

user3768495

10 votes

2 answers

NN embedding layer

Several neural network libraries such as TensorFlow and PyTorch offer an Embedding layer. Having implemented word2vec in the past, I understand the reasoning behind wanting a lower-dimensional representation. However, it would seem the embedding…

machine-learning deep-learning tensorflow word-embeddings vector-space-models

asked May 31 '18 at 15:30

cbake

10 votes

2 answers

How to train the same RNN over multiple series?

I have multiple separate time series and would like to train the same LSTM network on them. How should I do this? I can't just concatenate the time series (along time), because I am afraid the network will be confused by jumps at the points of…

lstm rnn

asked May 30 '18 at 09:57

Dims

10 votes

5 answers

In which epoch should I stop the training to avoid overfitting?

I'm working on an age estimation project, trying to classify a given face into a predefined age range. For that purpose I'm training a deep NN using the Keras library. The accuracy for the training and the validation sets is shown in the graph…

machine-learning neural-network deep-learning accuracy overfitting

asked May 29 '18 at 09:33

Yiannis Ath

10 votes

1 answer

Assumptions of linear regression

In simple terms, what are the assumptions of linear regression? I just want to know when I can apply a linear regression model to a dataset.

linear-regression

asked May 29 '18 at 04:27

Anvay Joshi

10 votes

1 answer

How to draw convolutional neural network diagrams?

I have to draw a CNN diagram similar to this: I tried all the tools mentioned in https://datascience.stackexchange.com/a/14900, but there is no easy way to do it. Is there any automated way to do it, or do I have to do it manually? In addition, is…

machine-learning neural-network deep-learning

asked May 21 '18 at 15:55

Beginner

10 votes

2 answers

Debugging Neural Networks

I've built an artificial neural network in Python using the scipy.optimize.minimize (conjugate gradient) optimization function. I've implemented gradient checking, double-checked everything, etc., and I'm pretty certain it's working correctly. I've run…

machine-learning python neural-network

asked Jun 11 '14 at 18:22

user3726050

10 votes

3 answers

AUC-ROC of a random classifier

Why is the area under the ROC curve for a random classifier equal to 0.5, and why does the curve have a diagonal shape? To me, a random classifier would have 25% each of TP, TN, FP, and FN, and would therefore be only a single point on the ROC curve.

classification

asked May 20 '18 at 06:12

Victor

10 votes

4 answers

How to estimate the variance of regressors in scikit-learn?

Every classifier in scikit-learn has a method predict_proba(x) that predicts class probabilities for x. How to do the same thing for regressors? The only regressor for which I know how to estimate the variance of the predictions is Gaussian process…

python scikit-learn regression variance

asked May 17 '18 at 12:05

Vladislav Gladkikh

10 votes

3 answers

Why do we use gradients instead of residuals in Gradient Boosting?

I have found mentions of two advantages of using gradients instead of actual residuals: 1) Using gradients will allow us to plug in any loss function (not just MSE) without having to change our base learners to make them compatible with the loss…

machine-learning xgboost optimization gradient-descent

asked May 13 '18 at 20:25

eyio

10 votes

2 answers

Multicollinearity in Decision Tree

Can anybody please explain the effect of multicollinearity on decision tree algorithms (classification and regression)? I have done some searching but was not able to find the right answer, as some say it affects them and others say it doesn't.

decision-trees

asked May 08 '18 at 18:43

deepguy

10 votes

2 answers

Validation showing huge fluctuations. What could be the cause?

I'm training a CNN for a 3-class image classification problem. My training loss decreased smoothly, which is the expected behaviour. However, my validation loss shows a lot of fluctuation. Is this something that I should be worried about, or should…

image-classification cnn loss-function

asked May 02 '18 at 11:25

Josh

10 votes

1 answer

Clustering customer data stored in ElasticSearch

I have a bunch of customer profiles stored in an Elasticsearch cluster. These profiles are now used to create target groups for our email subscriptions. Target groups are now formed manually using Elasticsearch faceted search capabilities…

data-mining clustering

asked May 14 '14 at 08:38

Konstantin V. Salikhov

10 votes

2 answers

How does the bounding box regressor work in Fast R-CNN?

In the Fast R-CNN paper (https://arxiv.org/abs/1504.08083) by Ross Girshick, the bounding box parameters are continuous variables. These values are predicted using a regression method. Unlike other neural network outputs, these values do not represent…

deep-learning image-recognition yolo object-recognition faster-rcnn

asked Apr 20 '18 at 07:25

Saptarshi Roy

10 votes

2 answers

How to plot learning curve and validation curve while using pipeline

I would appreciate it if you could let me know in the following example code: from collections import Counter from sklearn.datasets import make_classification from sklearn.model_selection import…

classification scikit-learn

asked Mar 25 '18 at 18:31

ebrahimi
