Most Popular
1500 questions
10
votes
4 answers
Is this a good practice of feature engineering?
I have a practical question about feature engineering. Say I want to predict house prices using logistic regression, and I use a bunch of features including zip code. Then, by checking the feature importance, I realize zip is a pretty good…
user3768495
- 987
- 1
- 7
- 8
10
votes
2 answers
NN embedding layer
Several neural network libraries such as TensorFlow and PyTorch offer an Embedding layer. Having implemented word2vec in the past, I understand the reasoning behind wanting a lower-dimensional representation.
However, it would seem the embedding…
cbake
- 101
- 1
- 3
10
votes
2 answers
How to train the same RNN over multiple series?
I have multiple separate time series and would like to train the same LSTM network on them. How should I do this? I can't just concatenate the time series (along time), because I am afraid the network will be confused by jumps at the points of…
Dims
- 201
- 2
- 5
10
votes
5 answers
In which epoch should I stop the training to avoid overfitting?
I'm working on an age estimation project, trying to classify a given face into a predefined age range. For that purpose I'm training a deep NN using the Keras library. The accuracy for the training and the validation sets is shown in the graph…
Yiannis Ath
- 188
- 1
- 1
- 10
10
votes
1 answer
Assumptions of linear regression
In simple terms, what are the assumptions of Linear Regression?
I just want to know when I can apply a linear regression model to a dataset.
Anvay Joshi
- 119
- 4
10
votes
1 answer
How to draw convolutional neural network diagrams?
I have to draw a CNN diagram similar to this:
I tried all the tools mentioned in https://datascience.stackexchange.com/a/14900, but there is no easy way to do it.
Is there any automated way to do it? Or do I have to do it manually?
In addition, is…
Beginner
- 209
- 1
- 2
- 5
10
votes
2 answers
Debugging Neural Networks
I've built an artificial neural network in Python using the scipy.optimize.minimize (conjugate gradient) optimization function.
I've implemented gradient checking, double-checked everything, etc., and I'm pretty certain it's working correctly.
I've run…
user3726050
- 109
- 4
10
votes
3 answers
AUC-ROC of a random classifier
Why is the area under the ROC curve for a random classifier equal to 0.5, and why does the curve have a diagonal shape?
To me, a random classifier would have 25% each of TP, TN, FP, and FN, and would therefore be only a single point on the ROC curve.
Victor
- 281
- 1
- 3
- 5
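The AUC-ROC question above can be explored with a small simulation. This is a minimal sketch, assuming a "random classifier" means one whose scores are drawn uniformly at random, independent of the true label; the helper names `roc_points` and `auc` are hypothetical and written only for this illustration. A single threshold yields a single (FPR, TPR) point, but sweeping the threshold over all scores traces a curve that stays close to the diagonal.

```python
import random

def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Visit examples from highest to lowest score: lowering the threshold
    # past an example turns it into a predicted positive.
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

rng = random.Random(0)                       # fixed seed for repeatability
labels = [rng.randint(0, 1) for _ in range(4000)]
scores = [rng.random() for _ in labels]      # scores carry no label information

points = roc_points(labels, scores)
random_auc = auc(points)
print(f"AUC of the random scorer: {random_auc:.3f}")
```

Under these assumptions, the point the question describes (equal shares of TP, TN, FP, and FN) corresponds to FPR = TPR = 0.5, which is just one point on that same diagonal.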
10
votes
4 answers
How to estimate the variance of regressors in scikit-learn?
Many classifiers in scikit-learn have a method predict_proba(x) that predicts class probabilities for x. How can I do the same thing for regressors?
The only regressor for which I know how to estimate the variance of the predictions is Gaussian process…
Vladislav Gladkikh
- 1,206
- 11
- 21
10
votes
3 answers
Why do we use gradients instead of residuals in Gradient Boosting?
I have found mentions of two advantages in using gradients instead of actual residuals:
1) Using gradients will allow us to plug in any loss function (not just MSE) without having to change our base learners to make them compatible with the loss…
eyio
- 101
- 1
- 3
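The first point in the gradient boosting question above can be illustrated numerically. This is a minimal sketch, not the implementation of any particular library: the function names are hypothetical, and the squared loss is assumed to be written as (y - F)^2 / 2 so that its negative gradient with respect to the prediction F is exactly the residual y - F. Swapping in the absolute loss changes only the targets that the next base learner is fitted to.

```python
def residuals(y, pred):
    """Plain residuals y - F(x)."""
    return [yi - pi for yi, pi in zip(y, pred)]

def neg_gradient_squared(y, pred):
    """Negative gradient of (y - F)^2 / 2 with respect to F: equals y - F."""
    return [yi - pi for yi, pi in zip(y, pred)]

def neg_gradient_absolute(y, pred):
    """Negative gradient of |y - F| with respect to F: sign(y - F)."""
    return [(yi > pi) - (yi < pi) for yi, pi in zip(y, pred)]

y = [3.0, -1.0, 2.0]        # illustrative targets
pred = [2.5, 0.0, 2.0]      # illustrative current ensemble predictions

print(residuals(y, pred))               # targets when fitting residuals directly
print(neg_gradient_squared(y, pred))    # identical under squared loss
print(neg_gradient_absolute(y, pred))   # only the sign survives under absolute loss
```

In both cases the base learner is still an ordinary regressor fitted to a vector of numbers; only the way those numbers are computed depends on the loss.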
10
votes
2 answers
Multicollinearity in Decision Tree
Can anybody please explain the effect of multicollinearity on Decision Tree algorithms (classification and regression)?
I have done some searching but was not able to find the right answer as some say it affects it and others say it doesn't.
deepguy
- 1,471
- 8
- 21
- 39
10
votes
2 answers
Validation showing huge fluctuations. What could be the cause?
I'm training a CNN for a 3-class image classification problem. My training loss decreased smoothly, which is the expected behaviour. However, my validation loss shows a lot of fluctuation.
Is this something that I should be worried about, or should…
Josh
- 497
- 4
- 8
10
votes
1 answer
Clustering customer data stored in ElasticSearch
I have a bunch of customer profiles stored in an Elasticsearch cluster. These profiles are currently used to create target groups for our email subscriptions.
Target groups are now formed manually using Elasticsearch faceted search capabilities…
Konstantin V. Salikhov
- 634
- 7
- 18
10
votes
2 answers
How does the bounding box regressor work in Fast R-CNN?
In the Fast R-CNN paper (https://arxiv.org/abs/1504.08083) by Ross Girshick, the bounding box parameters are continuous variables. These values are predicted using a regression method. Unlike other neural network outputs, these values do not represent…
Saptarshi Roy
- 439
- 2
- 4
- 11
10
votes
2 answers
How to plot learning curve and validation curve while using pipeline
I would appreciate it if you could let me know how to do this in the following example code:
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import…
ebrahimi
- 1,305
- 7
- 20
- 40