Most Popular

1500 questions
9
votes
1 answer

XGBoost: Quantifying Feature Importances

I need to quantify the importance of the features in my model. However, when I use XGBoost to do this, I get completely different results depending on whether I use the variable importance plot or the feature importances. For example, if I use…
NLR
9
votes
1 answer

Implementing simple linear regression using a neural network

I have been trying to implement simple linear regression using neural networks in Keras, in the hope of understanding how to work with the Keras library. Unfortunately, I am ending up with a very bad model. Here is the implementation: from pylab import…
mathisbetter
9
votes
3 answers

Where can I find freely available multi-label datasets online?

I'm trying to find multi-label classification datasets that are available for free online. By "multi-label" I mean that each instance can be labeled with anywhere from a single to $k$ labels, where $k$ is the total number of different labels in…
Bobson Dugnutt
9
votes
1 answer

GAN vs DCGAN difference

I am trying to understand the key difference between GAN and DCGAN. I know that DCGAN uses a convolutional network. But: What data is better to push into GAN and what data fits better to DCGAN? Does DCGAN work better with small data dimensions?
CezarySzulc
9
votes
1 answer

Binary classification of every time series step based on past and future values

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed. I have various time series of positional (x, y, z) data tracked by sensors. I've developed some more features. For example, I rasterized the…
Chris
9
votes
1 answer

Implementation of Stochastic Gradient Descent in Python

I am attempting to implement a basic Stochastic Gradient Descent algorithm for a 2-d linear regression in Python. I was given some boilerplate code for vanilla GD, and I have attempted to convert it to work for SGD. Specifically -- I am a little…
foobarbaz
9
votes
3 answers

Binary (Unary) Recommendation System with Biased Views

I would like to create a content recommendation system based on binary click data that also takes views into account. What content a user has been exposed to, and therefore has the chance to click on, is currently biased by a rule-based system that…
elz
9
votes
1 answer

How can I change the transparency of a histogram plot in Seaborn using PairGrid?

I'm using the Kaggle Titanic dataset. One feature is "Embarked", the city the passenger embarked from. The survival rate appears to correlate with it, but I'm worried it may just be correlated with the ticket Fare (which the survival rate definitely…
GrundleMoof
9
votes
1 answer

Dropout vs weight decay

Dropout and weight decay are both regularization techniques. From my experience, dropout has been more widely used in the last few years. Are there scenarios where weight decay shines more than dropout?
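For readers comparing the two techniques named in this question, here is a minimal plain-Python sketch of what each one does mechanically. The function names, the inverted-dropout scaling, and the sample values are assumptions chosen for illustration; they are not taken from the question or its answers.

```python
import random


def dropout(activations, p, rng):
    """Inverted dropout: zero each activation with probability p and
    scale the survivors by 1 / (1 - p) so the expected value is unchanged."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One SGD step with L2 weight decay: w <- w - lr * (g + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# Hypothetical inputs, for illustration only.
inputs = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)
dropped = dropout(inputs, p=0.5, rng=rng)

# With zero gradients, the update only shrinks the weights toward zero.
decayed = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5)
print(dropped, decayed)
```

Dropout acts on activations and is random at training time, while weight decay acts deterministically on the parameters at every update. The sketch does not say which one works better in a given scenario.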
9
votes
3 answers

Difference between indicator column and categorical identity column in tensorflow

I am learning TensorFlow and came across the different feature columns it provides. Two of these types are categorical_identity_column and indicator_column. Both have been defined in the same way. As far as I understand, both convert…
Ankit Seth
9
votes
1 answer

Which type of autoencoder gives the best results for text?

I did a couple of examples of autoencoders for images and they worked fine. Now I want to build an autoencoder for text that takes a sentence as input and returns the same sentence. But when I try to use the same autoencoders as the ones I used for…
sspp
9
votes
4 answers

Exploratory Data Analysis with Image Dataset

In Machine Learning Kernels on Kaggle I often see EDAs with structured data. So I was wondering whether there are any recommended/standard procedures for EDA with image datasets. What kind of statistical analyses do you conduct, what kind of plots do…
9
votes
9 answers

Confusion Matrix - Get Items FP/FN/TP/TN - Python

After running my Python code: print(confusion_matrix(x_test, x_pred)) I get this: [100 32 211 21] My question is, how can I get the following list: True positive = 100 False positive = 32 False negative = 211 True negative = 21 Is this possible?
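One way to obtain the four counts asked about here, sketched in plain Python so that it does not depend on the cell layout of any particular library. The helper name `confusion_counts`, the `positive` parameter, and the sample labels are hypothetical; the labels are not the data from the question.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for binary labels,
    treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```

With scikit-learn, `confusion_matrix(y_true, y_pred).ravel()` returns the counts for binary labels in the order tn, fp, fn, tp, so the mapping of matrix cells to names should be checked against that order rather than assumed.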
John_Rodgers
9
votes
2 answers

LSTM: How to deal with nonstationarity when predicting a time series

I want to do one-step-ahead predictions for time series with LSTM. To understand the algorithm, I built myself a toy example: A simple autocorrelated process. def my_process(n, p, drift=0, displacement=0): x = np.zeros(n) for i in range(1,…
Elias Schoof
9
votes
2 answers

Always drop the first column after performing One Hot Encoding?

Since one of the columns can be generated completely from the others, and hence retaining this extra column does not add any new information for the modelling process, would it be good practice to always drop the first column after performing One…
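The redundancy this question describes can be checked directly. The sketch below one-hot encodes a small hypothetical categorical column in plain Python, drops the first indicator column, and rebuilds it from the remaining ones. The `one_hot` helper, the category names, and the values are assumptions for illustration.

```python
def one_hot(values, categories):
    """Return one row per value, with a 1 in the column of its category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical categories and data, for illustration only.
categories = ["C", "Q", "S"]
values = ["S", "C", "Q", "S"]

full = one_hot(values, categories)

# Drop the first indicator column.
reduced = [row[1:] for row in full]

# The dropped column is 1 exactly when every remaining indicator is 0.
recovered_first = [1 - sum(row) for row in reduced]

print(recovered_first == [row[0] for row in full])
```

This only demonstrates that the dropped column carries no extra information. Whether dropping it is always good practice is the open part of the question and may depend on the model being used.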