Highest Voted Questions - Data Science Stack Exchange

10

votes

1 answer

How do I calculate the delta term of a Convolutional Layer, given the delta terms and weights of the previous Convolutional Layer?

I am trying to train an artificial neural network with two convolutional layers (c1, c2) and two hidden layers (h1, h2). I am using the standard backpropagation approach. In the backward pass I calculate the error term of a layer (delta) based on…

machine-learning data-mining neural-network deep-learning

asked Jun 02 '15 at 20:16

cdwoelk

181
2
7

10

votes

3 answers

Network analysis classic datasets

There are several classic datasets for machine learning classification/regression tasks. The most popular are the Iris Flower Data Set, the Titanic Data Set, Motor Trend Cars, etc. But does anyone know of similar datasets for network analysis / graph…

dataset graphs

asked Jun 26 '14 at 13:32

sobach

1,139
5
20

10

votes

5 answers

How to create a good list of stopwords

I am looking for some hints on how to curate a list of stopwords. Can anyone recommend a good method to extract stopword lists from the dataset itself for preprocessing and filtering? The data: a huge amount of human text input…

data-mining nlp information-retrieval language-model

asked May 24 '15 at 21:45

PlagTag

333
1
3
10

10

votes

1 answer

How does the forward method get called in this PyTorch conv net?

In this example network from the PyTorch tutorial:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # 1 input image channel, 6 output channels,…

neural-network pytorch

asked Aug 30 '19 at 10:17

SheppLogan

322
4
11

10

votes

3 answers

Decision Trees - how does split for categorical features happen?

A decision tree, while performing recursive binary splitting, selects an independent variable (say $X_j$) and a threshold (say $t$) such that the predictor space is split into regions $\{X \mid X_j < t\}$ and $\{X \mid X_j \ge t\}$, and which leads to the greatest…

machine-learning random-forest decision-trees algorithms

asked Aug 08 '19 at 17:25

Supratim Haldar

309
1
3
10

10

votes

1 answer

Spark, optimally splitting a single RDD into two

I have a large dataset that I need to split into groups according to specific parameters. I want the job to process as efficiently as possible. I can envision two ways of doing so. Option 1: create a map from the original RDD and filter def…

apache-spark pyspark

asked May 01 '15 at 20:32

j.a.gartner

1,215
1
9
18

10

votes

2 answers

How does class_weight work in Decision Tree

The scikit-learn implementation of DecisionTreeClassifier has a parameter called class_weight. As per the documentation: "Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one." and: The…

scikit-learn decision-trees class-imbalance

asked Jul 23 '19 at 14:29

Supratim Haldar

309
1
3
10

10

votes

4 answers

How many parameters in a Conv2d Layer?

I was following Andrew Ng's Coursera course on deep learning, and there is a question asked there that I couldn't figure out the answer to. Suppose your input is a 300 by 300 color (RGB) image, and you use a convolutional layer with 100…

machine-learning deep-learning convolution convolutional-neural-network

asked Jul 10 '19 at 06:24

Anjith

961
2
11
20

10

votes

3 answers

Isolation forest sklearn contamination param

I am working on an unsupervised anomaly detection task on time series data using an isolation forest algorithm. I am developing it in Python, specifically with scikit-learn. I found a lot of examples on this, but what is not very clear is how to…

python scikit-learn unsupervised-learning anomaly-detection outlier

asked Jul 01 '19 at 19:58

Giordano

345
2
4
11

10

votes

2 answers

Is it valid to shuffle time-series data for a prediction task?

I have a time-series dataset that records some participants' daily features from wearable sensors and their daily mood status. The goal is to use one day's daily features and predict the next day's mood status for participants with machine learning…

time-series predictive-modeling linear-regression

asked Jun 21 '19 at 17:48

Han

103
1
1
5

10

votes

5 answers

Why do decision trees need categorical variables to be encoded?

As per my intuition, decision trees should work better with categorical variables than with continuous variables. If this is the case, why is encoding needed on categorical variables? Can someone give me the intuition behind this?

machine-learning decision-trees

asked May 16 '19 at 11:58

Mukesh K

101
1
1
5

10

votes

3 answers

Splitting train/test sets by an identifier?

I know sklearn has train_test_split() to split data into a train and test set. But I read that, even with setting a random seed, if your actual dataset is updated regularly, the random seed will reset with each updated dataset and take a different train/test…

machine-learning python dataset data-cleaning randomized-algorithms

asked May 03 '19 at 22:42

Greg Rosen

323
4
11

10

votes

4 answers

SGDClassifier: Online Learning/partial_fit with a previously unknown label

My training set contains about 50k entries, which I use for initial training. On a weekly basis, ~5k entries are added, but the same amount "disappears" (as it is user data which has to be deleted after some time). Therefore I use online learning…

multiclass-classification online-learning

asked Apr 02 '19 at 11:25

swalkner

111
6

10

votes

2 answers

Using Cross Validation technique for a CNN model

I am working on a CNN model. As always, I used batches with epochs to train my model. When it completed training and validation, I used a test set to measure the model's performance and generate a confusion matrix. Now I want to use Cross…

python deep-learning cnn cross-validation

asked Mar 22 '19 at 15:40

Hunar

1,197
2
11
33

10

votes

1 answer

Why Gaussian latent variable (noise) for GAN?

When I was reading about GANs, the thing I didn't understand is why people often choose the input to a GAN (z) to be samples from a Gaussian. Are there also potential problems associated with this?

deep-learning gan gaussian

asked Mar 16 '19 at 22:27

asahi kibou

143
1
5
