Most Popular

1500 questions
10
votes
1 answer

How do I calculate the delta term of a Convolutional Layer, given the delta terms and weights of the previous Convolutional Layer?

I am trying to train an artificial neural network with two convolutional layers (c1, c2) and two hidden layers (h1, h2). I am using the standard backpropagation approach. In the backward pass I calculate the error term of a layer (delta) based on…
10
votes
3 answers

Network analysis classic datasets

There are several classic datasets for machine learning classification/regression tasks. The most popular are: Iris Flower Data Set; Titanic Data Set; Motor Trend Cars; etc. But does anyone know of similar datasets for network analysis / graph…
sobach
  • 1,139
  • 5
  • 20
10
votes
5 answers

How to create a good list of stopwords

I am looking for some hints on how to curate a list of stopwords. Does someone know / can someone recommend a good method to extract stopword lists from the dataset itself for preprocessing and filtering? The Data: a huge amount of human text input…
PlagTag
  • 333
  • 1
  • 3
  • 10
10
votes
1 answer

How does the forward method get called in this PyTorch conv net?

In this example network from the PyTorch tutorial: import torch import torch.nn as nn import torch.nn.functional as F class Net(nn.Module): def __init__(self): super(Net, self).__init__() # 1 input image channel, 6 output channels,…
SheppLogan
  • 322
  • 4
  • 11
10
votes
3 answers

Decision Trees - how does split for categorical features happen?

A decision tree, while performing recursive binary splitting, selects an independent variable (say $X_j$) and a threshold (say $t$) such that the predictor space is split into regions $\{X \mid X_j < t\}$ and $\{X \mid X_j \ge t\}$, and which leads to the greatest…
10
votes
1 answer

Spark, optimally splitting a single RDD into two

I have a large dataset that I need to split into groups according to specific parameters. I want the job to process as efficiently as possible. I can envision two ways of doing so: Option 1 - Create map from original RDD and filter def…
j.a.gartner
  • 1,215
  • 1
  • 9
  • 18
10
votes
2 answers

How does class_weight work in Decision Tree

The scikit-learn implementation of DecisionTreeClassifier has a parameter called class_weight. As per the documentation: Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. and The…
Supratim Haldar
  • 309
  • 1
  • 3
  • 10
10
votes
4 answers

How many parameters in a Conv2d Layer?

I was following Andrew Ng's Coursera course on deep learning, and there is a question asked there that I couldn't figure out the answer to. Suppose your input is a 300 by 300 color (RGB) image, and you use a convolutional layer with 100…
10
votes
3 answers

Isolation forest sklearn contamination param

I am working on an unsupervised anomaly detection task on time series data using an isolation forest algorithm. I am developing it in Python, specifically with scikit-learn. I found a lot of examples on this, but what is not very clear is how to…
10
votes
2 answers

Is it valid to shuffle time-series data for a prediction task?

I have a time-series dataset that records some participants' daily features from wearable sensors and their daily mood status. The goal is to use one day's daily features and predict the next day's mood status for participants with machine learning…
Han
  • 103
  • 1
  • 1
  • 5
10
votes
5 answers

Why do decision trees need categorical variables to be encoded?

As per my intuition, decision trees should work better with categorical variables than with continuous variables. If this is the case, why is encoding needed on categorical variables? Can someone give me the intuition behind this?
Mukesh K
  • 101
  • 1
  • 1
  • 5
10
votes
3 answers

Splitting train/test sets by an identifier?

I know sklearn has train_test_split() to split a train and test set. But I read that, even with setting a random seed, if your actual dataset is updated regularly, the random seed will reset with each updated dataset and take a different train/test…
10
votes
4 answers

SGDClassifier: Online Learning/partial_fit with a previously unknown label

My training set contains about 50k entries with which I do an initial learning. On a weekly basis, ~ 5k entries are added; but the same amount "disappears" (as it is user data which has to be deleted after some time). Therefore I use online learning…
swalkner
  • 111
  • 6
10
votes
2 answers

Using Cross Validation technique for a CNN model

I am working on a CNN model. As always, I used batches with epochs to train my model. When it completed training and validation, I finally used a test set to measure the model performance and generate a confusion matrix. Now I want to use Cross…
Hunar
  • 1,197
  • 2
  • 11
  • 33
10
votes
1 answer

Why Gaussian latent variable (noise) for GAN?

When I was reading about GANs, the thing I didn't understand is why people often choose the input to a GAN (z) to be samples from a Gaussian. Are there also potential problems associated with this?
asahi kibou
  • 143
  • 1
  • 5