Highest Voted Questions - Data Science Stack Exchange

10

votes

1 answer

How to determine the complexity of an English sentence?

I am working on an app to help people learn English as a second language. I have validated that sentences help in learning a language by providing extra context. I did that by conducting a small study in a classroom of 60 students. I have mined…

machine-learning classification nlp text-mining

asked Jun 03 '17 at 20:12

BuildMyVocab

103
7

10

votes

2 answers

Train object detection without annotated data/bounding boxes

From what I can see most object detection NNs (Fast(er) R-CNN, YOLO etc) are trained on data including bounding boxes indicating where in the picture the objects are localised. Are there algos that simply take the full picture + label annotations,…

neural-network training convolutional-neural-network object-recognition

asked May 31 '17 at 18:51

salient

203
1
2
6

10

votes

4 answers

Classify multivariate time series

I have a set of data composed of time series (8 points) with about 40 dimensions (so each time series is 8 by 40). The corresponding output (the possible outcomes for the categories) is either 0 or 1. What would be the best approach to design a…

classification time-series pca

asked May 09 '17 at 08:33

AugBar

203
1
2
8

10

votes

1 answer

How to use TFIDF vectors with multinomial naive bayes?

Say we have used the TFIDF transform to encode documents into continuous-valued features. How would we now use this as input to a Naive Bayes classifier? Bernoulli naive-bayes is out, because our features aren't binary anymore. Seems like we can't…

scikit-learn naive-bayes-classifier text

asked Apr 05 '17 at 17:10

dhrumeel

201
2
4

10

votes

1 answer

What is the difference between fasttext and DANs in document classification?

I came across two interesting papers that describe promising approaches for document classification using word embeddings. 1. The fasttext algorithm, described in the paper Bag of Tricks for Efficient Text Classification here. (With further…

machine-learning neural-network word2vec similar-documents

asked Apr 01 '17 at 19:48

user1043144

201
1
3

10

votes

6 answers

What are some of the best practices for sharing data and models with colleagues?

As a data scientist who recently joined a new team, I wanted to ask the community how they share data and models among their colleagues. Currently I have to resort to storing data in some central server or location that all of us can access (which…

machine-learning predictive-modeling dataset data model-selection

asked Mar 17 '17 at 18:45

asampat3090

81
1
6

10

votes

2 answers

Kernel trick explanation

In support vector machines, I understand it would be computationally prohibitive to calculate a basis function at every point in the data set. However, it is possible to find this optimal solution due to the so-called kernel trick. Other answers to…

svm efficiency

asked Mar 12 '17 at 15:21

user1717828

245
1
3
9

10

votes

2 answers

Forecasting non-negative sparse time-series data

I have a time-series dataset (daily frequency) representing the sales of a product to a customer over time. The sales are represented as follows: $$[0, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 17, 0, 0, 0, 0, 9, 0, ...]$$ in which each…

time-series forecast

asked Mar 03 '17 at 15:23

Bernardo Aflalo

323
2
9

10

votes

2 answers

Why does k-fold cross-validation (CV) overfit? Or why does a discrepancy occur between CV and the test set?

Recently, I was working on a project and found my cross-validation error rate very low, but the testing set error rate very high. This might indicate that my model is overfitting. Why does my cross-validation not overfit while my test set…

cross-validation overfitting

asked Mar 01 '17 at 22:10

user7117436

308
4
12

10

votes

1 answer

Number of parameters for convolution layers

In this highly cited paper, the authors give the following discussion on the number of weight parameters. I am not very clear why it has $49C^2$ parameters. I think it should be $49C$ since each of $C$ input channels shares the same filter, which has…

machine-learning deep-learning computer-vision

asked Feb 20 '17 at 00:23

user297850

253
1
3
8

10

votes

1 answer

Is it possible to train a neural network to solve polynomial equations?

I randomly generate millions of triplets $\lbrace x_0, x_1, x_2 \rbrace$ within the range $(0,1)$, then calculate the corresponding coefficients of the polynomial $(x-x_0)(x-x_1)(x-x_2)$, which result in triplet groups normalized in a form of…

machine-learning neural-network data-mining

asked Feb 09 '17 at 16:01

Feng Wang

203
2
7

10

votes

2 answers

Why does `max_features=n_features` not make the Random Forest independent of the number of trees?

Consider the following simple classification problem (Python, scikit-learn) import pandas as pd import numpy as np import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from sklearn.ensemble import…

python random-forest decision-trees

asked Feb 07 '17 at 10:12

Jorge Leitao

253
1
2
9

10

votes

1 answer

Confused about how to apply KMeans on my dataset with extracted features

I am trying to apply a basic use of the scikit-learn KMeans clustering package to create different clusters that I could use to identify a certain activity. For example, in my dataset below, I have different usage events (0,...,11), and each event…

python clustering k-means unsupervised-learning

asked Feb 02 '17 at 14:27

Gary

529
2
5
12

10

votes

1 answer

Questions When Advancing from Vanilla Neural Network to Recurrent Neural Network

I've recently learned how a vanilla neural network would work, with a given number of inputs, hidden nodes, and the same number of outputs as inputs. I've been looking at various posts now related to recurrent neural networks, and I understand the…

machine-learning python neural-network rnn

asked Dec 25 '16 at 15:55

Daniel

181
1
11

10

votes

4 answers

How to use lists in TensorFlow?

I have a number of lists, such as [1,2,3,4], [2,3,4], [1,2], [2,3,4,6,8,10], whose lengths obviously differ. How can I use these as input to a placeholder in TensorFlow? As I have tried, the following setting will raise…

tensorflow

asked Nov 11 '16 at 08:28

Oswin

103
1
1
7
