Highest Voted Questions - Data Science Stack Exchange

9

votes

1 answer

When does decision tree perform better than the neural network?

I was experimenting with different modelling methods, including KNN, decision trees, neural networks and SVM, trying to fit my data to see which works best. To my surprise, the decision tree works best, with a training accuracy of 1.0 and…

machine-learning neural-network decision-trees

asked Sep 16 '18 at 11:29

Suhail Gupta

611
8
15

9

votes

2 answers

Display Images (url) Inside Pandas Dataframe

I would like to display images (mostly jpg and png formats) directly from their url link inside a pandas dataframe. Imagine I already have the following dataframe: id image_url 1 …

python pandas visualization

asked Sep 11 '18 at 06:38

TwinPenguins

4,429
3
22
54

9

votes

2 answers

Is there any consensus on choosing an appropriate ML approach?

I am studying data science at the moment and we are taught a dizzying variety of basic regression/classification techniques (linear, logistic, trees, splines, ANN, SVM, MARS, and so on), along with a variety of extra tools (bootstrapping,…

machine-learning neural-network deep-learning classification data-science-model

asked Sep 09 '18 at 06:23

Brendan Hill

155
8

9

votes

3 answers

R random forest on Amazon ec2 Error: cannot allocate vector of size 5.4 Gb

I am training random forest models in R using randomForest() with 1000 trees and data frames with about 20 predictors and 600K rows. On my laptop everything works fine, but when I move to Amazon EC2 to run the same thing, I get the error: Error:…

r random-forest

asked Dec 19 '14 at 16:02

SOUser

9

votes

2 answers

Dealing with feature vectors of variable length

How does one deal with a feature vector that can vary in size? Let's say per object, I calculate 4 features. In order to solve a certain regression problem, I may have 1, 2, or more of these objects (no more than 10). Thus, the feature vector is…

machine-learning regression linear-regression

asked Aug 21 '18 at 20:59

Otto Nahmee

91
1
4

9

votes

3 answers

Interactive Graphing while logging data

I'm looking to graph and interactively explore live/continuously measured data. There are quite a few options out there, with plot.ly being the most user-friendly. Plot.ly has a fantastic and easy to use UI (easily scalable, pannable, easily…

dataset visualization

asked Dec 17 '14 at 21:17

Clayton Pipkin

93
3

9

votes

3 answers

Text classification with thousands of output classes in Keras

Task: I have a dataset with job titles and descriptions. The task is to predict the tags for a job from its title and description. There are several tags for each job posting. Therefore, the number of labels for the model will be measured in tens of…

machine-learning neural-network deep-learning keras nlp

asked Aug 20 '18 at 08:26

lemon

205
2
6

9

votes

3 answers

How to use Cross Entropy loss in pytorch for binary prediction?

In the PyTorch docs, it says for cross entropy loss: input has to be a Tensor of size (minibatch, C). Does this mean that for binary (0,1) prediction, the input must be converted into an (N,2) tensor where the second dimension is equal to (1-p)? So…

deep-learning

asked Aug 18 '18 at 00:56

AAC

509
2
6
13
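A minimal sketch of the relationship this question turns on, written in plain Python rather than PyTorch so that no library API is assumed. The helper names `softmax_cross_entropy` and `binary_cross_entropy_from_logit` are hypothetical and exist only for this illustration. It assumes the two-class scores are raw logits `[0, z]`, not probabilities `(1-p, p)`, and shows that two-class cross entropy on those logits matches binary cross entropy on `sigmoid(z)`.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross entropy for one sample, given raw class scores (logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy_from_logit(z, target):
    """Binary cross entropy for one sample, given a single logit z."""
    p = 1.0 / (1.0 + math.exp(-z))  # sigmoid turns the logit into P(class 1)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


z = 0.8  # hypothetical model output for one sample
for target in (0, 1):
    two_class = softmax_cross_entropy([0.0, z], target)
    binary = binary_cross_entropy_from_logit(z, target)
    print(target, two_class, binary)  # the two losses agree
```

Under these assumptions a single logit per sample carries the same information as a two-column input, which is why either formulation can be used for a binary target.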

9

votes

2 answers

Difference between using RMSE and nDCG to evaluate Recommender Systems

What kind of error measures do RMSE and nDCG give while evaluating a recommender system, and how do I know when to use one over the other? If you could give an example of when to use each, that would be great as well!

machine-learning recommender-system model-evaluations

asked Jun 14 '14 at 18:53

covfefe

293
4
7
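To make the distinction in this question concrete, here is a small sketch in plain Python. The function names and the toy ratings are hypothetical, and the nDCG variant shown uses linear gain with a log2 rank discount (other variants use a gain of 2^rel - 1). RMSE measures how far the predicted ratings are from the true ratings, while nDCG only looks at the order the predictions induce.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def dcg(relevances):
    """Discounted cumulative gain: linear gain, log2 discount by rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(predicted, actual):
    """DCG of the ranking induced by the predictions, divided by the ideal DCG."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ranked = [actual[i] for i in order]
    ideal_dcg = dcg(sorted(actual, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


actual = [5, 3, 1]              # hypothetical true ratings for three items
model_a = [3.0, 2.0, 1.0]       # ratings are off, but the order is right
model_b = [4.0, 4.5, 1.0]       # ratings are closer, but the top two are swapped

print(rmse(model_a, actual), ndcg(model_a, actual))
print(rmse(model_b, actual), ndcg(model_b, actual))
```

In this toy case the second model has the lower RMSE but the worse nDCG, which suggests RMSE fits rating-prediction tasks and nDCG fits top-of-list ranking tasks.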

9

votes

2 answers

How does a FC layer work in a typical CNN

I am new to CNNs and NNs. I am reading this blog: CNN and I am confused about this part: What confuses me is the operation that will be performed on an input vector/matrix. Will we be using a typical ANN equation: "O = W.T * input"? And then a…

neural-network

asked Aug 11 '18 at 11:05

user57521

91
1
1
2

9

votes

1 answer

Using class weights in Keras with multiple binary outputs which are not simply one-hot-encoded

My labels are binary vectors of length 5, e.g., [0, 0, 1, 1, 1]. My label set is very biased, 1-to-50, where the case [0, 0, 0, 0, 0] is very common while all other combinations are not. I'd like to weight the uncommon versions using the…

classification keras class-imbalance

asked Aug 03 '18 at 19:55

André Christoffer Andersen

336
3
9

9

votes

2 answers

How to learn 3D orientations reliably?

I am working on neural network models for 3D skeletal character animation, where I learn joint positions and orientations. The problem comes with the orientations. There are several ways I can choose to represent a 3D rotation, but all of them have…

machine-learning neural-network

asked Aug 02 '18 at 13:14

javidcf

306
1
8

9

votes

5 answers

Measuring the uncertainty of predictions

Given a multiclass classification model, with n features, how can I measure the uncertainty of the model for that particular classification? Let's say that for some class the model accuracy is amazing, but for another it's not. I would like to find…

classification

asked Jul 22 '18 at 12:36

Latent

334
3
16

9

votes

4 answers

How to download a Jupyter Notebook from GitHub?

This is a fairly basic question. I am working on a data science project inside of a Pandas tutorial. I can access my Jupyter notebooks through my Anaconda installation. The only problem is that the tutorial notebooks (exercise files) are on…

python pandas jupyter ipython

asked Jul 16 '18 at 19:59

Ethan

1,657
9
25
39

9

votes

2 answers

Data Science as a Social Scientist?

As I am very interested in programming and statistics, Data Science seems like a great career path to me - I like both fields and would like to combine them. Unfortunately, I have studied political science with a non-statistical sounding Master's degree. I…

statistics

asked Jun 13 '14 at 07:28

Christian Sauer

657
4
7
