Highest Voted Questions - Data Science Stack Exchange

9

votes

2 answers

Is BERT a language model?

Is BERT a language model in the sense of a function that gets a sentence and returns a probability? I know its main usage is sentence embedding, but can it also provide this functionality?

nlp bert transformer language-model

asked May 13 '20 at 12:22

Amit Keinan

816
7
19

9

votes

2 answers

XGBoost and Random Forest: ntrees vs. number of boosting rounds vs. n_estimators

So I understand the main difference between Random Forests and GB Methods. Random Forests grow parallel trees and GB Methods grow one tree for each iteration. However, I am confused on the vocab used with scikit's RF regressor and xgboost's…

python random-forest decision-trees xgboost hyperparameter

asked Apr 22 '20 at 15:06

Jack Armstrong

243
2
6

9

votes

2 answers

How to normalize data without knowing the min and max values?

I have a Lending Club dataset from Kaggle; it contains many different columns: for example, dummy variables, years, amount of the loan, etc. I want to normalize the data in the training and test set but I have to use the Min and Max of the…

machine-learning scikit-learn linear-regression normalization

asked Mar 25 '20 at 21:10

Ghassen Ben Hamida

93
1
4
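One common way to handle the situation in the excerpt above is to compute the minimum and maximum on the training set only and reuse them for the test set. The following is a minimal sketch under that assumption; the function names `fit_min_max` and `transform_min_max` and the sample values are hypothetical and not taken from the question.

```python
def fit_min_max(train_rows):
    """Compute per-column (min, max) pairs from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, stats):
    """Scale rows with previously fitted (min, max) pairs."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, (lo, hi) in zip(row, stats):
            span = hi - lo
            # A constant training column has span 0; map it to 0.0
            # instead of dividing by zero.
            scaled_row.append((value - lo) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical data: [loan amount, year]
train = [[1000.0, 2010.0], [5000.0, 2015.0], [3000.0, 2012.0]]
test = [[6000.0, 2011.0]]

stats = fit_min_max(train)                    # fitted on train only
train_scaled = transform_min_max(train, stats)
test_scaled = transform_min_max(test, stats)  # reuses the train statistics
```

Test values outside the training range land outside [0, 1] (here the loan amount of 6000 maps above 1), which is expected when the true minimum and maximum are unknown.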

9

votes

1 answer

Train a GAN on "before and after" images of dental surgeries

I want a GAN to train on "before and after" images of dental surgeries; so that it can generate "after" pictures for fresh patients. Input images are like these:…

tensorflow computer-vision gan vae

asked Mar 21 '20 at 05:21

Lakshay Dulani

265
1
2
6

9

votes

3 answers

Difference between Ridge and Linear Regression

From what I have understood, Ridge Regression simply adds a regularization term (the L2 norm in the case of Ridge) to the loss function of the optimization problem. However I am not sure if the loss function can be described…

regression linear-regression

asked Mar 13 '20 at 19:09

Panathinaikos

307
1
2
8
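For reference, the two objectives mentioned in the excerpt above are usually written as follows. The symbols are assumptions, not taken from the question: $X$ is the design matrix, $y$ the target vector, $w$ the coefficient vector, and $\lambda \ge 0$ the regularization strength.

```latex
% Ordinary least squares (linear regression)
\hat{w}_{\text{OLS}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2

% Ridge regression: the same loss plus a squared L2 penalty on w
\hat{w}_{\text{ridge}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
```

With $\lambda = 0$ the ridge objective reduces to the ordinary least squares objective.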

9

votes

2 answers

What should be the labels for subword tokens in BERT for NER task?

For any NER task, we need a sequence of words and their corresponding labels. To extract features for these words from BERT, they need to be tokenized into subwords. For example, the word 'infrequent' (with label B-count) will be tokenized into…

bert named-entity-recognition labels

asked Mar 13 '20 at 13:32

PinkBanter

394
3
15

9

votes

1 answer

How fbprophet cross validation works

I am having trouble understanding how the cross_validation function works in the fbprophet package. I have a time series of 68 days (only business days) grouped by 15min and a certain metric: 00:00 5, 00:15 2, 00:30 10, etc., 23:45 26. And I really…

time-series cross-validation forecasting

asked Mar 06 '20 at 14:04

Katy

93
1
1
4

9

votes

1 answer

Understanding dropout and gradient descent

I am looking at how to implement dropout on deep neural networks and found something counterintuitive. In the forward phase, dropout masks activations with a random tensor of 1s and 0s to force the net to learn the average of the weights. This helps the…

neural-network deep-learning gradient-descent

asked Aug 27 '15 at 19:36

emanuele

415
1
4
8
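The masking step described in the excerpt above can be sketched as follows. This is an illustrative sketch, not the asker's implementation: it assumes the "inverted dropout" variant (dividing by the keep probability), which the question does not mention, and the names `dropout_forward`, `dropout_backward`, and `keep_prob` are hypothetical.

```python
import random


def dropout_forward(activations, keep_prob, rng):
    """Mask activations with random 1s and 0s and return (output, mask)."""
    # 1.0 keeps a unit, 0.0 drops it.
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in activations]
    # Assumption: inverted dropout, so kept units are scaled by 1 / keep_prob
    # and the expected activation is unchanged.
    out = [a * m / keep_prob for a, m in zip(activations, mask)]
    return out, mask


def dropout_backward(grad_out, mask, keep_prob):
    """Route gradients through the same mask that the forward pass used."""
    return [g * m / keep_prob for g, m in zip(grad_out, mask)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
acts = [0.5, -1.2, 3.0, 0.7, 2.1, -0.4]
out, mask = dropout_forward(acts, keep_prob=0.5, rng=rng)
grads = dropout_backward([1.0] * len(acts), mask, keep_prob=0.5)
```

Dropped units receive a zero gradient in the backward pass because the same mask is reused, so only the units that were active in the forward pass are updated for that batch.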

9

votes

5 answers

Any idea about application of deep dream?

Recently Google publicized its interesting Deep Dream. Besides art generation such as http://deepdreamgenerator.com/, do you see any potential applications of deep dream in computer vision or machine learning?

machine-learning deep-learning

asked Aug 12 '15 at 16:17

rudky martin

9

votes

1 answer

sklearn - overfitting problem

I'm looking for recommendations as to the best way forward for my current machine learning problem. The outline of the problem and what I've done is as follows: I have 900+ trials of EEG data, where each trial is 1 second long. The ground truth is…

machine-learning python svm scikit-learn

asked Aug 11 '15 at 22:21

Simon

1,071
2
10
28

9

votes

2 answers

Can I use LSTM models to evaluate multiple, independent time series?

Let's say that I would like to predict the temperature tomorrow. I could use the approach whereby I train a model based on a time-series dataset collected from a single location (for example, see this excellent…

machine-learning keras r lstm

asked Jan 28 '20 at 21:26

CharismaticChromoFauna

111
1
6

9

votes

6 answers

Is it possible (and/or logical) to set feature importance for xgboost?

If I understand tree based methods correctly, it would be better for more important features to be toward the top of the tree. Is there a way I can dictate this in xgboost? Similar to how I can assign weights to each sample/row, can I somehow…

machine-learning xgboost

asked Jan 20 '20 at 02:43

user2677285

191
1
3

9

votes

5 answers

How can we extract fields from images?

I am making a document parser which extracts data fields from documents and stores them in a structured way. Each field in my dataset is horizontal, which is easy to extract. But the model fails on the following type of example - Is there any way…

machine-learning python deep-learning keras object-detection

asked Jan 16 '20 at 12:35

hR 312

91
1
8

9

votes

2 answers

Why are continuous features more important than categorical features in decision tree models?

I have both categorical and continuous features in my prediction model and want to select (and rank) the most important features. I have converted all categorical variables into dummy variables using one-hot encoding (for better interpretation in my…

machine-learning feature-selection decision-trees logistic-regression explainable-ai

asked Jan 15 '20 at 14:55

Shahab Kazemi

103
1
4

9

votes

1 answer

Using a GAN discriminator as a standalone classifier

The goal of the discriminator in a GAN is to distinguish between real inputs and inputs synthesized by the generator. Suppose I train a GAN until the generator is good enough to fool the discriminator much of the time. Could I then use the…

neural-network classification gan

asked Jan 09 '20 at 20:34

rgov

193
3
