Most Popular

1500 questions
9
votes
2 answers

Is BERT a language model?

Is BERT a language model in the sense of a function that gets a sentence and returns a probability? I know its main usage is sentence embedding, but can it also provide this functionality?
Amit Keinan
9
votes
2 answers

XGBoost and Random Forest: ntrees vs. number of boosting rounds vs. n_estimators

I understand the main difference between Random Forests and GB methods: Random Forests grow trees in parallel, while GB methods grow one tree per iteration. However, I am confused by the vocabulary used with scikit's RF regressor and xgboost's…
9
votes
2 answers

How to normalize data without knowing the min and max values?

I have a Lending Club dataset from Kaggle. It contains many different columns, for example dummy variables, years, the amount of the loan, etc. I want to normalize the data in the training and test sets, but I have to use the Min and Max of the…
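The excerpt above is cut off, so the following is only a minimal sketch of one common convention: learn the min and max from the training set and reuse them on the test set. The helper names and the loan amounts are hypothetical and do not come from the question.

```python
def fit_min_max(train_values):
    """Learn the scaling bounds from the training data only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with bounds learned elsewhere.

    Values outside the training range map outside [0, 1].
    """
    span = hi - lo
    if span == 0:
        # Constant training column: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical loan-amount columns, made up for illustration.
train_loan_amounts = [1000.0, 5000.0, 9000.0]
test_loan_amounts = [3000.0, 11000.0]

lo, hi = fit_min_max(train_loan_amounts)
train_scaled = apply_min_max(train_loan_amounts, lo, hi)
test_scaled = apply_min_max(test_loan_amounts, lo, hi)
```

With this convention a test value larger than anything seen in training scales above 1, which is expected; clipping to [0, 1] would be a separate design choice.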
9
votes
1 answer

Train a GAN on "before and after" images of dental surgeries

I want to train a GAN on "before and after" images of dental surgeries so that it can generate "after" pictures for new patients. Input images are like these:…
Lakshay Dulani
9
votes
3 answers

Difference between Ridge and Linear Regression

From what I understand, Ridge Regression is an optimization problem whose loss function has an added regularization term (the L2 norm in the case of Ridge). However, I am not sure if the loss function can be described…
Panathinaikos
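To make the comparison in this question concrete, here is a minimal sketch for the simplest case: one feature and no intercept. Ordinary least squares minimizes sum((y - w*x)^2), and Ridge minimizes the same sum plus alpha * w^2. The function names and the toy data are assumptions made for illustration.

```python
def ols_slope(xs, ys):
    """Slope w minimizing sum((y - w*x)^2) for a single feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_slope(xs, ys, alpha):
    """Slope w minimizing sum((y - w*x)^2) + alpha * w^2.

    Setting the derivative to zero gives w = sxy / (sxx + alpha),
    so a larger alpha shrinks the slope toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


# Toy data lying exactly on y = 2x, made up for illustration.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = ols_slope(xs, ys)
w_ridge = ridge_slope(xs, ys, alpha=14.0)
```

With alpha set to 0 the two slopes coincide, which is one way to see that the penalty term is the only difference in this setting.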
9
votes
2 answers

What should be the labels for subword tokens in BERT for NER task?

For any NER task, we need a sequence of words and their corresponding labels. To extract features for these words from BERT, they need to be tokenized into subwords. For example, the word 'infrequent' (with label B-count) will be tokenized into…
PinkBanter
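One convention sometimes used for this problem is to keep the word's label on its first subword and mark the remaining subwords so they are skipped when computing the loss. The sketch below illustrates only that convention; the toy splitter, the example words, the labels, and the "IGNORE" placeholder are all hypothetical and do not reproduce BERT's actual tokenization.

```python
IGNORE = "IGNORE"  # hypothetical placeholder for positions excluded from the loss

# Hypothetical subword splits, NOT real BERT WordPiece output.
TOY_SPLITS = {"replaying": ["re", "##play", "##ing"]}


def toy_split(word):
    """Stand-in for a subword tokenizer: look up a made-up split, else keep the word."""
    return TOY_SPLITS.get(word, [word])


def align_labels(words, labels, split_word):
    """Give each word's label to its first subword and IGNORE to the rest."""
    tokens, aligned = [], []
    for word, label in zip(words, labels):
        pieces = split_word(word)
        tokens.extend(pieces)
        aligned.extend([label] + [IGNORE] * (len(pieces) - 1))
    return tokens, aligned


tokens, aligned = align_labels(["stop", "replaying"], ["O", "B-ACT"], toy_split)
```

Other conventions exist, such as repeating the label on every subword; which one works better is an empirical question that this sketch does not settle.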
9
votes
1 answer

How does fbprophet cross-validation work?

I am having trouble understanding how the cross_validation function works in the fbprophet package. I have a time series of 68 days (business days only), grouped into 15-minute intervals, with a certain metric: 00:00 5, 00:15 2, 00:30 10, etc., 23:45 26. And I really…
Katy
9
votes
1 answer

Understanding dropout and gradient descent

I am looking at how to implement dropout in deep neural networks and found something counterintuitive. In the forward phase, dropout masks activations with a random tensor of 1s and 0s to force the net to learn the average of the weights. This helps the…
emanuele
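As a rough illustration of the masking this question describes, here is a minimal pure-Python sketch of dropout on a list of activations. It assumes the "inverted dropout" variant, which rescales the kept activations by 1 / keep_prob; that scaling is an assumption and is not stated in the excerpt. The function names are hypothetical.

```python
import random


def dropout_forward(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the survivors.

    Assumes 0 <= drop_prob < 1. Returns the output and the 0/1 mask.
    """
    keep_prob = 1.0 - drop_prob
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in activations]
    out = [a * m / keep_prob for a, m in zip(activations, mask)]
    return out, mask


def dropout_backward(grad_out, mask, drop_prob):
    """Route gradients through the same mask used in the forward pass."""
    keep_prob = 1.0 - drop_prob
    return [g * m / keep_prob for g, m in zip(grad_out, mask)]


rng = random.Random(0)  # seeded so the example is repeatable
acts = [1.0, 2.0, 3.0, 4.0]
out, mask = dropout_forward(acts, drop_prob=0.5, rng=rng)
grads = dropout_backward([1.0] * len(acts), mask, drop_prob=0.5)
```

Units that were dropped in the forward pass receive zero gradient in the backward pass, so only the weights feeding the kept units are updated for that step.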
9
votes
5 answers

Any ideas about applications of Deep Dream?

Google recently publicized its interesting Deep Dream work. Besides art generation such as http://deepdreamgenerator.com/, do you see any potential applications of Deep Dream in computer vision or machine learning?
rudky martin
9
votes
1 answer

sklearn - overfitting problem

I'm looking for recommendations as to the best way forward for my current machine learning problem. The outline of the problem and what I've done is as follows: I have 900+ trials of EEG data, where each trial is 1 second long. The ground truth is…
Simon
9
votes
2 answers

Can I use LSTM models to evaluate multiple, independent time series?

Let's say that I would like to predict the temperature tomorrow. I could use the approach whereby I train a model based on a time-series dataset collected from a single location (for example, see this excellent…
9
votes
6 answers

Is it possible (and/or logical) to set feature importance for xgboost?

If I understand tree-based methods correctly, it would be better for more important features to be toward the top of the tree. Is there a way I can dictate this in xgboost? Similar to how I can assign weights to each sample/row, can I somehow…
user2677285
9
votes
5 answers

How can we extract fields from images?

I am building a document parser that extracts data fields from documents and stores them in a structured way. Each field in my dataset is horizontal, which is easy to extract. But the model fails on the following type of example - Is there any way…
9
votes
2 answers

Why are continuous features more important than categorical features in decision tree models?

I have both categorical and continuous features in my prediction model and want to select (and rank) the most important features. I have converted all categorical variables into dummy variables using one-hot encoding (for better interpretation in my…
9
votes
1 answer

Using a GAN discriminator as a standalone classifier

The goal of the discriminator in a GAN is to distinguish between real inputs and inputs synthesized by the generator. Suppose I train a GAN until the generator is good enough to fool the discriminator much of the time. Could I then use the…
rgov