Most Popular
1500 questions
9
votes
2 answers
Is BERT a language model?
Is BERT a language model in the sense of a function that gets a sentence and returns a probability?
I know its main usage is sentence embedding, but can it also provide this functionality?
Amit Keinan
- 816
- 7
- 19
9
votes
2 answers
XGBoost and Random Forest: ntrees vs. number of boosting rounds vs. n_estimators
So I understand the main difference between Random Forests and GB Methods. Random Forests grow parallel trees and GB Methods grow one tree for each iteration. However, I am confused on the vocab used with scikit's RF regressor and xgboost's…
Jack Armstrong
- 243
- 2
- 6
9
votes
2 answers
How to normalize data without knowing the min and max values?
I have a Lending Club dataset from Kaggle;
it contains many different columns, for example dummy variables, years, the amount of the loan, etc.
I want to normalize the data in the training and test set but I have to use the Min and Max of the…
Ghassen Ben Hamida
- 93
- 1
- 4
9
votes
1 answer
Train a GAN on "before and after" images of dental surgeries
I want to train a GAN on "before and after" images of dental surgeries so that it can generate "after" pictures for new patients.
Input images are like these:…
Lakshay Dulani
- 265
- 1
- 2
- 6
9
votes
3 answers
Difference between Ridge and Linear Regression
From what I have understood, Ridge Regression just adds a regularization term (the L2 norm, in the case of Ridge) to the loss function of the optimization problem. However, I am not sure if the loss function can be described…
Panathinaikos
- 307
- 1
- 2
- 8
9
votes
2 answers
What should be the labels for subword tokens in BERT for NER task?
For any NER task, we need a sequence of words and their corresponding labels.
To extract features for these words from BERT, they need to be tokenized into subwords.
For example, the word 'infrequent' (with label B-count) will be tokenized into…
PinkBanter
- 394
- 3
- 15
9
votes
1 answer
How fbprophet cross validation works
I am having trouble understanding how the cross_validation function works in the fbprophet package.
I have a time series of 68 days (business days only), grouped by 15 min, with a certain metric:
00:00 5
00:15 2
00:30 10
etc
23:45 26
And I really…
Katy
- 93
- 1
- 1
- 4
9
votes
1 answer
Understanding dropout and gradient descent
I am looking at how to implement dropout in deep neural networks and found something counterintuitive. In the forward phase, dropout masks activations with a random tensor of 1s and 0s to force the net to learn the average of the weights. This helps the…
emanuele
- 415
- 1
- 4
- 8
9
votes
5 answers
Any idea about application of deep dream?
Recently Google publicized its interesting Deep Dream. Besides art generation such as http://deepdreamgenerator.com/, do you see any potential applications of Deep Dream in computer vision or machine learning?
rudky martin
9
votes
1 answer
sklearn - overfitting problem
I'm looking for recommendations as to the best way forward for my current machine learning problem.
The outline of the problem and what I've done is as follows:
I have 900+ trials of EEG data, where each trial is 1 second long. The ground truth is…
Simon
- 1,071
- 2
- 10
- 28
9
votes
2 answers
Can I use LSTM models to evaluate multiple, independent time series?
Let's say that I would like to predict the temperature tomorrow. I could use the approach whereby I train a model based on a time-series dataset collected from a single location (for example, see this excellent…
CharismaticChromoFauna
- 111
- 1
- 6
9
votes
6 answers
Is it possible (and/or logical) to set feature importance for xgboost?
If I understand tree based methods correctly, it would be better for more important features to be toward the top of the tree.
Is there a way I can dictate this in xgboost? Similar to how I can assign weights to each sample/row, can I somehow…
user2677285
- 191
- 1
- 3
9
votes
5 answers
How can we extract fields from images?
I am making a document parser which extracts data fields from documents and stores them in a structured way. Each field in my dataset is horizontal, which makes it easy to extract.
But the model fails on the following type of example:
Is there any way…
hR 312
- 91
- 1
- 8
9
votes
2 answers
Why are continuous features more important than categorical features in decision tree models?
I have both categorical and continuous features in my prediction model and want to select (and rank) the most important features.
I have converted all categorical variables into dummy variables using one-hot encoding (for better interpretation in my…
Shahab Kazemi
- 103
- 1
- 4
9
votes
1 answer
Using a GAN discriminator as a standalone classifier
The goal of the discriminator in a GAN is to distinguish between real inputs and inputs synthesized by the generator.
Suppose I train a GAN until the generator is good enough to fool the discriminator much of the time. Could I then use the…
rgov
- 193
- 3