Most Popular
1500 questions
10
votes
2 answers
Cross Validation in Keras
Suppose I would like to train and test the MNIST dataset in Keras.
The required data can be loaded as follows:
from keras.datasets import mnist
digits_data = mnist.load_data()
Is there any way in Keras to split this data into three sets, namely:…
enterML
- 3,091
- 9
- 28
- 38
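For context on the question above: `mnist.load_data()` returns `(x_train, y_train), (x_test, y_test)`, so one possible approach is to carve a validation set out of the data by shuffling indices. The helper below is a minimal sketch under that assumption; `three_way_split` and its fraction parameters are hypothetical names for illustration, not part of Keras.

```python
import random


def three_way_split(n_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Return disjoint index lists for train, validation, and test sets.

    Hypothetical helper for illustration; not a Keras function.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index lists could then be used to select rows from the image and label arrays (for NumPy arrays, `x[train_idx]`).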
10
votes
1 answer
What size language model can you train on a GPU with x GB of memory?
I'm trying to figure out what size language model I will be able to train on a GPU with a certain amount of memory. For simplicity, let's say that 1 GB = 10^9 bytes; that means that, for example, on a GPU with 12 GB memory, I can theoretically fit 6…
HelloGoodbye
- 213
- 1
- 2
- 7
10
votes
2 answers
ReLU has 0 gradient by definition, so why is vanishing gradient not a problem for x < 0?
By definition, ReLU is max(0, f(x)). Its gradient is then defined as:
1 if x > 0 and 0 if x < 0.
Wouldn't this mean the gradient is always 0 (vanishes) when x < 0? Then why do we say ReLU doesn't suffer from the vanishing gradient problem?
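A small sketch may help make the question concrete. It assumes the common convention of setting the derivative at exactly x = 0 to 0, and it compares the product of per-layer gradients for a ReLU unit that stays active against a sigmoid unit. The 10-layer depth and the input values are arbitrary choices for illustration.

```python
import math


def relu_grad(x):
    # Gradient of max(0, x): 1 for x > 0, 0 for x < 0 (0 at x == 0 by convention here).
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # never larger than 0.25


def chained_gradient(grad_fn, x, depth):
    """Multiply the same local gradient across `depth` layers."""
    product = 1.0
    for _ in range(depth):
        product *= grad_fn(x)
    return product


print(chained_gradient(relu_grad, 1.0, 10))     # active ReLU path
print(chained_gradient(relu_grad, -1.0, 10))    # inactive ReLU path
print(chained_gradient(sigmoid_grad, 0.0, 10))  # sigmoid at its steepest point
```

On the active path the ReLU factor is exactly 1 at every layer, while the sigmoid factor is at most 0.25 per layer, so its product shrinks with depth. The inactive path does give 0, which is the case the question is asking about.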
Edamame
- 2,785
- 5
- 25
- 34
10
votes
1 answer
How does class_weights work in RandomForestClassifier
I'm facing a problem with unbalanced classes, and have tried out a couple of methods like over and under sampling. However, my cross validation mean comes out to be only 0.4 and my confusion matrix shows that the prediction and recall is completely…
TdBm
- 423
- 1
- 6
- 15
10
votes
4 answers
What is the difference between residual sum of squares and ordinary least squares?
They look like the same thing to me but I'm not sure.
Update: in retrospect, this was not a very good question. OLS refers to fitting a line to data and RSS is the cost function that OLS uses. It finds the parameters that give the least residual…
sebastianspiegel
- 931
- 5
- 11
- 17
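The relationship described in the update above (OLS is the fitting procedure, RSS is the cost it minimizes) can be sketched for the single-feature case. This is a minimal illustration using the closed-form solution for a line; the data points are made up, and `ols_fit` and `rss` are names chosen here.

```python
def rss(xs, ys, slope, intercept):
    """Residual sum of squares of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def ols_fit(xs, ys):
    """Slope and intercept that minimize the RSS for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = ols_fit(xs, ys)
print(rss(xs, ys, slope, intercept))         # RSS at the OLS solution
print(rss(xs, ys, slope + 0.1, intercept))   # any other line has a larger RSS
```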
10
votes
3 answers
Logbook: Machine Learning approaches
In the past, when trying different machine learning algorithms in order to solve a problem, I used to write down the set of approaches in a notebook, keeping details such as features, feature preprocessing, normalization, algorithms, algorithm…
Jorge
- 211
- 1
- 4
10
votes
1 answer
How to scrape a table from a webpage?
I need to scrape a table off of a webpage and put it into a pandas data frame, but I am not able to do it. Let me first give you an idea of how the table is encoded in the HTML document.
United States…
user62198
- 1,101
- 4
- 16
- 35
10
votes
2 answers
Benchmark datasets for collaborative filtering
I'd like to test a new algorithm for collaborative filtering. A typical use case is to recommend movies based on the preferences of users similar to the specific user.
What are some common benchmark datasets that researchers often use to test their…
pir
- 790
- 5
- 11
10
votes
3 answers
Are Word2Vec and Doc2Vec both distributional representation or distributed representation?
I have read that distributional representation is based on the distributional hypothesis that words occurring in similar contexts tend to have similar meanings.
Word2Vec and Doc2Vec both are modeled according to this hypothesis. But, in the original…
chmodsss
- 1,974
- 2
- 19
- 37
10
votes
3 answers
Libraries for Online Machine Learning
I am looking for packages (either in Python, R, or a standalone package) to perform online learning to predict stock data.
I have found and read about Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit/wiki),
which seems to be quite…
mike1886
- 933
- 9
- 17
10
votes
1 answer
How do you calculate how dense or sparse a dataset is?
I'm looking deeper into collaborative filtering. One really interesting paper is "A Comparative Study of Collaborative Filtering Algorithms" http://arxiv.org/pdf/1205.3193.pdf
In order to select which CF algorithm should be used the paper refers to…
djones
- 203
- 1
- 2
- 4
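As a rough illustration of the question above, one common way to quantify density for a user-item rating matrix is the fraction of cells that hold an observed rating, with sparsity as one minus that. This is an assumption about the definition, and the paper's exact measure may differ. The small matrix below is made up.

```python
def density(ratings):
    """Fraction of user-item cells with an observed rating (None = missing)."""
    n_cells = sum(len(row) for row in ratings)
    n_observed = sum(1 for row in ratings for r in row if r is not None)
    return n_observed / n_cells


# Made-up 3 users x 4 items rating matrix; None marks an unrated item.
ratings = [
    [5, None, None, 3],
    [None, 4, None, None],
    [1, None, 2, None],
]

d = density(ratings)
print("density:", d)
print("sparsity:", 1.0 - d)
```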
10
votes
1 answer
Libraries for (label propagation algorithms/frequent subgraph mining) for graphs in R
General description of the problem
I have a graph where some vertices are labeled with a type with 3 or 4 possible values. For the other vertices, the type is unknown.
My goal is to use the graph to predict the type for vertices that are…
Loiisso
- 155
- 8
10
votes
1 answer
BERT vs GPT architectural, conceptual and implementation differences
In the BERT paper, I learnt that BERT is an encoder-only model, that is, it involves only transformer encoder blocks.
In the GPT paper, I learnt that GPT is a decoder-only model, that is, it involves only transformer decoder blocks.
I was guessing whats…
Rnj
- 245
- 2
- 9
10
votes
2 answers
Implementing temporal difference in chess
I have been developing a chess program that uses the alpha-beta pruning algorithm and an evaluation function that evaluates positions using features such as material, king safety, mobility, pawn structure, trapped pieces, etc.…
user2890137
- 101
- 2
10
votes
2 answers
ReLU vs Leaky ReLU vs ELU with pros and cons
I am unable to understand when to use ReLU, Leaky ReLU and ELU.
How do they compare to other activation functions (like sigmoid and tanh), and what are their pros and cons?
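For reference, the three activations being compared can be written out as below. This is a minimal sketch of the standard definitions; the default `alpha` values (0.01 for Leaky ReLU, 1.0 for ELU) are common choices assumed here, not values taken from the question.

```python
import math


def relu(x):
    # max(0, x): zero output and zero gradient for negative inputs
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # small linear slope alpha for negative inputs instead of a flat zero
    return x if x > 0 else alpha * x


def elu(x, alpha=1.0):
    # smooth exponential curve for negative inputs, saturating at -alpha
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, relu(x), leaky_relu(x), elu(x))
```

All three agree for positive inputs; they differ only in how they treat negative ones.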
Ayazzia01
- 113
- 1
- 1
- 6