Most Popular

1500 questions
10
votes
2 answers

Cross Validation in Keras

Suppose I would like to train and test on the MNIST dataset in Keras. The required data can be loaded as follows:

    from keras.datasets import mnist
    digits_data = mnist.load_data()

Is there any way in Keras to split this data into three sets, namely:…
enterML
  • 3,091
  • 9
  • 28
  • 38
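
A library-agnostic sketch of a three-way split, for illustration only. `mnist.load_data()` already returns separate training and test portions, so a validation set is usually carved out of the training portion; the helper below instead shows the general idea on a toy list. The function name `three_way_split` and its parameters are hypothetical and are not part of Keras.

```python
import random


def three_way_split(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle `samples` and split them into train, validation and test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_val = int(len(samples) * val_fraction)
    n_test = int(len(samples) * test_fraction)
    val_idx = indices[:n_val]
    test_idx = indices[n_val:n_val + n_test]
    train_idx = indices[n_val + n_test:]
    return (
        [samples[i] for i in train_idx],
        [samples[i] for i in val_idx],
        [samples[i] for i in test_idx],
    )


# Toy stand-in for (image, label) pairs; real data would come from the loader.
data = [(i, i % 10) for i in range(100)]
train, val, test = three_way_split(data, val_fraction=0.2, test_fraction=0.1)
print(len(train), len(val), len(test))
```

Shuffling indices rather than the samples themselves keeps the input untouched, and the same index lists could be reused to slice images and labels consistently.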
10
votes
1 answer

What size language model can you train on a GPU with x GB of memory?

I'm trying to figure out what size language model I will be able to train on a GPU with a certain amount of memory. For simplicity, let's say that 1 GB = 10^9 bytes; that means that, for example, on a GPU with 12 GB memory, I can theoretically fit 6…
HelloGoodbye
  • 213
  • 1
  • 2
  • 7
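
The arithmetic in the excerpt can be sketched as a simple division, assuming memory holds nothing but the weights. This is only an upper bound on parameter storage: training also needs room for gradients, optimizer state and activations, which this sketch ignores. The function name `max_parameters` is hypothetical.

```python
def max_parameters(memory_gb, bytes_per_parameter):
    """Upper bound on parameter count if memory held only the weights."""
    total_bytes = memory_gb * 10**9  # the excerpt's convention: 1 GB = 10^9 bytes
    return total_bytes // bytes_per_parameter


# Bytes per parameter for two common floating-point widths.
for name, size in [("float32", 4), ("float16", 2)]:
    print(name, max_parameters(12, size))
```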
10
votes
2 answers

ReLU has zero gradient for x < 0 by definition, so why is the vanishing gradient not a problem there?

By definition, ReLU is max(0, f(x)). Its gradient is then defined as 1 if x > 0 and 0 if x < 0. Wouldn't this mean the gradient is always 0 (vanishes) when x < 0? Then why do we say ReLU doesn't suffer from the vanishing gradient problem?
Edamame
  • 2,785
  • 5
  • 25
  • 34
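
A minimal sketch of the gradient definition quoted in the excerpt, contrasted with the sigmoid. It is a deliberate simplification: it multiplies activation derivatives only and ignores the weights, so it illustrates the mechanism rather than a real backward pass. The value of the ReLU derivative at exactly x = 0 is a convention (0 is assumed here).

```python
import math


def relu_grad(x):
    """Derivative of max(0, x); the value at x == 0 is a convention (0 here)."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


depth = 10
# Along a path of active units (x > 0), every ReLU factor is exactly 1.
relu_chain = relu_grad(2.0) ** depth
# Even at its steepest point (x = 0), each sigmoid factor shrinks the product.
sigmoid_chain = sigmoid_grad(0.0) ** depth

print(relu_grad(-3.0), relu_grad(2.0))
print(relu_chain, sigmoid_chain)
```

The zero gradient for x < 0 is real, but it affects only the units that are inactive for a given input, whereas a saturating activation shrinks the gradient at every layer.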
10
votes
1 answer

How does class_weight work in RandomForestClassifier?

I'm facing a problem with unbalanced classes, and have tried out a couple of methods like over- and undersampling. However, my cross-validation mean comes out to be only 0.4, and my confusion matrix shows that the precision and recall are completely…
TdBm
  • 423
  • 1
  • 6
  • 15
10
votes
4 answers

What is the difference between residual sum of squares and ordinary least squares?

They look like the same thing to me, but I'm not sure. Update: in retrospect, this was not a very good question. OLS refers to fitting a line to data, and RSS is the cost function that OLS uses. It finds the parameters that give the least residual…
sebastianspiegel
  • 931
  • 5
  • 11
  • 17
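
The relationship described in the update can be sketched for a single predictor: OLS is the fitting procedure, and RSS is the quantity it minimizes. The data values below are made up for illustration, and the helper names `rss` and `ols_fit` are hypothetical.

```python
def rss(xs, ys, intercept, slope):
    """Residual sum of squares of the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def ols_fit(xs, ys):
    """Closed-form OLS estimates for simple (one-predictor) linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative data only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

b0, b1 = ols_fit(xs, ys)
best = rss(xs, ys, b0, b1)
# Any other line gives a larger RSS than the OLS line.
nudged_slope = rss(xs, ys, b0, b1 + 0.1)
nudged_intercept = rss(xs, ys, b0 + 0.1, b1)
print(b0, b1, best, nudged_slope, nudged_intercept)
```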
10
votes
3 answers

Logbook: Machine Learning approaches

In the past, when trying different machine learning algorithms in order to solve a problem, I used to write down the set of approaches in a notebook, keeping details such as features, feature preprocessing, normalization, algorithms, algorithm…
Jorge
  • 211
  • 1
  • 4
10
votes
1 answer

How to scrape a table from a webpage?

I need to scrape a table off a webpage and put it into a pandas data frame, but I have not been able to do it. Let me first give you a hint of how the table is encoded in the HTML document. United States…
user62198
  • 1,101
  • 4
  • 16
  • 35
10
votes
2 answers

Benchmark datasets for collaborative filtering

I'd like to test a new algorithm for collaborative filtering. A typical use case is to recommend movies based on the preferences of users similar to the specific user. What are some common benchmark datasets that researchers often use to test their…
pir
  • 790
  • 5
  • 11
10
votes
3 answers

Are Word2Vec and Doc2Vec distributional representations or distributed representations?

I have read that distributional representation is based on the distributional hypothesis that words occurring in similar contexts tend to have similar meanings. Word2Vec and Doc2Vec are both modeled according to this hypothesis. But, in the original…
chmodsss
  • 1,974
  • 2
  • 19
  • 37
10
votes
3 answers

Libraries for Online Machine Learning

I am looking for packages (either in Python, R, or a standalone package) to perform online learning to predict stock data. I have found and read about Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit/wiki), which seems to be quite…
mike1886
  • 933
  • 9
  • 17
10
votes
1 answer

How do you calculate how dense or sparse a dataset is?

I'm looking deeper into collaborative filtering. One really interesting paper is "A Comparative Study of Collaborative Filtering Algorithms" (http://arxiv.org/pdf/1205.3193.pdf). In order to select which CF algorithm should be used, the paper refers to…
djones
  • 203
  • 1
  • 2
  • 4
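
One common way to quantify this, assumed here rather than taken from the cited paper, is the fraction of user-item cells that hold an observed rating; sparsity is then one minus that fraction. The toy matrix and the function name `density` are hypothetical.

```python
def density(ratings):
    """Fraction of user-item cells that hold an observed rating (not None)."""
    total = sum(len(row) for row in ratings)
    observed = sum(1 for row in ratings for r in row if r is not None)
    return observed / total


# Rows are users, columns are items, None marks a missing rating.
ratings = [
    [5, None, 3, None],
    [None, None, 4, None],
    [1, None, None, 2],
]

d = density(ratings)
sparsity = 1 - d
print(d, sparsity)
```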
10
votes
1 answer

Libraries for (label propagation algorithms/frequent subgraph mining) for graphs in R

General description of the problem: I have a graph where some vertices are labeled with a type with 3 or 4 possible values. For the other vertices, the type is unknown. My goal is to use the graph to predict the type for vertices that are…
Loiisso
  • 155
  • 8
10
votes
1 answer

BERT vs GPT: architectural, conceptual and implementational differences

In the BERT paper, I learnt that BERT is an encoder-only model, that is, it involves only transformer encoder blocks. In the GPT paper, I learnt that GPT is a decoder-only model, that is, it involves only transformer decoder blocks. I was guessing what's…
Rnj
  • 245
  • 2
  • 9
10
votes
2 answers

Implementing temporal difference in chess

I have been developing a chess program which makes use of the alpha-beta pruning algorithm and an evaluation function that evaluates positions using features such as material, king safety, mobility, pawn structure, and trapped pieces…
10
votes
2 answers

ReLU vs Leaky ReLU vs ELU with pros and cons

I am unable to understand when to use ReLU, Leaky ReLU and ELU. How do they compare to other activation functions (like sigmoid and tanh), and what are their pros and cons?
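
For reference, the three activations named in the question can be sketched from their standard definitions. The default `alpha` values below (0.01 for Leaky ReLU, 1.0 for ELU) are common choices assumed for illustration, not values taken from the question.

```python
import math


def relu(x):
    """max(0, x): zero output and zero gradient for negative inputs."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but with a small slope `alpha` for negative inputs."""
    return x if x > 0 else alpha * x


def elu(x, alpha=1.0):
    """Smooth for negative inputs, saturating toward -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for x in (-2.0, -0.5, 0.0, 1.5):
    print(x, relu(x), leaky_relu(x), elu(x))
```

All three agree for positive inputs; they differ only in how they treat negative inputs, which is where the trade-offs asked about come from.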