Questions tagged [machine-learning-model]

A machine learning model is a simplified representation of a dataset, derived from statistics in the data, used to make predictions. It can represent patterns, behaviours or features within this dataset which have been learnt by the algorithm during training.

832 questions
46
votes
8 answers

What would I prefer - an over-fitted model or a less accurate model?

Let's say we have two models trained. And let's say we are looking for good accuracy. The first has an accuracy of 100% on training set and 84% on test set. Clearly over-fitted. The second has an accuracy of 83% on training set and 83% on test set.…
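One rough way to weigh the two models described above is to compare the one-point difference in test accuracy against the sampling noise of the test set. The sketch below assumes a hypothetical test set of 1,000 examples (the question does not state the size) and treats the two test accuracies as independent binomial estimates, which is a simplification because both models are scored on the same examples.

```python
import math


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


n_test = 1000  # hypothetical test-set size, not given in the question

train_a, test_a = 1.00, 0.84  # first model from the question
train_b, test_b = 0.83, 0.83  # second model from the question

# Train/test gap: a large gap is the usual symptom of over-fitting.
gap_a = train_a - test_a
gap_b = train_b - test_b

# Difference in test accuracy and its approximate standard error
# (independence between the two estimates is an assumption).
diff = test_a - test_b
se_diff = math.sqrt(
    accuracy_std_error(test_a, n_test) ** 2
    + accuracy_std_error(test_b, n_test) ** 2
)

print(f"gap A = {gap_a:.2f}, gap B = {gap_b:.2f}")
print(f"test accuracy difference = {diff:.3f}, std. error = {se_diff:.3f}")
```

Under these assumptions the difference in test accuracy is smaller than its standard error, so the test set alone would not clearly separate the two models.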
16
votes
2 answers

Why should we use (or not) dropout on the input layer?

People generally avoid using dropout at the input layer itself. But wouldn't it be better to use it? Adding dropout (given that it's randomized it will probably end up acting like another regularizer) should make the model more robust. It will make…
Aditya • 2,520 • 2 • 17 • 35
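For readers unfamiliar with the mechanism being asked about, here is a minimal sketch of dropout applied directly to an input vector. It uses the common "inverted dropout" convention (rescaling kept values at training time); the function name `input_dropout` and the drop probability of 0.2 are illustrative choices, not taken from the question.

```python
import random


def input_dropout(features, drop_prob, rng):
    """Zero each input feature with probability drop_prob and rescale the rest.

    Rescaling by 1 / (1 - drop_prob) keeps the expected value of each
    feature unchanged (inverted dropout).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [0.0 if rng.random() < drop_prob else x * keep_scale for x in features]


rng = random.Random(0)  # fixed seed so the run is reproducible
x = [0.5, 1.0, 1.5, 2.0]
noisy_x = input_dropout(x, drop_prob=0.2, rng=rng)
print(noisy_x)
```

Each call discards a random subset of raw features, which is why input-layer dropout behaves like noise injected into the data itself rather than into learned representations.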
15
votes
3 answers

What are the disadvantages of accuracy?

I have been reading about evaluating a model with accuracy only and I have found some disadvantages. Among them, I read that it equates all errors. How could this problem be solved? Maybe assigning costs to each type of failure? Thank you very much…
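The idea of assigning a cost to each type of error, raised in the question above, can be sketched as follows. The labels, predictions, and cost values are made up for illustration: two hypothetical classifiers reach the same accuracy, yet differ sharply once false negatives are priced higher than false positives.

```python
# Hypothetical data: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # misses two positives
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # raises two false alarms

COST_FALSE_NEGATIVE = 10.0  # assumed cost of missing a positive
COST_FALSE_POSITIVE = 1.0   # assumed cost of a false alarm


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_cost(y_true, y_pred):
    """Sum of per-error costs, weighting each error type differently."""
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return false_neg * COST_FALSE_NEGATIVE + false_pos * COST_FALSE_POSITIVE


for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, accuracy(y_true, pred), total_cost(y_true, pred))
```

Accuracy treats both classifiers as equivalent, while the cost-weighted total does not; the right cost values depend entirely on the application.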
11
votes
3 answers

LightGBM - Why Exclusive Feature Bundling (EFB)?

I'm currently studying GBDT and started reading LightGBM's research paper. In section 4, they explain the Exclusive Feature Bundling algorithm, which aims at reducing the number of features by regrouping mutually exclusive features into bundles,…
11
votes
8 answers

I got 100% accuracy on my test set, is there something wrong?

I got 100% accuracy on my test set using decision tree algorithm, but only got 85% accuracy with random forest. Is there something wrong with my model or is decision tree best suited for the dataset provided? Code: from sklearn.model_selection…
11
votes
2 answers

Why should I understand AI architectures?

Why should I understand what is happening deep down in some AI architecture? For example, LSTM, BERT, Partial Conv... architectures like these. Why should I understand what is going on when I can find any model on the Internet or any implementations…
10
votes
2 answers

Optimising for Brier objective function directly gives worse Brier score than optimising with custom objective - what does it tell me?

I am training an XGBoost model and, as I care the most about the resulting probabilities, not the classification itself, I have chosen Brier score as a metric for my model, so that probabilities would be well calibrated. I tuned my hyperparameters using…
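As a reminder of what the metric in this question measures: the Brier score for binary outcomes is the mean squared difference between the predicted probability and the observed 0/1 label, so lower is better. The sketch below uses made-up labels and probabilities and is independent of XGBoost or of the custom objective mentioned in the question.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # close to the observed outcomes
hedged = [0.6, 0.4, 0.6, 0.4]     # same ranking, less decisive

print(brier_score(y_true, confident))
print(brier_score(y_true, hedged))
```

Both prediction sets classify every example correctly at a 0.5 threshold, but the Brier score rewards the first for placing its probabilities closer to the outcomes.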
10
votes
3 answers

Chi-square as evaluation metrics for nonlinear machine learning regression models

I am using machine learning models to predict an ordinal variable (values: 1,2,3,4, and 5) using 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous variables. So an evaluation box plot looks…
Alex • 201 • 1 • 3
9
votes
2 answers

How to Use SHAP Kernel Explainer with Pipeline models?

I have a pandas DataFrame X. I would like to find the prediction explanation of a particular model. My model is given below: pipeline = Pipeline(steps=[('imputer', imputer_function()), ('classifier', RandomForestClassifier() …
9
votes
3 answers

Encoding before vs after train test split?

I am new to ML and working on a dataset with a lot of high-cardinality categorical variables. I observed that in a lot of tutorials for encoding, like here, the encoding is applied after the train and test split. Can I check why it is done so? Why…
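One commonly cited reason for encoding after the split is to keep information from the test set out of anything that is fitted. The sketch below illustrates that pattern with a tiny hand-written ordinal encoder: the mapping is learned from the training values only and then applied to the test values, with unseen categories sent to a sentinel. The helper names and the `-1` sentinel are illustrative assumptions, not part of the question.

```python
def fit_ordinal_encoder(values):
    """Learn a category -> integer mapping from the training values only."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def transform(values, mapping, unknown=-1):
    """Apply a fitted mapping; categories not seen during fitting map to `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Hypothetical categorical column, already split into train and test.
train = ["red", "blue", "red", "green"]
test = ["blue", "purple"]  # "purple" never appears in the training data

mapping = fit_ordinal_encoder(train)      # fitted on train only
train_encoded = transform(train, mapping)
test_encoded = transform(test, mapping)   # reuses the train mapping

print(mapping)
print(train_encoded, test_encoded)
```

Fitting on the full dataset before splitting would let the encoder see "purple", which hides the fact that a deployed model will meet categories it was never trained on.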
8
votes
1 answer

How could I estimate slope of lines on a scatter plot?

I have a list of coordinate pairs. To the human eye, they form lines with a constant slope: This is how I generated that image above: import numpy as np np.random.seed(42) slope = 1.2 # all lines have the same slope offsets = np.arange(10) # we…
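One possible way to approach this kind of problem, sketched here under strong assumptions, is to scan candidate slopes and keep the one for which the intercepts `y - m * x` collapse onto the fewest distinct values. The sketch assumes noise-free points, a known grid of candidate slopes, and synthetic data built to mirror the setup in the question (slope 1.2, ten integer offsets); with noisy data the exact rounding step would need to be replaced by binning or clustering.

```python
def count_distinct_intercepts(points, slope, decimals=6):
    """Number of distinct intercepts y - slope * x, after rounding."""
    return len({round(y - slope * x, decimals) for x, y in points})


# Synthetic, noise-free data: ten parallel lines sharing one slope.
true_slope = 1.2
points = [(0.37 * i, true_slope * (0.37 * i) + (i % 10)) for i in range(50)]

# Candidate slopes 0.0, 0.1, ..., 3.0 (the grid is an assumption).
candidates = [k / 10 for k in range(31)]

# At the right slope, the 50 points share only as many intercepts as there are lines.
estimated_slope = min(
    candidates, key=lambda m: count_distinct_intercepts(points, m)
)

print(estimated_slope, count_distinct_intercepts(points, estimated_slope))
```

The same scoring idea underlies Hough-transform style line detection, where votes accumulate in slope/intercept bins instead of being counted exactly.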
7
votes
3 answers

Alternatives with better GPU than Google Colab Pro

I am currently training machine learning models that are very GPU-intensive, and Google Colab Pro is not giving me enough GPU/RAM. Are there any alternatives with a better GPU and more RAM than Google Colab Pro?
The Dan • 221 • 1 • 2 • 8
7
votes
6 answers

Is it advisable to combine two datasets?

I have two datasets on the heart rate of subjects that were recorded in two different places (two different continents, to be exact). The two research experiments aimed to find the subjects' emotions based on how much their heart rate changes over time. I…
Lapatrie • 145 • 2 • 9
7
votes
1 answer

Machine learning model for ranking that outputs probabilities

Traditionally, ML algorithms for ranking take the features as input and then output a "ranking score" which does not have a natural probabilistic interpretation. For example, suppose we have three laptops: "macbookAir", "macbookPro", "msSurface", and a…
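One simple way to give ranking scores a probabilistic reading, offered here only as a sketch, is to pass them through a softmax so that each item receives a probability of being ranked first (the first step of a Plackett-Luce style model). The score values below are invented for the three laptops named in the question, and the mapping is an assumption rather than anything the question proposes.

```python
import math


def softmax(scores):
    """Convert a dict of real-valued scores into probabilities that sum to 1."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(score - max_score) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Hypothetical ranking scores produced by some ranking model.
ranking_scores = {"macbookAir": 2.0, "macbookPro": 1.0, "msSurface": 0.0}

top_probabilities = softmax(ranking_scores)
for name, prob in top_probabilities.items():
    print(f"{name}: {prob:.3f}")
```

The ordering of the items is unchanged by the softmax; what it adds is a normalised scale, whose calibration would still have to be checked against real preference data.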
6
votes
3 answers

Which models can handle null values?

Unfortunately trying to google or research null values in machine learning always brings up pages trying to teach you how to impute the values instead, but I'm trying to find models that can handle null values as input. The only one I've found…