Highest Voted 'distribution' Questions - Data Science Stack Exchange

11

votes

2 answers

how to check the distribution of the training set and testing set are similar

I have been taking part in a Kaggle competition and have found that the distributions of the training set and the testing set can differ, so I am wondering how to check whether the distributions of the training set and testing set are similar. And…

asked Apr 18 '19 at 11:22

Bowen Peng

287
2
6
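One common way to compare a single numeric feature across the training and testing sets, as the question above asks, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. The sketch below is only an illustration under stated assumptions, not the method from the answers: the function name `ks_statistic` is hypothetical, it handles one feature at a time, and it returns only the statistic (no p-value). In practice `scipy.stats.ks_2samp` provides the same statistic together with a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs.

    Hypothetical helper for illustration; works on one numeric feature.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The empirical CDFs only change at observed values,
    # so evaluating the gap at those points is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical feature values from a training and a testing set.
train_feature = [1, 2, 3, 4]
test_feature = [3, 4, 5, 6]
print(ks_statistic(train_feature, test_feature))
```

A value near 0 suggests the two samples look alike for that feature, while a value near 1 means they barely overlap. How large a gap matters depends on the sample sizes, which is what the p-value in a full test accounts for.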

9

votes

3 answers

xgboost: Is there a way to perform regression on rates/percentages data?

I have a dependent variable, $Y$, that is made up of rates/percentages data, so each value is between $0$ and $1$. I was attracted to the xgboost library because it allows focusing in on specific subsets of the data in training itself, but I am…

regression linear-regression xgboost distribution

asked Aug 06 '19 at 01:25

Coolio2654

300
3
10

7

votes

1 answer

Why do train, test, validation datasets need to have the same distribution?

I've found a lot of material on how to deal with differently distributed train/test/validation data sets, however I'm struggling to find out why they need to have the same distribution. Can anyone explain to me in detail why these data sets…

dataset distribution

asked Oct 24 '19 at 22:45

zaza

171
1
3

7

votes

3 answers

Boxplots or violinplots?

This is quite a general question, perhaps somewhat opinion-based. In most papers people use boxplots to visualize a certain distribution, yet violinplots are able to give more information. Violinplots are made by performing a kernel density…

data-mining visualization distribution

asked Feb 20 '18 at 16:57

Archie

883
8
20

7

votes

3 answers

Which outlier detection can detect these outliers?

I have a vector and want to detect outliers in it. The following figure shows the distribution of the vector. Red points are outliers. Blue points are normal points. Yellow points are also normal. I need an outlier detection method (a…

unsupervised-learning anomaly-detection outlier distribution

asked Apr 19 '17 at 10:17

Arkan

453
4
13

7

votes

2 answers

Plotting different values in pandas histogram with different colors

I am working on a dataset. The dataset consists of 16 different features each feature having values belonging to the set (0, 1, 2). In order to check the distribution of values in each column, I used pandas.DataFrame.hist() method which gave me a…

python pandas visualization distribution histogram

asked Nov 10 '16 at 12:09

enterML

3,091
9
28
38

7

votes

1 answer

What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?

As a beginner in MLOps, I was overwhelmed by some confusing definitions. As far as I understand, when we have a classifier or regressor with y = f(X) function: Covariate Shift is changing the distribution of independent variables (X), Label Shift…

data distribution mlops data-drift concept-drift

asked Nov 01 '22 at 11:35

Mohsen Mahmoodzadeh

103
2
6

6

votes

4 answers

Regression: How to deal with positive skewness in continuous target variable

I'm working on a regression problem. My aim is to "learn" the distribution of a continuous target $y$ as well as possible to make predictions. My model looks like: $$y_i=\beta X_i + u_i.$$ $y$ is right skewed (positive skewness) and consists of…

regression predictive-modeling distribution

asked Dec 26 '19 at 13:22

Peter

7,896
5
23
50

6

votes

1 answer

How can I plot/display a dataset or an image distribution?

I want to view a specific image or a dataset's distribution, and see if they are different. Does simply writing something like: # mydataset.shape = (50k,32,32,3) plt.hist(mydataset.reshape(-1)) do the trick? Or should I be doing something…

python distribution matplotlib image

asked Feb 17 '19 at 04:41

Hossein

565
6
14

6

votes

1 answer

How to estimate the mutual information numerically?

Suppose I have a sample {$z_i$}$_{i\in[0,N]}$ = {($x_i,y_i$)}$_{i\in[0,N]}$ which comes from a probability distribution $p_z(z)$. How can I use it to estimate the mutual information between X and Y? $MI(X,Y) = \int_Y \int_X …

numerical mutual-information distribution estimators information-theory

asked Sep 23 '16 at 08:39

patapouf_ai

426
5
11
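For the mutual information question above, one simple baseline is the plug-in (histogram) estimator: bin both variables, estimate the joint and marginal probabilities from the bin counts, and sum $p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$ over the occupied cells. The sketch below assumes equal-width bins and paired scalar samples. The function name `mutual_information` and the `bins` parameter are hypothetical, and this estimator is known to be biased for small samples, so treat it as a starting point rather than the accepted answer.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=10):
    """Plug-in (histogram) estimate of MI(X, Y) in nats from paired samples.

    Hypothetical helper for illustration; uses equal-width bins.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")

    def bin_index(values):
        lo, hi = min(values), max(values)
        # Fall back to width 1.0 when all values are identical.
        width = (hi - lo) / bins or 1.0
        # Clamp so the maximum value falls in the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = bin_index(xs), bin_index(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))  # counts per (x-bin, y-bin) cell
    px = Counter(bx)              # marginal counts for X
    py = Counter(by)              # marginal counts for Y

    mi = 0.0
    for (i, j), c in joint.items():
        # (c/n) * log( (c/n) / ((px/n) * (py/n)) ), simplified.
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


# Perfectly dependent toy data: the estimate equals log(2) nats.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1], bins=2))
```

The result depends on the bin count: too few bins hide structure, while too many inflate the estimate because most cells hold only a handful of points.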

5

votes

2 answers

Working with Data which is not Normal/Gaussian

What happens if my data/feature is not normal? Can I still use machine learning algorithms to utilize such data for predictions? I noticed that in many data science courses there is always a strong assumption that the data are normal/Gaussian. I have…

distribution gaussian

asked Nov 04 '17 at 13:49

Newbie01

53
1
4

4

votes

0 answers

Non-Gaussian like distributions - Classifier of source data fails on target data

I am asking for help with a classification problem (classes are represented by the numbers 0, 1 and 2). All features are extracted from time series data (the fundamental is sinusoidal). I have a source dataset with features which do not follow a Gaussian…

classification statistics feature-selection distribution features

asked Dec 30 '20 at 00:31

deniz

51
1

4

votes

2 answers

evaluation metrics for multiple values per session

I have an application that executes my foo() function several times for each user session. There are two alternative algorithms that I can implement as the foo() function, and my goal is to evaluate them based on execution delay. The number of times foo() is…

statistics accuracy model-evaluations distribution descriptive-statistics

asked Dec 30 '19 at 06:34

sbr

141
2

4

votes

2 answers

Is it possible to train probabilistic model to return several distributions?

I have nonlinear data of function y(x), which is let's say parabolic. At some points of x there are several y's (look at the picture). Is it possible to train a probabilistic model to return several distributions (when needed) i.e. several means…

deep-learning distribution gaussian probabilistic-programming gaussian-process

asked Aug 26 '19 at 17:10

BatyaGG

141
2

4

votes

2 answers

Why do seaborn.dist and pyplot.hist generate two different looking histograms on the same data?

I'm looking at telecom customers data. Two of the variables I'm looking at currently are: Monthly Charges - The total amount charged to the customer monthly. Is Senior Citizen - Whether the customer is a senior citizen. I'm trying to plot two…

python data visualization distribution seaborn

asked Jul 30 '19 at 09:53

helloworld

43
1
5

Questions tagged [distribution]