Questions tagged [anova]

12 questions
3
votes
1 answer

When should mutual information be used for feature selection over other feature selection methods like correlation, ANOVA, etc.?

I have a data set with categorical and continuous/ordinal explanatory variables and a continuous target variable. I tried to filter features using one-way ANOVA for the categorical variables and Spearman's correlation coefficient for…
2
votes
1 answer

Score of ANOVA in selected features

I selected features using ANOVA (because I have numerical data as input and categorical data as target): anova = SelectKBest(score_func=f_classif, k='all') anova.fit(X_train, y_train.values.argmax(1)) # y_train.values.argmax(1) because I already…
Mimi
  • 65
  • 8
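
A minimal sketch of how the per-feature ANOVA F-scores can be read back after fitting `SelectKBest` as in the excerpt above. The toy arrays `X_train` and `y_labels` are made up for illustration and are not the asker's data; `scores_` and `pvalues_` are the attributes scikit-learn fills in during `fit`.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: feature 0 separates the two classes, feature 1 barely does.
X_train = np.array([
    [1.0, 5.0],
    [1.2, 3.0],
    [0.9, 4.0],
    [3.0, 5.2],
    [3.1, 3.1],
    [2.9, 4.0],
])
y_labels = np.array([0, 0, 0, 1, 1, 1])  # integer class labels

# k="all" keeps every feature, so the fit is used only to compute scores.
anova = SelectKBest(score_func=f_classif, k="all")
anova.fit(X_train, y_labels)

# Print the features from highest to lowest F-score.
for idx in np.argsort(anova.scores_)[::-1]:
    print(f"feature {idx}: F = {anova.scores_[idx]:.3f}, p = {anova.pvalues_[idx]:.4f}")
```

A larger F-score means the class means of that feature differ more relative to the within-class spread.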
2
votes
1 answer

What can I conclude when a variable is influenced by others but there isn't any correlation?

I am doing an exploratory analysis. The target is a continuous variable and the attributes are all categorical (discrete values). To find out whether each attribute has any influence on the target, I am running an ANOVA test…
2
votes
0 answers

ANOVA for mean difference between groups: non-normal distribution, large sample size

I have $10$ groups - sample size $n>700$: resampled to $710$ for ANOVA - visually these distributions are not normal, with slight bimodality in the sets. I ran an ANOVA and got $P\approx 0.089$. It coincides with what I expected from the histograms,…
2
votes
1 answer

ANOVA procedure - Regression

I am new to regression. Can someone explain to me how the regression sum of squares shows the explained variation? Essentially, why is it $(\hat{y} - \bar{y})$? I hope I'm explaining my question accurately. I tried drawing a graph with the regression…
Michael
  • 21
  • 1
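
A small numerical sketch of the decomposition this question asks about, assuming ordinary least squares with an intercept. The data points are invented for illustration. For such a fit, the total sum of squares $\sum (y_i - \bar{y})^2$ splits into the regression part $\sum (\hat{y}_i - \bar{y})^2$ and the residual part $\sum (y_i - \hat{y}_i)^2$, which is why the regression sum of squares is read as the explained variation.

```python
# Simple linear regression by ordinary least squares, in pure Python.
# The data below are hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for slope and intercept.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Total, regression (explained), and residual (unexplained) sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = ssr / sst  # share of the total variation explained by the fit

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}, R^2 = {r_squared:.4f}")
```

Predicting $\bar{y}$ for every point is the baseline with no predictor, so $\hat{y}_i - \bar{y}$ measures how far the fitted line moves each prediction away from that baseline.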
1
vote
0 answers

Levene test for equal variance

I would like to run a one-way ANOVA test on my data. I saw that one of the assumptions for one-way ANOVA is homogeneity of variances. I have run the test on different data sets. I find that sometimes my p-values are larger than…
Reut
  • 299
  • 3
  • 15
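
A hedged sketch of the check described in this question, using `scipy.stats.levene`. The three groups are invented values, and the 0.05 threshold is only a common convention, not something stated in the question.

```python
from scipy import stats

# Hypothetical measurements for three groups.
group_a = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9]
group_b = [3.9, 5.1, 4.7, 5.3, 4.6, 5.0]
group_c = [4.4, 4.9, 5.6, 4.5, 5.1, 4.8]

# center="median" is SciPy's default and gives the Brown-Forsythe variant,
# which is more robust to non-normal data than centering on the mean.
stat, p_value = stats.levene(group_a, group_b, group_c, center="median")

alpha = 0.05  # assumed significance level
if p_value > alpha:
    print(f"p = {p_value:.3f}: no evidence against equal variances")
else:
    print(f"p = {p_value:.3f}: the variances appear to differ")
```

The null hypothesis of Levene's test is that all groups have equal variance, so a p-value above the chosen threshold means the data give no evidence against the homogeneity assumption.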
1
vote
1 answer

Question on ANOVA and Correlation/Association

I've been working on examining statistical relationships between variables: Pearson's and Spearman's for continuous variables; Kendall's Tau and Cramer's V for ordinal/nominal variables. I know there are many more ways. Recently I read about ANOVA and…
rocksNwaves
  • 309
  • 1
  • 11
1
vote
2 answers

Are Chi-square and ANOVA (f_classif) suitable for selecting the best features?

I have a binary classification problem (target 0 or 1), and I have both continuous and categorical variables as features. I understand that Chi-square can only be used to evaluate categorical features. What about ANOVA (f_classif)? It's the…
0
votes
1 answer

pass variable length argument to mstats.kruskalwallis

I am trying to run the Kruskal-Wallis test on multiple columns of my data. For that I wrote a function: var=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'] def kruskawallis_test(column): …
Ayush Ranjan
  • 411
  • 1
  • 4
  • 15
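
One possible way to pass a variable number of samples to `mstats.kruskalwallis` is Python's argument unpacking with `*`. This is only a sketch: the dictionary `data` and the helper `kruskalwallis_test` below are hypothetical stand-ins, not the asker's code, whose function body is not shown in the excerpt.

```python
from scipy.stats import mstats

# Hypothetical data: each key plays the role of one column.
data = {
    "a": [2.1, 3.4, 1.9, 2.8, 3.0],
    "b": [4.5, 5.1, 4.8, 5.6, 4.9],
    "c": [3.3, 3.9, 2.7, 3.6, 4.0],
}


def kruskalwallis_test(columns):
    """Run the Kruskal-Wallis H-test on any number of named columns."""
    samples = [data[name] for name in columns]
    # The * operator expands the list into separate positional arguments.
    return mstats.kruskalwallis(*samples)


statistic, p_value = kruskalwallis_test(["a", "b", "c"])
print(f"H = {statistic:.3f}, p = {p_value:.4f}")
```

The same unpacking pattern should work with `scipy.stats.kruskal`, which also takes the samples as separate positional arguments.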
0
votes
1 answer

What does it mean to have 1 degree of freedom in ANOVA test?

So I used Python to run a multi-factorial ANOVA on a data set. I first used ols.fit() and then the anova_lm function. I noticed that, for the variables I am analyzing, the degrees of freedom is 1. Does that mean only 1 value out of my data is…
0
votes
0 answers

Statistical significance on aggregate data to show that the groups are different?

I am working with performance data for three groups for each region. The denominator for the groups is the number of people who are identified as low performers. For region A, Group-1 low performer % = 40%, Group-2 = 30%, Group-3 low performer = 30%.…
user728148
  • 21
  • 1
  • 3
0
votes
2 answers

What model should I use to predict monthly sales by products?

I am trying to predict monthly sales by product based on a plethora of variables. There are 4 predictors. One is categorical (month) and the other three are numerical. One of the variables is just part sales. The data I am trying to predict is…