Chi-square test results interpretation

Question

I am using a chi-square test to compare the distributions of two categorical variables. Both have the same number of classes. After counting each class per variable, I obtain very similar counts, but the chi-square test returns a p-value of 0, which rejects the null hypothesis. I am not sure what I am missing.

Here is the code:

    import numpy as np
    from scipy.stats import chi2_contingency

    var1_arr = np.array([361837, 94360, 1533308])  # counts per class for var 1
    var2_arr = np.array([355572, 93285, 1544745])  # counts per class for var 2
    observed_counts = np.vstack((var1_arr, var2_arr))

    # Given class counts (this overrides the array built above)
    observed_counts = np.array([[361837, 94360.67, 1533308.67],
                                [355572, 93285, 1544745]])

    # Calculate expected frequencies under independence:
    # (row total * column total) / grand total
    N = observed_counts.sum()
    expected_counts = np.outer(observed_counts.sum(axis=1),
                               observed_counts.sum(axis=0)) / N

    # Perform chi-square test (it also returns the expected frequencies)
    chi2, p_value, dof, expected = chi2_contingency(observed_counts)
    print(f"Chi-Square Statistic: {chi2:.4f}")
    print(f"P-value: {p_value:.4f}")

The result is:

    Chi-Square Statistic: 99.1516
    P-value: 0.0000

score 1 · Answer 1 · answered Apr 16 '24 at 14:48

You have a huge number of observations. Consequently, the test is highly sensitive to small differences. The test has considerable statistical power to detect these.
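To illustrate the sample-size effect, here is a rough sketch that reruns the test on the same table with every count divided by a factor. The `scale` values are arbitrary choices of mine, not something from the question. Because the proportions stay identical, the chi-square statistic shrinks in proportion to the total count, and the p-value stops being small.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts taken from the question
observed_counts = np.array([[361837, 94360.67, 1533308.67],
                            [355572, 93285, 1544745]])

results = {}
for scale in (1, 100, 1000):  # hypothetical shrink factors, for illustration only
    # Same proportions, smaller total sample size
    scaled = observed_counts / scale
    chi2, p_value, dof, _ = chi2_contingency(scaled)
    results[scale] = (chi2, p_value)
    print(f"N = {scaled.sum():>12.0f}  chi2 = {chi2:8.4f}  p = {p_value:.4g}")
```

With the full counts the p-value is effectively zero, while the same proportions at one hundredth or one thousandth of the sample size would not come close to the usual 0.05 threshold.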

You are within your rights to say that these differences lack practical significance, but a tiny p-value is not surprising.
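One way to put a number on practical significance is an effect size. The sketch below uses Cramér's V, which is my suggestion rather than something the question asked for, and also prints the largest gap between the per-class proportions of the two variables. How small is "negligible" depends on your application, so treat any threshold as a judgment call.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts taken from the question
observed_counts = np.array([[361837, 94360.67, 1533308.67],
                            [355572, 93285, 1544745]])

chi2, p_value, dof, expected = chi2_contingency(observed_counts)

# Cramér's V = sqrt(chi2 / (N * min(rows - 1, cols - 1)))
n = observed_counts.sum()
k = min(observed_counts.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))

# Per-class proportions within each variable, and their largest gap
proportions = observed_counts / observed_counts.sum(axis=1, keepdims=True)
max_diff = np.abs(proportions[0] - proportions[1]).max()

print(f"Cramér's V: {cramers_v:.4f}")
print(f"Largest difference in class proportions: {max_diff:.4f}")
```

Here V is well below 0.01 and no class proportion differs by as much as one percentage point, so the association is statistically detectable but very weak.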

Significance test for large sample sizes
