AI Basics with AK

Season 03 - Introduction to Statistics

Arun Koundinya Parasa

Episode 13 - Chi-Square Tests

Recap: Episodes 10 to 12

What is Hypothesis Testing?
The Null and Alternative Hypotheses
Type I and Type II Errors
p-values and Significance Levels
t-tests for comparing means

This all works well for numerical data. But how do we compare groups when the data is categorical?

Our earlier hypothesis questions looked like: “Is the average height different between groups?”

A Different Kind of Data

Numeric Data (What We’ve Tested)

Heights, weights, test scores
Blood pressure, wait times
We compute means, standard deviations
We ask: “Is the average different?”

These are continuous or discrete numbers.

Categorical Data (What Chi-Square Tests)

Colours, genders, political parties
Yes/No responses, product preferences
We count how many fall in each category
We ask: “Are the counts what we expected?”

These are frequencies and proportions.

You can’t take the mean of “Red, Blue, Green.” But you can count them.
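Counting categories can be sketched in a few lines of Python. The responses below are hypothetical illustration data, not figures from this episode.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data only).
responses = ["Red", "Blue", "Green", "Red", "Red", "Blue", "Green", "Red"]

# Count how many responses fall in each category.
counts = Counter(responses)

# Convert the counts to proportions of the total.
total = sum(counts.values())
proportions = {colour: n / total for colour, n in counts.items()}

print(counts)
print(proportions)
```

These counts (frequencies) and proportions are the raw material a chi-square test works with.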

Two Questions Chi-Square Answers

Test 1 — Goodness of Fit

“Does the distribution of my data match an expected pattern?”

Example: Is a die fair? Do customers prefer all flavours equally?

You have one categorical variable. You compare observed counts to expected counts.

Test 2 — Test of Independence

“Are two categorical variables related to each other?”

Example: Is gender related to product preference? Is smoking related to disease status?

You have two categorical variables. You test whether knowing one tells you anything about the other.

The Chi-Square Distribution

The chi-square distribution is right-skewed and takes only non-negative values. As df increases, it shifts right and becomes more symmetric. We reject H₀ only when our test statistic exceeds the critical value for the chosen α and df.
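To make the shape description concrete, here is a minimal sketch using only the standard library. It relies on the standard chi-square density and its known mean (df), mode (df − 2 for df ≥ 2), and skewness (√(8/df)). The helper names `chi2_pdf` and `chi2_summary` are made up for this illustration.

```python
import math

def chi2_pdf(x: float, df: int) -> float:
    """Chi-square density with df degrees of freedom, for x > 0."""
    if x <= 0:
        # Simplification: the boundary x = 0 is treated as density 0.
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def chi2_summary(df: int) -> dict:
    """Mean, mode, and skewness of the chi-square distribution."""
    return {
        "mean": df,                      # the centre shifts right as df grows
        "mode": max(df - 2, 0),          # the peak also moves right
        "skewness": math.sqrt(8 / df),   # shrinks toward 0: more symmetric
    }

for df in (2, 5, 10, 30):
    print(df, chi2_summary(df))
```

The printed skewness falls as df rises, which matches the "more symmetric" behaviour described above.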

The Core Idea: Observed vs Expected

The Question We Always Ask

If the null hypothesis were true —

what counts would we expect to see?

Then we compare those expected counts to what we actually observed.

If the gap is small → data is consistent with H₀

If the gap is large → something is going on

The Chi-Square Statistic

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

O = Observed count in each cell
E = Expected count under H₀

Each cell contributes to the total.

Large \(\chi^2\) → observed and expected are far apart → evidence against H₀

Small \(\chi^2\) → data fits the expected pattern → no evidence against H₀
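The formula translates almost directly into code. The sketch below assumes the observed and expected counts arrive as equal-length lists. The function name `chi_square_statistic` and the coin-flip numbers are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 100 coin flips give 60 heads and 40 tails.
# Under H0 (fair coin) we expect 50 of each.
stat = chi_square_statistic([60, 40], [50, 50])
print(stat)  # 4.0
```

Each cell adds (10)²/50 = 2.0 here, so the two cells together give 4.0.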

Test 1 — Goodness of Fit

Setup: You roll a die 60 times. If the die is fair, you expect each face 10 times.

Face	Observed (O)	Expected (E)	(O−E)²/E
1	8	10	0.40
2	12	10	0.40
3	7	10	0.90
4	14	10	1.60
5	11	10	0.10
6	8	10	0.40

\[\chi^2 = 0.40+0.40+0.90+1.60+0.10+0.40 = 3.80\]

df = k − 1 = 6 − 1 = 5 (k = number of categories) | Critical value at α=0.05 → 11.07

3.80 < 11.07 → Fail to Reject H₀ ✅

No evidence the die is unfair.
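The die example can be reproduced with a short script. This is a sketch that reuses the observed counts from the table and takes the critical value 11.07 from the text rather than computing it.

```python
# Observed counts for faces 1 to 6, taken from the table above.
observed = [8, 12, 7, 14, 11, 8]

n_rolls = sum(observed)          # 60 rolls in total
k = len(observed)                # 6 categories (faces)
expected = [n_rolls / k] * k     # fair die: 10 per face

# Per-face contribution (O - E)^2 / E, then the total statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

df = k - 1
critical_value = 11.07           # alpha = 0.05, df = 5 (value quoted in the text)
reject_h0 = chi2 > critical_value

print([round(c, 2) for c in contributions])
print(round(chi2, 2), df, reject_h0)
```

The statistic comes out at about 3.80, below 11.07, so `reject_h0` is `False`.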

Test 2 — Chi-Square Test of Independence

The Question: Is there a relationship between two categorical variables?

Setup: A survey of 200 customers asks: “Do you prefer Product A, B, or C?” — split by gender.

	Product A	Product B	Product C	Row Total
Male	40	35	25	100
Female	30	45	25	100
Col Total	70	80	50	200

H₀: Gender and product preference are independent

H₁: Gender and product preference are associated

Computing Expected Counts

For each cell, the expected count under independence is:

\[E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{\text{Grand Total}}\]

	Product A	Product B	Product C
Male	(100×70)/200 = 35	(100×80)/200 = 40	(100×50)/200 = 25
Female	(100×70)/200 = 35	(100×80)/200 = 40	(100×50)/200 = 25
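The expected-count formula can be applied to the whole table at once. In this sketch the table is a list of rows, and the function name `expected_counts` is an illustrative choice.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed survey counts: rows = Male, Female; columns = Product A, B, C.
observed = [
    [40, 35, 25],
    [30, 45, 25],
]

expected = expected_counts(observed)
print(expected)  # [[35.0, 40.0, 25.0], [35.0, 40.0, 25.0]]
```

Both rows get the same expected counts because the two groups are the same size (100 each).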

\[\chi^2 = \frac{(40-35)^2}{35}+\frac{(35-40)^2}{40}+\frac{(25-25)^2}{25}+\frac{(30-35)^2}{35}+\frac{(45-40)^2}{40}+\frac{(25-25)^2}{25}\]

\[= 0.714 + 0.625 + 0 + 0.714 + 0.625 + 0 = \mathbf{2.678}\]

df = (rows−1)(cols−1) = 1×2 = 2 | Critical value = 5.99

2.678 < 5.99 → Fail to Reject H₀ ✅ No significant association.
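The full independence test can be sketched end to end as follows. The critical value 5.99 is taken from the text. The p-value line uses a closed form that holds only for df = 2, so it is an assumption specific to this table shape and not a general method.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows = Male, Female; columns = Product A, B, C.
observed = [[40, 35, 25], [30, 45, 25]]
chi2, df = chi_square_independence(observed)

critical_value = 5.99            # alpha = 0.05, df = 2 (value quoted in the text)
reject_h0 = chi2 > critical_value

# For df = 2 only, the upper-tail probability is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(round(chi2, 3), df, reject_h0, round(p_value, 3))
```

Without intermediate rounding the statistic is about 2.679, and the p-value is roughly 0.26, well above 0.05, which agrees with the decision to not reject H₀.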

Interactive: Test of Independence

Switch between weak, moderate, and strong associations. As the gap between groups widens, χ² grows and the p-value shrinks.

Real World Applications

Goodness of Fit In Practice

Marketing: Do customers buy all products equally? Or is one product dominating?

Genetics: Do offspring ratios match Mendel’s predicted 3:1 ratio?

Operations: Are defects distributed equally across shifts? Or is one shift producing more errors?
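Goodness of fit also works when the expected categories are not equally likely, as in the 3:1 genetics case. The sketch below uses hypothetical offspring counts (70 dominant, 30 recessive), invented purely for illustration. The critical value 3.841 is the standard chi-square value for α = 0.05 with df = 1.

```python
# Hypothetical offspring counts (illustrative, not real experimental data).
observed = {"dominant": 70, "recessive": 30}

# Mendel's predicted 3:1 ratio expressed as proportions.
ratio = {"dominant": 3 / 4, "recessive": 1 / 4}

total = sum(observed.values())
expected = {trait: total * p for trait, p in ratio.items()}  # 75 and 25

chi2 = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in observed)

df = len(observed) - 1
critical_value = 3.841           # alpha = 0.05, df = 1
reject_h0 = chi2 > critical_value

print(round(chi2, 3), df, reject_h0)
```

With these made-up counts the statistic is about 1.333, so the data would be consistent with a 3:1 ratio.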

Independence Test In Practice

Healthcare: Is smoking status associated with lung disease?

HR analytics: Is employee attrition related to department?

Retail: Is purchase behaviour related to customer age group?

A/B Testing: Is click-through rate independent of the ad version shown?
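As one illustration, the A/B testing question becomes a 2×2 independence test. The click counts below are hypothetical, and the sketch applies no continuity correction. For df = 1 the upper-tail probability can be written with the complementary error function, which is what the p-value line assumes.

```python
import math

# Hypothetical A/B results: rows = ad version A, B; columns = clicked, not clicked.
observed = [
    [100, 900],
    [150, 850],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(2-1) = 1

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 3), df, p_value < 0.05)
```

With these invented numbers the statistic is about 11.43 and the p-value is far below 0.05, so click-through would not look independent of the ad version.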