AI Basics with AK

Season 03 - Introduction to Statistics

Arun Koundinya Parasa

Episode 13 - Chi-Square Tests

Recap: Episodes 10 to 12

  • What is Hypothesis Testing?
  • The Null and Alternative Hypotheses
  • Type I and Type II Errors
  • p-values and Significance Levels
  • t-tests for comparing means

This all works well for numerical data. But how do we compare groups when the data is categorical?

Our earlier hypothesis questions looked like: “Is the average height different between groups?”

A Different Kind of Data

Numeric Data (What We’ve Tested)

  • Heights, weights, test scores
  • Blood pressure, wait times
  • We compute means, standard deviations
  • We ask: “Is the average different?”

These are continuous or discrete numbers.

Categorical Data (What Chi-Square Tests)

  • Colours, genders, political parties
  • Yes/No responses, product preferences
  • We count how many fall in each category
  • We ask: “Are the counts what we expected?”

These are frequencies and proportions.

You can’t take the mean of “Red, Blue, Green.” But you can count them.
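To make the "count, don't average" idea concrete, here is a minimal Python sketch. The list of colour responses is made up purely for illustration; only the standard library's `Counter` is used.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from a real survey).
responses = ["Red", "Blue", "Green", "Blue", "Red", "Blue"]

# Count how many responses fall in each category.
counts = Counter(responses)

# Turn the counts into proportions of the total.
total = sum(counts.values())
proportions = {colour: n / total for colour, n in counts.items()}

print(counts)       # frequency of each colour
print(proportions)  # share of each colour
```

These frequencies are the "observed counts" that a chi-square test works with.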

Two Questions Chi-Square Answers

Test 1 — Goodness of Fit

“Does the distribution of my data match an expected pattern?”

Example: Is a die fair? Do customers prefer all flavours equally?

You have one categorical variable. You compare observed counts to expected counts.


Test 2 — Test of Independence

“Are two categorical variables related to each other?”

Example: Is gender related to product preference? Is smoking related to disease status?

You have two categorical variables. You test whether knowing one tells you anything about the other.

The Chi-Square Distribution

The chi-square distribution is right-skewed and takes only non-negative values. As the degrees of freedom (df) increase, it shifts right and becomes more symmetric. To reject H₀, our test statistic must exceed the critical value for the chosen significance level.
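One way to see this shape without a plot is a small simulation. The sketch below relies on the standard fact that a chi-square variable with df degrees of freedom is the sum of df squared standard normal draws. The sample size, seed, and helper names are arbitrary choices for illustration.

```python
import random

def simulate_chi_square(df, n_samples=20000, seed=0):
    """Draw chi-square samples as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
            for _ in range(n_samples)]

def skewness(values):
    """Sample skewness: third central moment divided by variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5

results = {}
for df in (2, 5, 10):
    samples = simulate_chi_square(df)
    results[df] = (min(samples), sum(samples) / len(samples), skewness(samples))
    smallest, mean, skew = results[df]
    print(f"df={df:2d}  min={smallest:.3f}  mean={mean:.2f}  skewness={skew:.2f}")
```

The minimum never drops below zero, the mean sits close to df (the shift to the right), and the skewness shrinks as df grows (the move toward symmetry).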

The Core Idea: Observed vs Expected

The Question We Always Ask

If the null hypothesis were true —

what counts would we expect to see?

Then we compare those expected counts to what we actually observed.

If the gap is small → data is consistent with H₀

If the gap is large → something is going on

The Chi-Square Statistic

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

  • O = Observed count in each cell
  • E = Expected count under H₀

Each cell contributes to the total.

Large \(\chi^2\) → observed and expected are far apart → evidence against H₀

Small \(\chi^2\) → data fits the expected pattern → no evidence against H₀
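The formula translates almost directly into code. A minimal sketch, where the function name and the flavour counts are hypothetical and chosen only to keep the arithmetic easy to follow:

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical flavour counts: 120 customers, three flavours.
# H0 says every flavour is equally popular, so we expect 40 each.
observed = [30, 50, 40]
expected = [40, 40, 40]

stat = chi_square_statistic(observed, expected)
print(stat)  # 100/40 + 100/40 + 0 = 5.0
```

Each cell adds its own squared gap, scaled by the expected count, so a cell that matches its expectation contributes nothing.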

Test 1 — Goodness of Fit

Setup: You roll a die 60 times. If the die is fair, you expect each face 10 times.

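| Face | Observed (O) | Expected (E) | (O−E)²/E |
|------|--------------|--------------|----------|
| 1    | 8            | 10           | 0.40     |
| 2    | 12           | 10           | 0.40     |
| 3    | 7            | 10           | 0.90     |
| 4    | 14           | 10           | 1.60     |
| 5    | 11           | 10           | 0.10     |
| 6    | 8            | 10           | 0.40     |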

\[\chi^2 = 0.40+0.40+0.90+1.60+0.10+0.40 = 3.80\]

df = k − 1 = 5 | Critical value at α=0.05 → 11.07

3.80 < 11.07 → Fail to Reject H₀ ✅

No evidence the die is unfair.
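The same die example can be checked in code, this time with a p-value instead of a critical-value lookup. This is a sketch using only the standard library: `chi2_sf` is a hypothetical helper written for this example, built from the closed forms for df = 1 and df = 2 plus the usual recurrence that raises df by 2. In practice a library routine such as `scipy.stats.chisquare` would normally be used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer df (illustrative helper)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # closed form for df = 1
    while a < df / 2.0:                        # step df up by 2 each pass
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

observed = [8, 12, 7, 14, 11, 8]                 # faces 1..6 from the table
n = sum(observed)                                # 60 rolls
expected = [n / len(observed)] * len(observed)   # 10 per face under H0

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2_stat, df)

print(f"chi2 = {chi2_stat:.2f}, df = {df}, p = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The p-value comes out at roughly 0.58, far above 0.05, which agrees with the critical-value comparison: the statistic of 3.80 is nowhere near 11.07.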

Test 2 — Chi-Square Test of Independence

The Question: Is there a relationship between two categorical variables?

Setup: A survey of 200 customers asks: “Do you prefer Product A, B, or C?” — split by gender.

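| Gender    | Product A | Product B | Product C | Row Total |
|-----------|-----------|-----------|-----------|-----------|
| Male      | 40        | 35        | 25        | 100       |
| Female    | 30        | 45        | 25        | 100       |
| Col Total | 70        | 80        | 50        | 200       |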

H₀: Gender and product preference are independent

H₁: Gender and product preference are associated

Computing Expected Counts

For each cell, the expected count under independence is:

\[E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{\text{Grand Total}}\]

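| Gender | Product A         | Product B         | Product C         |
|--------|-------------------|-------------------|-------------------|
| Male   | (100×70)/200 = 35 | (100×80)/200 = 40 | (100×50)/200 = 25 |
| Female | (100×70)/200 = 35 | (100×80)/200 = 40 | (100×50)/200 = 25 |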
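The expected-count formula can be applied to the whole table at once. A minimal sketch, assuming the observed table is stored as a plain list of rows (the function name is illustrative):

```python
def expected_counts(table):
    """Expected cell counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed survey counts: rows = Male, Female; columns = Product A, B, C.
observed = [[40, 35, 25],
            [30, 45, 25]]

expected = expected_counts(observed)
print(expected)  # [[35.0, 40.0, 25.0], [35.0, 40.0, 25.0]]
```

Both rows get the same expected counts here because the two row totals are equal (100 each).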

\[\chi^2 = \frac{(40-35)^2}{35}+\frac{(35-40)^2}{40}+\frac{(25-25)^2}{25}+\frac{(30-35)^2}{35}+\frac{(45-40)^2}{40}+\frac{(25-25)^2}{25}\]

\[\approx 0.714 + 0.625 + 0 + 0.714 + 0.625 + 0 = \mathbf{2.678}\]

df = (rows−1)(cols−1) = 1×2 = 2 | Critical value at α=0.05 → 5.99

2.678 < 5.99 → Fail to Reject H₀ ✅ No significant association.
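The whole test of independence fits in a short function. This is a sketch rather than a reference implementation: the function name is made up, and the p-value line uses a shortcut that holds only for df = 2, where the upper-tail probability is exactly exp(−χ²/2). For general tables a library routine such as `scipy.stats.chi2_contingency` is the usual choice.

```python
import math

def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under H0
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

observed = [[40, 35, 25],   # Male
            [30, 45, 25]]   # Female

chi2, df = chi_square_independence(observed)

# Shortcut valid for df = 2 only: P(X > chi2) = exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

Without intermediate rounding the statistic is about 2.679 (the 2.678 above comes from adding rounded terms), and the p-value of about 0.26 leads to the same decision: fail to reject H₀.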

Interactive: Test of Independence

Switch between weak, moderate, and strong associations. Watch how χ² and the p-value respond as the gap between groups widens.
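The same experiment can be approximated in code. The sketch below keeps the survey's row and column totals fixed and moves a hypothetical number of customers (`gap`) between Product A and Product B; a gap of 5 reproduces the survey table, and the other values are invented variations. Because every table is 2×3 (df = 2), the p-value is exp(−χ²/2).

```python
import math

def chi_square(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, r in enumerate(table):
        for j, o in enumerate(r):
            e = rows[i] * cols[j] / n
            total += (o - e) ** 2 / e
    return total

def shifted_table(gap):
    """Hypothetical table: expected counts 35/40/25, pushed apart by `gap`."""
    return [[35 + gap, 40 - gap, 25],    # Male
            [35 - gap, 40 + gap, 25]]    # Female

results = []
for gap in (0, 2, 5, 10, 15):
    chi2 = chi_square(shifted_table(gap))
    p = math.exp(-chi2 / 2)              # valid because df = 2 for every table
    results.append((gap, chi2, p))
    print(f"gap={gap:2d}  chi2={chi2:7.3f}  p={p:.4f}")
```

As the gap between the groups widens, χ² grows and the p-value shrinks; with these numbers the result crosses the 0.05 threshold somewhere between a gap of 5 and a gap of 10.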

Real World Applications

Goodness of Fit In Practice

Marketing: Do customers buy all products equally? Or is one product dominating?

Genetics: Do offspring ratios match Mendel’s predicted 3:1 ratio?

Operations: Are defects distributed equally across shifts? Or is one shift producing more errors?
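Goodness of fit does not require equal expected counts. As a sketch of the genetics case, the code below tests a 3:1 ratio against made-up offspring counts (these are not Mendel's data). With two categories df = 1, and for df = 1 the upper-tail probability is erfc(√(χ²/2)).

```python
import math

# Hypothetical counts for illustration only (not Mendel's actual data).
observed = {"dominant": 290, "recessive": 110}
ratio = {"dominant": 3, "recessive": 1}    # 3:1 ratio claimed by H0

n = sum(observed.values())
ratio_total = sum(ratio.values())

# Expected counts split the total according to the hypothesised ratio.
expected = {k: n * ratio[k] / ratio_total for k in observed}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

# Shortcut valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

Here the expected counts are 300 and 100, the statistic is about 1.33, and the p-value is well above 0.05, so these illustrative counts are consistent with a 3:1 ratio.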

Independence Test In Practice

Healthcare: Is smoking status associated with lung disease?

HR analytics: Is employee attrition related to department?

Retail: Is purchase behaviour related to customer age group?

A/B Testing: Is click-through rate independent of the ad version shown?

Thank You