AI Basics with AK

Season 03 - Introduction to Statistics

Arun Koundinya Parasa

Episode 15 - ANOVA: Analysis of Variance

The Problem ANOVA Solves

What If You Have More Than Two Groups?

A pharmaceutical company tests three different drugs on blood pressure.

Drug A: 30 patients Drug B: 30 patients Drug C: 30 patients

Question: Is there a difference in effectiveness across all three drugs?

A t-test compares two groups. We have three.

What do we do?

The Temptation & Wrong Answer

“Just run three t-tests!”

A vs B
A vs C
B vs C

This sounds reasonable. But it creates a serious problem.

Each test has a 5% chance of a Type I error.

Three tests → the error compounds.

\[P(\text{at least one false positive}) = 1 - (0.95)^3 \approx 0.143\]

You’ve inflated your false positive rate from 5% to 14.3% without realising it.

ANOVA’s Answer: One Test for All Groups

ANOVA — Analysis of Variance tests whether any group means differ — in a single test.

H₀: All group means are equal → μ₁ = μ₂ = μ₃ = … = μₖ

H₁: At least one group mean is different

Key point: ANOVA does not tell you which groups differ.

It tells you whether a difference exists somewhere.

For which groups differ → we use post-hoc tests (later in this episode).

One test. One α. No inflation.

The Core Idea: Variance as Evidence

The Intuition

Instead of comparing means directly —

ANOVA compares two types of variance:

Between-group variance: How spread are the group means from the overall mean?

Within-group variance: How spread are individual observations within each group?

If between-group variance is much larger than within-group variance — the groups are genuinely different.

The Signal-to-Noise Ratio

\[F = \frac{\text{Between-group variance}}{\text{Within-group variance}} = \frac{\text{Signal}}{\text{Noise}}\]

Large F → Group means vary more than random noise → Evidence against H₀
Small F ≈ 1 → Groups vary no more than expected by chance → No evidence against H₀

This ratio is called the F-statistic.

It follows an F-distribution under H₀.

Visualizing Between vs Within Variance

Switch scenarios to see how group separation drives F. When groups barely overlap, F is large and we reject H₀.

ANOVA Tells You Someone Differs But Not Who

After rejecting H₀ in ANOVA, we know:

“At least one group mean is different.”

But we don’t know:

Is A different from B?
Is B different from C?
Are all three different?

This is where post-hoc tests come in.

Post-hoc tests make pairwise comparisons after ANOVA — while controlling the family-wise error rate.

The most common: Tukey’s Honest Significant Difference (HSD)

Tukey’s HSD — Post-Hoc Test

Tukey’s HSD computes a minimum difference that two group means must exceed to be considered significantly different.

\[HSD = q \times \sqrt{\frac{MS_W}{n}}\]

q = studentised range statistic (from table)
MS_W = within-group mean square from ANOVA
n = group size (equal groups)

If |x̄ᵢ − x̄ⱼ| > HSD → groups i and j are significantly different.

Assumptions of One-Way ANOVA

ANOVA is powerful — but it rests on three assumptions:

Independence Observations within and across groups must be independent. Each subject appears in only one group.

Normality Data within each group should be approximately normally distributed. ANOVA is robust to mild violations when n is large (CLT again).

Homogeneity of Variance (Homoscedasticity) All groups should have roughly equal variances.