AI Basics with AK

Season 03 - Introduction to Statistics

Arun Koundinya Parasa

Episode 10 - Hypothesis Testing

Recap: What We’ve Covered So Far

Episode	Topic	Key Takeaway
07	Central Limit Theorem	The distribution of sample means approaches Normal as n grows
08	Confidence Intervals	A range of plausible values for the truth
09	T-Distribution	When σ is unknown and estimated from the sample, tails get heavier

This episode ties everything together.

We’ve been estimating. Now we ask: Can we make a decision?

The Human Problem Behind Hypothesis Testing

A Story We All Relate To

A company claims their new drug reduces blood pressure.

You test it on 40 patients.

The average blood pressure did drop.

But wait — is that because the drug worked?

Or just… random chance?

How do you decide?

This Is Exactly What Hypothesis Testing Solves

We observe something in a sample
We ask: “Could this have happened by chance?”
We use math to answer that question

This is the backbone of:

Medical trials 💊
A/B testing 🖥️
Quality control 🏭
Policy decisions 🏛️

Setting Up the Question

Before running any test, we must define two competing claims:

Null Hypothesis (H₀): The “boring” claim. Nothing happened. No effect. No difference.

Alternative Hypothesis (H₁ or Hₐ): The “interesting” claim. Something changed. There is an effect.

The game:

We start by assuming H₀ is true — and then ask:

“If H₀ were true, how surprising is what we observed?”

If it’s surprising enough → we reject H₀.

Real World Examples

Situation	H₀ (Null)	H₁ (Alternative)
Drug trial	Drug has no effect	Drug reduces blood pressure
Website A/B test	Both versions perform equally	Version B gets more clicks
Coin fairness	Coin is fair (p = 0.5)	Coin is biased (p ≠ 0.5)
Manufacturing	Machine produces 500g on average	Machine is off — mean ≠ 500g

Notice: H₀ always represents the status quo or no change.

The Core Logic: Innocent Until Proven Guilty

Think of It Like a Trial

Defendant = H₀
Evidence = your data
Verdict = reject or fail to reject H₀

We never say H₀ is “proven true.”

We either:

Reject H₀ (evidence is strong enough)
Fail to reject H₀ (not enough evidence)

Important Mindset Shift

❌ We do NOT prove H₁ is true

❌ We do NOT prove H₀ is false

✅ We measure how inconsistent the data is with H₀

The strength of evidence is what drives the conclusion.

Introducing the p-value

The p-value is the probability of seeing results at least as extreme as what we observed, assuming H₀ is true.

“If the null hypothesis were true, how often would we see data like this?”

Small p-value → Our data is very unlikely under H₀ → Evidence against H₀
Large p-value → Our data is quite plausible under H₀ → No strong evidence against H₀
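One way to make this definition concrete is to simulate a world where H₀ is true and count how often results at least as extreme as ours appear. The sketch below does this for the coin fairness case. The numbers (100 flips, 62 heads, 20,000 simulated experiments) and the function name `simulated_p_value` are illustrative assumptions, not part of any library.

```python
import random

def simulated_p_value(n_flips, observed_heads, n_sims=20_000, seed=0):
    """Estimate a two-sided p-value for a fair coin (H0: p = 0.5) by simulation."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    expected = n_flips / 2              # heads we expect if H0 is true
    observed_gap = abs(observed_heads - expected)

    extreme = 0
    for _ in range(n_sims):
        # One simulated experiment under H0: n_flips tosses of a fair coin
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        # "At least as extreme" = at least as far from 50/50, in either direction
        if abs(heads - expected) >= observed_gap:
            extreme += 1
    return extreme / n_sims

p_hat = simulated_p_value(n_flips=100, observed_heads=62)
print(f"Estimated p-value: {p_hat:.3f}")
```

The estimate is the fraction of simulated fair-coin experiments that landed at least as far from 50 heads as the observed count. Because it counts exact discrete outcomes, it can differ slightly from an answer based on the Normal approximation.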

The threshold (significance level α):

Most commonly α = 0.05

If p-value < α → Reject H₀

Visualizing the p-value

The shaded area is the p-value. Smaller shaded area = stronger evidence against H₀.
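For a test statistic that is approximately standard Normal under H₀ (an assumption, though it matches the z-score used in this episode), the shaded area can be written as a tail probability. Here z_obs is the observed test statistic and Φ is the standard Normal CDF; both symbols are introduced only for this illustration:

\[p_{\text{two-sided}} = P\left(|Z| \ge |z_{\text{obs}}|\right) = 2\left(1 - \Phi\left(|z_{\text{obs}}|\right)\right), \qquad p_{\text{right-sided}} = P\left(Z \ge z_{\text{obs}}\right) = 1 - \Phi\left(z_{\text{obs}}\right)\]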

The 5-Step Framework for Hypothesis Testing

Every hypothesis test follows the same recipe:

Step 1: State H₀ and H₁. Define your null and alternative hypotheses clearly.

Step 2: Choose the significance level α. Usually 0.05. This is your “how surprised must I be?” threshold.

Step 3: Compute the test statistic. Standardize your observed result (z-score or t-score).

Step 4: Find the p-value. How likely is a result at least this extreme if H₀ were true?

Step 5: Make a decision.

p-value < α → Reject H₀
p-value ≥ α → Fail to reject H₀
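The five steps map almost line for line onto code. Below is a minimal sketch for a two-tailed one-sample z-test on a mean, loosely based on the 500g manufacturing scenario. The function name `z_test_mean` and the data (36 packets averaging 503 g, with σ assumed known to be 6 g) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test for a mean, following the five steps."""
    # Step 1: H0: mu = mu0, H1: mu != mu0 (two-tailed, see the p-value below)
    # Step 2: alpha is supplied by the caller (default 0.05)
    # Step 3: standardize the observed sample mean
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Step 4: probability of a result at least this extreme if H0 were true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: compare the p-value with alpha
    decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
    return z, p_value, decision

# Hypothetical data: 36 packets averaging 503 g, sigma assumed known (6 g)
z, p_value, decision = z_test_mean(sample_mean=503, mu0=500, sigma=6, n=36)
print(f"z = {z:.2f}, p-value = {p_value:.4f} -> {decision}")
```

When σ is not known and has to be estimated from the sample, the same skeleton applies, but Step 3 would produce a t-score and Step 4 would use the t-distribution instead of the Normal.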

One-Tailed vs Two-Tailed Tests

Two-Tailed Test

Use when H₁ says: “different from”

Example:

H₀: μ = 500g
H₁: μ ≠ 500g

You care whether the machine is over- or under-producing.

Critical region is split — both tails.

α = 0.05 → each tail gets 0.025

One-Tailed Test

Use when H₁ says: “greater than” or “less than”

Example:

H₀: μ = 500g
H₁: μ > 500g

You only care if it’s producing too much.

Critical region is on one side only.

α = 0.05 → entire 0.05 goes to one tail
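How α is split between the tails changes the cutoff a z-score must pass. The sketch below computes both cutoffs with Python's standard-library `statistics.NormalDist`, assuming the test statistic is standard Normal under H₀. The z-score of 1.80 is a hypothetical value picked to show the difference.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed: alpha is split, so each tail holds alpha / 2
z_crit_two = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed (right): the whole alpha sits in the upper tail
z_crit_one = std_normal.inv_cdf(1 - alpha)

print(f"Two-tailed: reject H0 if |z| > {z_crit_two:.3f}")
print(f"One-tailed: reject H0 if  z  > {z_crit_one:.3f}")

# The same hypothetical z-score can lead to different decisions
z = 1.80
two_tailed = "Reject H0" if abs(z) > z_crit_two else "Fail to reject H0"
one_tailed = "Reject H0" if z > z_crit_one else "Fail to reject H0"
print("Two-tailed decision:", two_tailed)
print("One-tailed decision:", one_tailed)
```

A z-score of 1.80 clears the one-tailed cutoff but not the two-tailed one, which is why the choice of tails should follow from H₁ rather than from the data.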

Visualizing One-Tailed vs Two-Tailed

Worked Example: Coin Fairness Test

The Setup: You suspect a coin is biased. You flip it 100 times and get 62 heads.

Is the coin fair?

Step 1: H₀: p = 0.5 | H₁: p ≠ 0.5 (two-tailed)

Step 2: α = 0.05

Step 3: Compute z-score:

\[z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.62 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{100}}} = \frac{0.12}{0.05} = 2.4\]

Step 4: p-value = 2 × P(Z > 2.4) ≈ 0.016

Step 5: 0.016 < 0.05 → Reject H₀

Conclusion: Evidence suggests the coin is biased.
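The arithmetic in Steps 3 to 5 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the p-value relies on the same Normal approximation as the formula above.

```python
import math
from statistics import NormalDist

n, heads = 100, 62
p0 = 0.5        # proportion of heads claimed by H0
alpha = 0.05

p_hat = heads / n                                # observed proportion of heads
standard_error = math.sqrt(p0 * (1 - p0) / n)    # spread of p_hat if H0 is true
z = (p_hat - p0) / standard_error                # Step 3: test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # Step 4: two-tailed p-value

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")  # Step 5
```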

Let’s See It Interactively

Select different coin flip outcomes to see how the evidence changes.
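As a static stand-in for trying different outcomes, the sketch below reruns the same two-tailed z-test for several head counts out of 100 flips. The helper name `coin_z_test` and the list of outcomes are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def coin_z_test(heads, n=100, p0=0.5):
    """Two-tailed z-test for H0: p = p0; returns (z, p_value)."""
    z = (heads / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
results = {}
for heads in (50, 55, 58, 60, 62, 65):   # hypothetical outcomes out of 100 flips
    z, p_value = coin_z_test(heads)
    results[heads] = p_value
    verdict = "Reject H0" if p_value < alpha else "Fail to reject H0"
    print(f"{heads:>3} heads | z = {z:5.2f} | p-value = {p_value:.4f} | {verdict}")
```

With n fixed at 100, the p-value shrinks as the head count moves away from 50. Among these outcomes, 60 heads is the first to fall below α = 0.05 under the Normal approximation.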