AI Basics with AK

Season 03 - Introduction to Statistics

Arun Koundinya Parasa

Episode 10 - Hypothesis Testing

Recap: What We’ve Covered So Far

Episode | Topic | Key Takeaway
07 | Central Limit Theorem | The distribution of sample means approaches Normal as n grows
08 | Confidence Intervals | A range of plausible values for the truth
09 | T-Distribution | When σ is unknown, tails get heavier

This episode ties everything together.

We’ve been estimating. Now we ask: Can we make a decision?

The Human Problem Behind Hypothesis Testing

A Story We All Relate To

A company claims their new drug reduces blood pressure.

You test it on 40 patients.

The average blood pressure did drop.

But wait — is that because the drug worked?

Or just… random chance?

How do you decide?

This Is Exactly What Hypothesis Testing Solves

  • We observe something in a sample
  • We ask: “Could this have happened by chance?”
  • We use math to answer that question

This is the backbone of:

  • Medical trials 💊
  • A/B testing 🖥️
  • Quality control 🏭
  • Policy decisions 🏛️

Setting Up the Question

Before running any test, we must define two competing claims:

Null Hypothesis (H₀): The “boring” claim. Nothing happened. No effect. No difference.

Alternative Hypothesis (H₁ or Hₐ): The “interesting” claim. Something changed. There is an effect.

The game:

We start by assuming H₀ is true — and then ask:

“If H₀ were true, how surprising is what we observed?”

If it’s surprising enough → we reject H₀.

Real World Examples

Situation | H₀ (Null) | H₁ (Alternative)
Drug trial | Drug has no effect | Drug reduces blood pressure
Website A/B test | Both versions perform equally | Version B gets more clicks
Coin fairness | Coin is fair (p = 0.5) | Coin is biased (p ≠ 0.5)
Manufacturing | Machine produces 500g on average | Machine is off: mean ≠ 500g

Notice: H₀ always represents the status quo or no change.

The Core Logic: Innocent Until Proven Guilty

Think of It Like a Trial

  • Defendant = H₀
  • Evidence = your data
  • Verdict = reject or fail to reject H₀

We never say H₀ is “proven true.”

We either:

  • Reject H₀ (evidence is strong enough)
  • Fail to reject H₀ (not enough evidence)

Important Mindset Shift

❌ We do NOT prove H₁ is true

❌ We do NOT prove H₀ is false

✅ We measure how inconsistent the data is with H₀

The strength of evidence is what drives the conclusion.

Introducing the p-value

The p-value is the probability of seeing results as extreme or more extreme than what we observed — assuming H₀ is true.

“If the null hypothesis were true, how often would we see data like this?”

  • Small p-value → Our data is very unlikely under H₀ → Evidence against H₀
  • Large p-value → Our data is quite plausible under H₀ → No strong evidence against H₀

The threshold (significance level α):

Most commonly α = 0.05

If p-value < α → Reject H₀
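The "how often would we see data like this?" question can be answered by brute force. Here is a minimal sketch that simulates many experiments under H₀ and counts how often the result is at least as extreme as the one observed. The scenario (16 heads in 20 flips of a supposedly fair coin), the function name, and the number of simulations are illustrative assumptions, not part of the episode.

```python
import random

def simulated_p_value(observed_heads, n_flips, n_sims=20_000, seed=0):
    """Estimate a two-sided p-value for H0: the coin is fair (p = 0.5)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    expected = n_flips * 0.5
    observed_gap = abs(observed_heads - expected)

    extreme = 0
    for _ in range(n_sims):
        # One simulated experiment in a world where H0 is true
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        # "As extreme or more extreme" = at least as far from the expected count
        if abs(heads - expected) >= observed_gap:
            extreme += 1
    return extreme / n_sims

alpha = 0.05
p = simulated_p_value(observed_heads=16, n_flips=20)  # hypothetical data
print(f"estimated p-value: {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

The estimate is only as precise as the number of simulated experiments allows; the formulas later in the episode give the same kind of answer without simulation.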

Visualizing the p-value

On the curve of the distribution under H₀, the shaded tail area beyond the observed test statistic is the p-value. Smaller shaded area = stronger evidence against H₀.

The 5-Step Framework for Hypothesis Testing

Every hypothesis test follows the same recipe:

Step 1: State H₀ and H₁. Define your null and alternative hypotheses clearly.

Step 2: Choose significance level α. Usually 0.05. This is your “how surprised must I be?” threshold.

Step 3: Compute the test statistic. Standardize your observed result (z-score or t-score).

Step 4: Find the p-value. How likely is a result at least as extreme as yours if H₀ were true?

Step 5: Make a decision. p-value < α → Reject H₀; p-value ≥ α → Fail to reject H₀.
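The recipe can be sketched as a small function. This version assumes a two-sided z-test for a mean with a known population standard deviation σ; the function name and the numbers in the call (a sample of 50 items averaging 503g, with σ = 10g) are hypothetical values chosen only to illustrate the steps.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test for a mean, assuming sigma is known."""
    # Step 1 (H0: mu = mu0, H1: mu != mu0) and Step 2 (alpha) arrive as arguments.
    # Step 3: standardize the observed sample mean
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: two-sided p-value from the standard Normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: compare with alpha
    decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
    return z, p_value, decision

# Hypothetical data: 50 items, sample mean 503g, sigma 10g, H0: mu = 500g
z, p_value, decision = z_test_mean(sample_mean=503.0, mu0=500.0, sigma=10.0, n=50)
print(f"z = {z:.2f}, p-value = {p_value:.3f} -> {decision}")
```

When σ is unknown and estimated from the sample, the same steps apply with a t-score in place of the z-score.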

One-Tailed vs Two-Tailed Tests

Two-Tailed Test

Use when H₁ says: “different from”

Example: H₀: μ = 500g | H₁: μ ≠ 500g

You care if the machine is over or under producing.

Critical region is split — both tails.

α = 0.05 → each tail gets 0.025

One-Tailed Test

Use when H₁ says: “greater than” or “less than”

Example: H₀: μ = 500g | H₁: μ > 500g

You only care if it’s producing too much.

Critical region is on one side only.

α = 0.05 → entire 0.05 goes to one tail
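To see what splitting α does in practice, here is a short sketch that computes the z cutoffs for both kinds of test and then evaluates one test statistic both ways. The value z = 1.8 is a hypothetical observation, picked because the two tests disagree on it.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1
alpha = 0.05

# Two-tailed: alpha is split, 0.025 in each tail
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed (upper): the entire 0.05 sits in the right tail
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed: reject if |z| > {two_tailed_cutoff:.3f}")
print(f"one-tailed: reject if  z  > {one_tailed_cutoff:.3f}")

z = 1.8  # hypothetical observed test statistic
p_two = 2 * (1 - std_normal.cdf(abs(z)))  # both tails count as extreme
p_one = 1 - std_normal.cdf(z)             # only the upper tail counts
print(f"z = {z}: two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```

The same data can clear the one-tailed bar and miss the two-tailed one, so the choice between them should be made before looking at the data.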

Visualizing One-Tailed vs Two-Tailed

Worked Example: Coin Fairness Test

The Setup: You suspect a coin is biased. You flip it 100 times and get 62 heads.

Is the coin fair?

Step 1: H₀: p = 0.5 | H₁: p ≠ 0.5 (two-tailed)

Step 2: α = 0.05

Step 3: Compute z-score:

\[z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.62 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{100}}} = \frac{0.12}{0.05} = 2.4\]

Step 4: p-value = 2 × P(Z > 2.4) ≈ 0.016

Step 5: 0.016 < 0.05 → Reject H₀

Conclusion: Evidence suggests the coin is biased.
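The arithmetic above can be checked in a few lines of Python. This sketch uses only the standard library and the same Normal approximation as the worked example; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

n, heads = 100, 62
p0 = 0.5       # proportion of heads claimed by H0
alpha = 0.05

p_hat = heads / n                        # observed proportion of heads
standard_error = sqrt(p0 * (1 - p0) / n) # standard error under H0
z = (p_hat - p0) / standard_error        # Step 3
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # Step 4, two-tailed

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")  # Step 5
```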

Let’s See It Interactively

Select different coin flip outcomes to see how the evidence changes.
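Without the interactive widget, a rough stand-in is to run the same test over several outcomes. The sketch below assumes 100 flips and the two-tailed z-test from the worked example; the function name and the list of head counts are arbitrary choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def coin_test(heads, n=100, p0=0.5, alpha=0.05):
    """Two-tailed z-test for H0: p = p0, using the Normal approximation."""
    z = (heads / n - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
    return z, p_value, decision

# Try a range of outcomes, from perfectly balanced to clearly lopsided
for heads in (50, 55, 58, 62, 65):
    z, p_value, decision = coin_test(heads)
    print(f"{heads} heads: z = {z:5.2f}, p-value = {p_value:.4f} -> {decision}")
```

The further the head count moves from 50, the smaller the p-value becomes and the stronger the evidence against a fair coin.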

Thank You