Season 03 - Introduction to Statistics
| Concept | What It Means |
|---|---|
| Type I Error (α) | Rejecting H₀ when it’s actually true — false alarm |
| Type II Error (β) | Missing a real effect — false negative |
| z-test | One sample, σ known |
| t-test | One sample, σ unknown — the real-world default |
Last episode: “Did this one group differ from a known value?” This episode: “Do these two groups actually differ from each other?”
A coffee shop claims wait time = 5 min.
You measured one group of customers.
You compared their mean to a fixed claimed value.
→ One reference point. One group.
Branch A vs Branch B — which is faster?
New drug vs placebo — which works better?
Before training vs after training — did it help?
→ Two groups. Two means. One question:
Is the difference real — or just noise?
H₀: μ₁ = μ₂ (no difference between groups)
H₁: μ₁ ≠ μ₂ (or one-tailed variant)
Test Statistic:
\[t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]
where \(s_p\) is the pooled standard deviation.
\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\]
Degrees of freedom:
\[df = n_1 + n_2 - 2\]
Key assumption:
The two groups have roughly equal population variances.
If that assumption is doubtful → use Welch’s t-test instead.
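The pooled computation above can be sketched in Python. The two sample arrays below are hypothetical (illustrative numbers only); `scipy.stats.ttest_ind` with `equal_var=True` should reproduce the manual pooled result exactly:

```python
import numpy as np
from scipy import stats

# Hypothetical samples — illustrative numbers, not from the episode
group1 = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7])
group2 = np.array([5.9, 5.4, 6.1, 5.8, 5.5, 6.0, 5.7, 5.6])

n1, n2 = len(group1), len(group2)
s1, s2 = group1.std(ddof=1), group2.std(ddof=1)  # sample standard deviations

# Pooled standard deviation (assumes roughly equal population variances)
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# t statistic and df, exactly as in the formulas above
t_manual = (group1.mean() - group2.mean()) / (sp * np.sqrt(1/n1 + 1/n2))
df = n1 + n2 - 2

# scipy's pooled-variance t-test should agree with the manual value
t_scipy, p_scipy = stats.ttest_ind(group1, group2, equal_var=True)
```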
Scenario: Two teaching methods are tested on different student groups.
Method A: n = 12, x̄ = 72, s = 8 | Method B: n = 12, x̄ = 78, s = 7
At α = 0.05 — is there a significant difference?
H₀: μ_A = μ_B | H₁: μ_A ≠ μ_B (two-tailed)
\[s_p = \sqrt{\frac{11 \times 64 + 11 \times 49}{22}} = \sqrt{56.5} \approx 7.52\]
\[t = \frac{72 - 78}{7.52 \times \sqrt{\frac{1}{12}+\frac{1}{12}}} = \frac{-6}{3.07} \approx -1.95\]
df = 22 | p-value ≈ 0.063
0.063 > 0.05 → Fail to Reject H₀
No significant difference detected at this sample size.
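As a check, `scipy.stats.ttest_ind_from_stats` reproduces this example from the summary statistics alone (Method A: n = 12, x̄ = 72, s = 8; Method B: n = 12, x̄ = 78, s = 7, the same numbers used in the pooled computation above):

```python
from scipy import stats

# Summary statistics from the worked example
t, p = stats.ttest_ind_from_stats(mean1=72, std1=8, nobs1=12,
                                  mean2=78, std2=7, nobs2=12,
                                  equal_var=True)  # pooled-variance t-test
```

The result matches the hand computation: t ≈ −1.96 and p ≈ 0.063, so we fail to reject H₀ at α = 0.05.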
Same hypotheses as independent t-test.
Test Statistic:
\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]
No pooling of variances — each group uses its own.
Degrees of freedom are approximated:
\[df \approx \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}\]
This gives a non-integer df — that’s normal.
Why prefer Welch’s?
When variances differ, pooling them distorts the test. Welch’s adjusts for this — at no real cost.
In practice: default to Welch’s unless you have strong reason to pool.
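The Welch–Satterthwaite df approximation is easy to sanity-check in code. This is a minimal sketch (the `welch_df` helper and its inputs are my own, for illustration); note that when the two variances and sample sizes are equal, the approximation recovers the pooled df of n₁ + n₂ − 2:

```python
def welch_df(s1, n1, s2, n2):
    """Welch–Satterthwaite approximation for degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal variances and sizes -> df ≈ n1 + n2 - 2 = 18
df_equal = welch_df(2.0, 10, 2.0, 10)

# Very unequal variances -> df shrinks well below 18
df_unequal = welch_df(2.0, 10, 15.0, 10)
```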
Scenario: Two factories produce the same component; we check whether their output is consistent.
Factory A: n = 10, x̄ = 50.2, s = 0.4 | Factory B: n = 10, x̄ = 49.7, s = 1.8
Variances look very different — use Welch’s. (α = 0.05, two-tailed)
H₀: μ_A = μ_B | H₁: μ_A ≠ μ_B
\[t = \frac{50.2 - 49.7}{\sqrt{\frac{0.16}{10}+\frac{3.24}{10}}} = \frac{0.5}{\sqrt{0.34}} = \frac{0.5}{0.583} \approx 0.858\]
Welch df ≈ 9.9 | p-value ≈ 0.412
0.412 > 0.05 → Fail to Reject H₀
No significant difference in means — but Factory B is far more variable.
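The same `ttest_ind_from_stats` call with `equal_var=False` runs Welch's test directly on the factory summary statistics above:

```python
from scipy import stats

# Factory A: n=10, mean=50.2, s=0.4 | Factory B: n=10, mean=49.7, s=1.8
t, p = stats.ttest_ind_from_stats(mean1=50.2, std1=0.4, nobs1=10,
                                  mean2=49.7, std2=1.8, nobs2=10,
                                  equal_var=False)  # Welch's t-test
```

This matches the hand computation: t ≈ 0.86, p ≈ 0.41, fail to reject H₀.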
Mann-Whitney U Test: non-parametric — it makes no assumptions about the distribution shape.
H₀: The two groups have the same distribution
H₁: One group tends to have higher/lower values
Instead of means — it compares ranks.
Intuition:
If Group 1 consistently has higher ranks → its values tend to be larger → groups are different.
No means. No variances. Just ordering.
Think of it as: “which group wins more head-to-head comparisons?”
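The rank-based logic above can be sketched directly (the `mann_whitney_u` helper is my own illustration, not a library function): rank the pooled data, sum the first group's ranks, and convert the rank sum to U.

```python
import numpy as np
from scipy.stats import rankdata

def mann_whitney_u(x, y):
    """U statistics computed from average ranks, as described above."""
    combined = np.concatenate([x, y])
    ranks = rankdata(combined)                 # ties receive the average rank
    r_x = ranks[:len(x)].sum()                 # rank sum of the first group
    u_x = r_x - len(x) * (len(x) + 1) / 2      # U for the first group
    u_y = len(x) * len(y) - u_x                # U_x + U_y = n_x * n_y
    return u_x, u_y

# Example with made-up scores: group 1 almost always outranks group 2
u1, u2 = mann_whitney_u(np.array([8, 9, 7]), np.array([2, 3, 4]))
```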
Scenario: Customer satisfaction scores (1–10) from two store branches:
Store A: 6, 7, 7, 8, 9 | Store B: 3, 4, 5, 5, 6
Data is ordinal and skewed — Mann-Whitney is appropriate.
Combined ranks, written as value(rank), with ties sharing the average rank: 3(1), 4(2), 5(3.5), 5(3.5), 6(5.5), 6(5.5), 7(7.5), 7(7.5), 8(9), 9(10)
Rank sum Store A: 5.5 + 7.5 + 7.5 + 9 + 10 = 39.5
Rank sum Store B: 1 + 2 + 3.5 + 3.5 + 5.5 = 15.5
U_A = R_A − n_A(n_A+1)/2 = 39.5 − 15 = 24.5 | U_B = n_A × n_B − U_A = 25 − 24.5 = 0.5
min(U_A, U_B) = 0.5 → p-value < 0.05 → Reject H₀ ✅
Store A customers are significantly more satisfied.
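`scipy.stats.mannwhitneyu` reproduces this example from the raw scores (the score lists are the ones recovered from the ranks above); with ties present, scipy uses the normal approximation for the p-value:

```python
from scipy.stats import mannwhitneyu

store_a = [6, 7, 7, 8, 9]
store_b = [3, 4, 5, 5, 6]

# U reported for the first sample (Store A); two-sided alternative
u, p = mannwhitneyu(store_a, store_b, alternative='two-sided')
```

The returned U matches U_A = 24.5, and the p-value falls below 0.05, agreeing with the hand calculation.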
Before Episode 13 — think through these:
Q1: You measure the blood pressure of 20 patients before and after a medication. Which test is most appropriate? Why?
Q2: Two independent groups. Group 1 has s = 2. Group 2 has s = 15. Independent t-test or Welch’s? Why?
Q3: You have customer ratings (1–5 stars) from two stores. Why might Mann-Whitney be better than a t-test here?