Season 03 - Introduction to Statistics
| Episode | Test | Degrees of Freedom |
|---|---|---|
| 09 | T-Distribution | df = n − 1 |
| 11 | One-sample t-test | df = n − 1 |
| 12 | Independent t-test | df = n₁ + n₂ − 2 |
| 12 | Paired t-test | df = n − 1 |
| 13 | Chi-Square GoF | df = k − 1 |
| 13 | Chi-Square Independence | df = (r−1)(c−1) |
Imagine 5 friends at a dinner table with 5 fixed seats.
Question: How many friends can choose their seat freely?
4 friends had freedom. 1 did not.
When you have n items and one constraint (the total is fixed) means we have n−1 free choices.
This is the core of df = n − 1.
You have 5 numbers. Their mean is 10.
\[x_1, x_2, x_3, x_4, x_5 \quad \text{with} \quad \bar{X} = 10\]
How many values can you choose freely?
Say you pick: x₁ = 8, x₂ = 12, x₃ = 9, x₄ = 11
Now the sum must equal 50 (since mean = 10, n = 5).
\[x_5 = 50 - (8+12+9+11) = 50 - 40 = \mathbf{10}\]
x₅ is locked in. You had no choice.
4 values were free. 1 was constrained by the mean. df = n − 1 = 4
Why Estimating the Mean Costs One df
Every df formula reflects the same logic — how many free pieces of information remain after estimation.
df = (number of observations) − (number of parameters estimated from the data)
Every time you estimate something — you spend one degree of freedom.
| What You Estimate | df Cost |
|---|---|
| One mean (x̄) | −1 |
| Two means (x̄₁ and x̄₂) | −2 |
| One proportion | −1 |
| k category frequencies (with fixed total) | −1 (not −k) |
| Row totals + column totals in a table | −(r−1) − (c−1) = −r−c+2 |
The more you estimate — the less freedom remains. Less freedom = more uncertainty = heavier tails = harder tests.
❌ “It’s just a number you plug into a formula.”
❌ “Bigger df is always better.”
❌ “df only matters for t-tests.”
❌ “It’s the sample size minus one — always.”
These are all incomplete or wrong.
✅ It measures how much independent information your data carries after accounting for what you’ve already used to estimate.
✅ It controls how conservative your test is.
✅ It shapes which distribution you compare against.
✅ It changes for every test depending on structure.
df is a measure of remaining statistical information.