Season 03 - Introduction to Statistics
For large sample sizes, the sampling distribution of the sample mean \(\bar{X}\) is approximately normal, where \(\mu\) and \(\sigma^2\) are the population mean and variance and \(n\) is the sample size:
\[ \bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) \]
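The approximation above can be checked with a small simulation. This is only a sketch: the choice of an Exponential(1) population (so \(\mu = 1\) and \(\sigma^2 = 1\)), the sample size, the number of repetitions, and the helper name `simulate_sample_means` are all assumptions made for illustration.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from Exponential(1); return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


n = 50
means = simulate_sample_means(n=n, reps=5000)

# The mean of the sample means should be near mu = 1,
# and their variance near sigma^2 / n = 1 / 50 = 0.02.
mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)
print(mean_of_means, var_of_means)
```

Even though the population here is skewed, the simulated means cluster around \(\mu\) with variance close to \(\sigma^2 / n\).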
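When the population standard deviation \(\sigma\) is known, inference is based on the Z-statistic \(Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\), which follows the standard normal distribution (exactly for a normal population, approximately for large \(n\)).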
But what if \(\sigma\) is unknown and the sample size is small?
In practice, \(\sigma\) is rarely known.
Instead, we estimate it using the sample standard deviation \(s\).
When \(\sigma\) is replaced by \(s\), the standardized statistic becomes:
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]
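As a rough illustration of the formula, the statistic can be computed directly from a sample. The data values, the hypothesized mean `mu0`, and the function name `t_statistic` are made up for this sketch.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return (x_bar - mu0) / (s / sqrt(n)) for a list of observations."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))


sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements
t = t_statistic(sample, mu0=5.0)
print(round(t, 3))  # roughly 0.877
```

Note that `statistics.stdev` already divides by \(n - 1\), which matches the definition of \(s\) used here.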
Unlike the Z-statistic, this quantity no longer follows the standard normal distribution.
Instead, when the underlying population is normal, it follows a t-distribution with
\[ df = n - 1 \]
degrees of freedom.
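The loss of one degree of freedom arises because \(s\) is computed from deviations about the sample mean \(\bar{X}\), which is itself estimated from the data, rather than about the unknown \(\mu\).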
Why do we lose one degree of freedom?
Hint: When computing \(s\), how many values are free to vary once \(\bar{X}\) is fixed?
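One way to explore the hint numerically is sketched below. The four data values are arbitrary. The deviations from \(\bar{X}\) always sum to zero, so once the first \(n - 1\) deviations are known, the last one is determined.

```python
import statistics

sample = [2.0, 4.0, 9.0, 5.0]  # arbitrary illustrative values
x_bar = statistics.fmean(sample)
deviations = [x - x_bar for x in sample]

# Deviations about the sample mean sum to zero.
total = sum(deviations)

# The last deviation can be recovered from the first n - 1 of them.
recovered_last = -sum(deviations[:-1])

# s^2 divides the sum of squared deviations by n - 1, not n.
s_squared = sum(d * d for d in deviations) / (len(sample) - 1)

print(total, recovered_last, deviations[-1], s_squared)
```

Only \(n - 1\) of the deviations carry independent information, which is why \(s^2\) divides by \(n - 1\).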
The t-distribution is symmetric and centered at 0, much like the normal distribution.
However, because \(s\) varies from sample to sample, the resulting statistic has greater variability.
This produces heavier tails compared to the normal distribution.
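A quick simulation can make the extra variability visible. This is a sketch under assumed settings: samples of size \(n = 5\) from a standard normal population, 20,000 repetitions, and a cutoff of 1.96. It compares how often the statistic exceeds the cutoff when the true \(\sigma\) is used versus when \(s\) is used.

```python
import math
import random
import statistics


def simulate_z_and_t(n, reps, seed=1):
    """Return lists of Z- and t-statistics from N(0, 1) samples of size n."""
    rng = random.Random(seed)
    z_stats, t_stats = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)
        z_stats.append(x_bar / (1.0 / math.sqrt(n)))  # sigma = 1 is known
        t_stats.append(x_bar / (s / math.sqrt(n)))    # sigma replaced by s
    return z_stats, t_stats


def tail_fraction(values, cutoff=1.96):
    """Fraction of values whose absolute value exceeds the cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)


z_stats, t_stats = simulate_z_and_t(n=5, reps=20000)
z_frac = tail_fraction(z_stats)
t_frac = tail_fraction(t_stats)
print(z_frac, t_frac)
```

With the known \(\sigma\), about 5% of statistics fall beyond 1.96. With \(s\) in its place, the fraction is noticeably larger for such a small sample.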
The shape depends on the degrees of freedom:
As \(df \to \infty\), the t-distribution converges to the standard normal distribution.
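The convergence can be checked numerically by comparing densities. The sketch below writes out the t density by hand using only the standard library. The grid from -5 to 5 and the particular df values are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom at x."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def max_gap(df, lo=-5.0, hi=5.0, steps=1000):
    """Largest absolute difference between t and standard normal densities on a grid."""
    z = NormalDist()
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(t_pdf(x, df) - z.pdf(x)) for x in grid)


gaps = {df: max_gap(df) for df in (1, 5, 30, 1000)}
for df, gap in gaps.items():
    print(df, gap)
```

The largest gap between the two densities shrinks steadily as df grows.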
To see this behavior clearly, we compare:
Observe how smaller degrees of freedom produce thicker tails, reflecting increased uncertainty when estimating \(\sigma\) from small samples.
Before we look at the plot:
Notice how the t-distributions with lower df have broader, heavier tails.
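The heavier tails can also be quantified without a plot. As a sketch, the code below estimates \(P(|T| > 2)\) for a few degrees of freedom by numerically integrating the density with Simpson's rule. The cutoff of 2 and the df values 1, 5, and 30 are assumptions chosen for illustration, not values taken from the lesson's plot.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom at x."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_tail(c, df, steps=2000):
    """Approximate P(|T| > c) as 1 minus the Simpson integral of the pdf on [-c, c]."""
    h = 2 * c / steps  # steps must be even for Simpson's rule
    total = t_pdf(-c, df) + t_pdf(c, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-c + i * h, df)
    return 1.0 - total * h / 3


normal_tail = 2 * (1 - NormalDist().cdf(2.0))
tails = {df: t_two_sided_tail(2.0, df) for df in (1, 5, 30)}
for df, p in tails.items():
    print(df, round(p, 4))
print("normal", round(normal_tail, 4))
```

The tail probability falls toward the normal value as df increases, matching the visual impression of thinner tails.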
If:
Which distribution should guide inference?
When \(\sigma\) is known → use the Z-statistic.
When \(\sigma\) is unknown and the sample size is small → use the t-statistic:
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]
The t-distribution adjusts for additional uncertainty introduced by estimating \(\sigma\).
As the sample size increases, this adjustment becomes negligible, and the t-distribution approaches the standard normal distribution.
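The decision rule in this summary can be sketched as a small helper. The function name `standardized_statistic`, its return format, and the sample values are hypothetical; the logic simply uses \(\sigma\) when it is supplied and falls back to \(s\) with \(n - 1\) degrees of freedom otherwise.

```python
import math
import statistics


def standardized_statistic(sample, mu0, sigma=None):
    """Return a Z-statistic if sigma is given, otherwise a t-statistic with its df."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    if sigma is not None:
        # sigma known: compare against the standard normal distribution
        value = (x_bar - mu0) / (sigma / math.sqrt(n))
        return {"kind": "z", "value": value, "df": None}
    # sigma unknown: estimate it with s and use the t-distribution
    s = statistics.stdev(sample)
    value = (x_bar - mu0) / (s / math.sqrt(n))
    return {"kind": "t", "value": value, "df": n - 1}


sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements
with_sigma = standardized_statistic(sample, mu0=5.0, sigma=0.25)
without_sigma = standardized_statistic(sample, mu0=5.0)
print(with_sigma)
print(without_sigma)
```

The returned `df` indicates which t-distribution the statistic should be compared against when \(\sigma\) is not supplied.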