Season 03 - Introduction to Statistics

| Term | Meaning |
|---|---|
| Sample Space (S) | Set of all possible outcomes |
| Event (E) | A subset of the sample space |
| Outcome | A single result from the sample space |
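As a quick worked example of these three terms (a fair six-sided die, assumed here for illustration):

```latex
\begin{align*}
S    &= \{1, 2, 3, 4, 5, 6\}                          && \text{sample space: all faces of the die} \\
E    &= \{2, 4, 6\} \subset S                          && \text{event: roll an even number} \\
4    &\in E                                            && \text{one outcome, which happens to lie in } E \\
P(E) &= \tfrac{|E|}{|S|} = \tfrac{3}{6} = \tfrac{1}{2} && \text{each face equally likely}
\end{align*}
```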

| Distribution | Type | Example | Shape / Key Feature |
|---|---|---|---|
| Uniform | Discrete / Continuous | Rolling a die, random pick | All outcomes equally likely; flat histogram |
| Normal | Continuous | Heights, exam scores | Bell-shaped; most outcomes near the mean |
| Binomial | Discrete | Coin flips (count successes) | Counts of successes; hill-shaped at discrete points |
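A minimal NumPy sketch for drawing samples from each row of this table (the specific parameters — die faces, score mean 70, ten coin flips — are placeholders, not values from the episode):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Uniform (discrete): faces of a fair die, every outcome equally likely.
die_rolls = rng.integers(low=1, high=7, size=10_000)

# Normal (continuous): e.g. exam scores, bell-shaped around the mean.
scores = rng.normal(loc=70, scale=10, size=10_000)

# Binomial (discrete): number of heads in 10 fair coin flips.
heads = rng.binomial(n=10, p=0.5, size=10_000)

# Histograms of these arrays show the shapes in the table:
# flat, bell-shaped, and hill-shaped at whole numbers, respectively.
```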
But real-world data is rarely perfectly normal
So why do statisticians keep using Gaussian assumptions?
This episode answers that exact question.
Uniform data does NOT magically become normal.
This is critical to understand before CLT.
In practice:
We rarely use raw data directly.
We use: means, proportions, and other summary statistics.
These are aggregates, not raw observations.
CLT does not talk about individual data points
CLT talks about the distribution of the mean
The data can be Uniform; the mean of the data behaves differently.
This distinction removes most confusion.
If we:
Take independent samples
From any distribution (Uniform, Binomial, etc.)
With finite variance
Then:
The distribution of the sample mean becomes approximately Normal as the sample size increases.
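In symbols, one standard statement of this result (here $\mu$ and $\sigma^2$ denote the population mean and variance, which the lines above only describe in words):

```latex
\begin{align*}
&X_1, \dots, X_n \ \text{i.i.d.}, \qquad
 \mathbb{E}[X_i] = \mu, \qquad
 \operatorname{Var}(X_i) = \sigma^2 < \infty \\
&\Longrightarrow\quad
 \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
 \;\approx\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)
 \quad \text{for large } n.
\end{align*}
```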
“The data itself stays Uniform.
What changes is what we measure: the average.”
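A minimal simulation sketch of that quote, assuming Uniform(0, 1) data and arbitrary sizes (n = 50 observations per sample, 20,000 repeated samples):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 50            # observations per sample (assumed)
repeats = 20_000  # number of independent samples (assumed)

# Raw data: Uniform(0, 1). A histogram of one sample stays flat.
one_sample = rng.uniform(0, 1, size=n)

# Distribution of the mean: average each of many independent samples.
sample_means = rng.uniform(0, 1, size=(repeats, n)).mean(axis=1)

# The raw data is still Uniform, but the means pile up around 0.5
# with spread sigma / sqrt(n) = sqrt(1/12) / sqrt(50) ≈ 0.041 — bell-shaped.
print(one_sample.min(), one_sample.max())        # spans most of (0, 1)
print(sample_means.mean(), sample_means.std())   # ≈ 0.5, ≈ 0.041
```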
CLT does NOT say:
❌ The population is normal
❌ Individual observations are normal
❌ Small samples are safe
❌ Tails are well-behaved
CLT is not a claim about reality.
CLT gives us permission:
To model means as Gaussian
To compute confidence intervals
To quantify uncertainty
To reason mathematically
Even when the underlying structure is different.
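As one concrete example of that permission, a CLT-based 95% confidence interval for a mean can be computed like this (the exponential data and its parameters are made up for illustration; 1.96 is the standard normal 97.5% quantile):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Skewed, clearly non-Normal raw data (exponential waiting times, made up here).
data = rng.exponential(scale=3.0, size=200)

n = data.size
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(n)   # standard error of the sample mean

# CLT lets us treat the sample mean as approximately Gaussian,
# so mean ± 1.96 * SE is an approximate 95% confidence interval.
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, 95% CI ≈ ({ci_low:.2f}, {ci_high:.2f})")
```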
Gaussian is stable under addition
Gaussian emerges from summing randomness
Gaussian is mathematically tractable
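The "stable under addition" point can be written out directly (a standard fact about independent Gaussians, not derived in the episode):

```latex
\begin{align*}
X \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad
Y \sim \mathcal{N}(\mu_2, \sigma_2^2), \quad
X \perp Y
\;\Longrightarrow\;
X + Y \sim \mathcal{N}\!\left(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2\right).
\end{align*}
```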
Reality → messy, irregular, asymmetric
Statistics → summaries, averages, aggregates
CLT explains why: Messy reality → clean statistical behavior
CLT breaks when:
Data is heavy-tailed
Strong dependence exists
Sample size is small
You care about extremes, not averages
Gaussian comfort ≠ guaranteed safety.
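To illustrate the heavy-tailed failure mode above: the Cauchy distribution has no finite mean or variance, so the distribution of its sample mean never tightens into a Gaussian. A rough sketch, with sample sizes chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
repeats = 2_000  # number of independent samples per setting (assumed)

for n in (10, 100, 10_000):
    # Cauchy: heavy tails, infinite variance -> the CLT's finite-variance
    # condition fails, and the mean of n Cauchy draws is itself Cauchy.
    means = rng.standard_cauchy(size=(repeats, n)).mean(axis=1)
    # Unlike the Uniform case, this spread does NOT shrink like 1/sqrt(n).
    print(n, np.percentile(means, [5, 95]))
```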
Why n=30 is a myth
Why CLT fails in business & ML