AI Basics with AK

Season 03 - Introduction to Statistics

Arun Koundinya Parasa

Episode 07 - Central Limit Theorem

Re-Cap of Episode 05

Term             | Meaning
-----------------|---------------------------------------
Sample Space (S) | Set of all possible outcomes
Event (E)        | A subset of the sample space
Outcome          | A single result from the sample space

Re-Cap of Episode 06

  • What if we roll many times? → After enough rolls, a pattern emerges

Re-Cap of Episode 06

Distribution | Type                  | Example                       | Shape / Key Feature
-------------|-----------------------|-------------------------------|-----------------------------------------------------
Uniform      | Discrete / Continuous | Rolling a die, random pick    | All outcomes equally likely; flat histogram
Normal       | Continuous            | Heights, exam scores          | Bell-shaped; most outcomes near the mean
Binomial     | Discrete              | Coin flips (count successes)  | Counts of successes; hill-shaped at discrete points

Re-Cap of Episode 06

What did we see so far?

A Very Natural Question

  • We saw Uniform distributions
  • We saw Binomial distributions
  • We saw Normal distributions

But real-world data is rarely perfectly normal

So why do statisticians keep using Gaussian assumptions?

This episode answers that exact question.

Important Clarification

  • If data is Uniform, it stays Uniform
  • If data is Skewed, it stays Skewed
  • Taking more samples does not change the population

Uniform data does NOT magically become normal.

This is critical to understand before CLT.
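A minimal sketch of this point, assuming NumPy and Matplotlib are available (the Uniform(0, 1) source and the sample sizes are illustrative): no matter how many raw Uniform observations we collect, the histogram of the data itself stays flat.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Collecting more raw Uniform(0, 1) observations does not change the shape:
# both histograms stay (approximately) flat.
for n in (100, 100_000):
    data = rng.uniform(0, 1, size=n)
    plt.hist(data, bins=20, density=True, alpha=0.5, label=f"n = {n}")

plt.title("Raw Uniform data stays flat, no matter how much we collect")
plt.legend()
plt.show()
```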

Population vs Sample & Usage in Practice

Reality vs What We Actually Observe

  • Population → the true data-generating process
  • Sample → what we actually observe
  • Statistics → summaries we compute from samples

In practice:

  • We never see the full population
  • We rely on sample-based summaries

Central Tendency Metrics

In real analysis, we rarely use raw data directly.

We use:

  • Mean
  • Sum
  • Average score
  • Average loss
  • etc.

These are aggregates, not raw observations.

CLT Is NOT About the Data - It Is About a Statistic

  • CLT does not talk about individual data points

  • CLT talks about the distribution of the mean

The data can be Uniform; the mean of the data behaves differently.

This distinction removes most confusion.

CLT Theorem

Central Limit Theorem (Intuition)

If we:

  • Take independent samples

  • From any distribution (Uniform, Binomial, etc.)

  • With finite variance

Then:

The distribution of the sample mean becomes approximately
Normal as sample size increases.
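
A minimal simulation of this statement, assuming NumPy and Matplotlib are available (the Uniform source and the sample sizes 1, 5, 50 are illustrative choices): the raw draws are Uniform, but the histogram of sample means looks increasingly bell-shaped as the sample size per mean grows.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_means = 10_000                                   # how many sample means we collect

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, (1, 5, 50)):                # observations averaged per mean
    # Each row is one sample of size n from Uniform(0, 1); take its mean.
    means = rng.uniform(0, 1, size=(n_means, n)).mean(axis=1)
    ax.hist(means, bins=40, density=True)
    ax.set_title(f"Means of n = {n} Uniform draws")

plt.tight_layout()
plt.show()
```

With n = 1 the histogram is flat (it is just the raw Uniform data); by n = 50 it is already close to a bell curve centred near 0.5.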

How CLT changes the behavior of aggregates

“The data itself is always Uniform.
What changes is what we measure: the average.”

To Keep in Mind

What CLT Doesn’t say

  • ❌ The population is normal

  • ❌ Individual observations are normal

  • ❌ Small samples are safe

  • ❌ Tails are well-behaved

CLT is not a claim about reality.

CLT as a Modeling Comfort

CLT gives us permission:

  • To model means as Gaussian

  • To compute confidence intervals

  • To quantify uncertainty

  • To reason mathematically

Even when the underlying distribution is not Normal (see the sketch below).
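
As one concrete illustration of that permission, here is a minimal sketch of a z-based confidence interval for a mean, assuming NumPy is available; the skewed Exponential data and the 1.96 multiplier for roughly 95% coverage are illustrative choices, not part of the episode.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.exponential(scale=2.0, size=500)    # skewed, clearly non-Normal data

mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(len(data))    # standard error of the mean
z = 1.96                                       # ~95% coverage if the mean is ~Normal

low, high = mean - z * sem, mean + z * sem
print(f"sample mean = {mean:.3f}, 95% CI ≈ ({low:.3f}, {high:.3f})")
```

The interval is justified by the CLT acting on the sample mean, not by any normality of the raw Exponential data.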

Gaussian Appears Everywhere

Gaussian Is the Shape of Aggregation

  • Gaussian is stable under addition (see the sum rule below)

  • Gaussian emerges from summing randomness

  • Gaussian is mathematically tractable
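
Stability under addition can be stated precisely; this is the standard result for independent Gaussians (a reminder, not something new to this episode):

```latex
X_1 \sim \mathcal{N}(\mu_1,\sigma_1^2),\quad
X_2 \sim \mathcal{N}(\mu_2,\sigma_2^2),\quad
X_1 \perp X_2
\;\Longrightarrow\;
X_1 + X_2 \sim \mathcal{N}\!\left(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2\right)
```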

Reality Is Messy & Statistics Is Structured

  • Reality → messy, irregular, asymmetric

  • Statistics → summaries, averages, aggregates

  • CLT explains why: Messy reality → clean statistical behavior

When Can CLT Mislead You?

CLT breaks when:

  • Data is heavy-tailed (see the sketch below)

  • Strong dependence exists

  • Sample size is small

  • You care about extremes, not averages

Gaussian comfort ≠ guaranteed safety.
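
A minimal sketch of the heavy-tailed failure mode mentioned above, assuming NumPy and Matplotlib are available: the standard Cauchy distribution has no finite variance, so the CLT does not apply and the sample mean never settles into a Gaussian, no matter how large n gets.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_means = 10_000

for n in (10, 1_000):
    # The mean of n i.i.d. standard Cauchy draws is itself standard Cauchy:
    # averaging more data does not tighten the distribution at all.
    means = rng.standard_cauchy(size=(n_means, n)).mean(axis=1)
    plt.hist(np.clip(means, -25, 25),   # clip only so extreme values don't dominate the plot
             bins=100, density=True, alpha=0.5, label=f"means of n = {n} draws")

plt.title("Sample means of Cauchy data never concentrate")
plt.legend()
plt.show()
```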

Thank You

In Future Seasons We Will Discuss:

  • Why n=30 is a myth

  • Why CLT fails in business & ML