Empirical Proof of the Central Limit Theorem in MATLAB

The Central Limit Theorem (CLT) is a fundamental theorem in probability and statistics. It states that the sampling distribution of the mean approaches a Gaussian as the sample size grows, whatever distribution the population follows, provided the observations are independent and identically distributed and the population variance is finite. The sampling distribution of the mean has a mean equal to the population mean (μ) and a variance of σ²/N, where σ² is the population variance and N is the sample size. As a rule of thumb, a sample is considered sufficiently large when N ≥ 30, although strongly skewed populations may need more. Because the variance is σ²/N, it shrinks in inverse proportion to the sample size: the larger N is, the more tightly the sample means cluster around μ.
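The two moments quoted above are easy to check numerically. The snippet below is a minimal sketch under assumed values (a Rayleigh population with scale b = 2, N = 30, and 100,000 repeated samples, none of which come from the class material). It draws Rayleigh variates by inverse-transform sampling, so it should not need any toolbox.

```matlab
% Numerical check of the mean and variance of the sample mean.
% Assumed setup (illustrative only): Rayleigh population with scale b = 2,
% generated by inverse-transform sampling from uniform random numbers.
rng(1);                          % fix the seed for reproducibility
b      = 2;                      % Rayleigh scale parameter (assumed value)
N      = 30;                     % sample size
nTrial = 1e5;                    % number of independent samples of size N

mu     = b*sqrt(pi/2);           % population mean of a Rayleigh distribution
sigma2 = (4 - pi)/2*b^2;         % population variance of a Rayleigh distribution

% Each column is one sample of size N: X = b*sqrt(-2*log(U)), U ~ U(0,1)
X    = b*sqrt(-2*log(rand(N, nTrial)));
xbar = mean(X, 1);               % one sample mean per column

fprintf('mean of sample means: %.4f (theory %.4f)\n', mean(xbar), mu);
fprintf('var  of sample means: %.5f (theory %.5f)\n', var(xbar), sigma2/N);
```

Changing N and rerunning should show the variance of the sample means tracking σ²/N, while their mean stays close to μ.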

The ab initio proof of the CLT is rather complicated and requires a strong grounding in probability theory [1]. However, the CLT can be explored and understood empirically, through observations. Here is the MATLAB code I wrote to explore the CLT in a graduate class I am teaching on Data Analysis for the Earth Sciences.

Figure: Sampling distribution of the mean for various sample sizes. The population distribution is Rayleigh.
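A minimal sketch of such an experiment, not necessarily the script used in class, might look like this. The scale parameter, the list of sample sizes, the number of trials, and the bin count are all assumptions chosen for illustration. Each panel overlays the Gaussian with mean μ and variance σ²/N on the histogram of sample means, and the sample skewness is reported as a rough measure of how far the distribution is from Gaussian.

```matlab
% Sketch: sampling distribution of the mean for a Rayleigh population.
% All parameter values below are illustrative assumptions.
rng(1);                           % fix the seed for reproducibility
b      = 1;                       % Rayleigh scale parameter
Nlist  = [1 5 30 100];            % sample sizes to compare
nTrial = 2e4;                     % number of sample means per sample size
mu     = b*sqrt(pi/2);            % population mean
sigma  = b*sqrt((4 - pi)/2);      % population standard deviation
skew   = zeros(size(Nlist));      % sample skewness of the means, per N

figure;
for k = 1:numel(Nlist)
    N    = Nlist(k);
    % Inverse-transform sampling: each column is one sample of size N
    X    = b*sqrt(-2*log(rand(N, nTrial)));
    xbar = mean(X, 1);            % dimension given explicitly so N = 1 works

    % Skewness of the standardized sample means (0 for a Gaussian)
    z       = (xbar - mean(xbar))/std(xbar);
    skew(k) = mean(z.^3);

    % Histogram scaled to a density; histogram(xbar,'Normalization','pdf')
    % is the newer alternative to hist + bar
    [counts, centers] = hist(xbar, 40);
    binWidth = centers(2) - centers(1);
    subplot(2, 2, k);
    bar(centers, counts/(nTrial*binWidth), 1);
    hold on;

    % Gaussian predicted by the CLT: mean mu, standard deviation sigma/sqrt(N)
    xg = linspace(min(xbar), max(xbar), 200);
    s  = sigma/sqrt(N);
    plot(xg, exp(-(xg - mu).^2/(2*s^2))/(s*sqrt(2*pi)), 'r', 'LineWidth', 1.5);
    hold off;

    title(sprintf('N = %d, skewness = %.2f', N, skew(k)));
    xlabel('sample mean'); ylabel('density');
end
```

For N = 1 the histogram simply reproduces the skewed Rayleigh population. As N increases, the histogram narrows and the skewness of the sample means falls off roughly as 1/√N, so the Gaussian overlay fits progressively better.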

[1] Stark & Woods (2001) – Probability and Random Processes with Applications to Signal Processing (3rd Edition)