Mathematician in Data Science: The Law of Large Numbers with Shiny

Even non-statisticians have heard about the Law of Large Numbers (LLN). They remember that when numbers are big, something converges somewhere, and we can use it to our benefit. I recall reading a fictional story about a scientist who made huge money at roulette thanks to the LLN. Unfortunately, in reality casino owners make sure that they are the ones who gain the most from the Law.

Let me remind you of the gist of the LLN. For example, you manage a store and need to estimate how much bread to keep in storage each day to make sure that you turn a profit. You might want to know the weekly average number of visits to your post and how much it fluctuates. Or you monitor beer quality and would like to learn the average alcohol content of your batches, and how consistent that number is from one day to another. By the way, one of the most widely used statistical tests, Student's t-test, came from the Guinness brewery. Here is a fun fact: if you make 4 mistakes in the word "bread", you might get "beer"!

In each case we have a group of entities, called a sample in statistics. We take measurements and compute their average. The question is, how reliable is that average? In particular, can we hope that it is reasonably close to the actual average, the one we would get if we could include every case?

The Law of Large Numbers tells us that yes, when we have a sufficiently big sample (and the underlying distribution has a well-defined mean and standard deviation), our sample averages tend to be close to the actual average value. The larger the sample, the more likely they are to turn out close to the coveted reality. There are distributions which do not comply, but they are few and far between.
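For readers who like to see it in symbols, here is one standard way to write the statement above. The notation is my own choice for this sketch: X_1, ..., X_n are independent measurements from the same distribution with mean \mu and standard deviation \sigma, and \bar{X}_n is their average. The second line is Chebyshev's inequality, which shows why a larger n helps when \sigma is finite.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0
\quad \text{for every } \varepsilon > 0 .

P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right)
\le \frac{\sigma^{2}}{n\,\varepsilon^{2}} .
```

The bound on the right shrinks like 1/n, so for any fixed tolerance \varepsilon the chance of a sample average landing far from \mu can be made as small as we like by enlarging the sample.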

During my R studies, I got the following assignment: illustrate the LLN with R plots. We would generate a sample of size 40 from an exponential distribution, repeat this a thousand times, compute the mean of each sample and plot a histogram of the means. Then the sample size was increased, and we would make a second histogram. It was curious to see how the first histogram was slightly skewed, while the second one looked properly bell-shaped (the bell shape itself comes from the closely related Central Limit Theorem). It would be even more interesting to observe many such cases, with different distributions, sample sizes and numbers of samples, but all those histograms would take a lot of room, and comparing plots which are far apart is inconvenient.
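The experiment described above can be sketched in a few lines. This is not the original R assignment code: it is a minimal Python sketch using only the standard library, and the function names (`sample_means`, `text_histogram`), the rate of 1.0, the seed, and the second sample size of 400 are all assumptions chosen for illustration. It draws 1000 samples from an exponential distribution, computes each sample's mean, and prints a rough text histogram of the means for two sample sizes.

```python
import random
import statistics


def sample_means(sample_size, n_samples=1000, rate=1.0, seed=0):
    """Return the means of n_samples exponential samples of the given size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        draws = [rng.expovariate(rate) for _ in range(sample_size)]
        means.append(statistics.fmean(draws))
    return means


def text_histogram(values, bins=10, width=40):
    """Render a crude horizontal histogram as a multi-line string."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero bin width
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # clamp the maximum value
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)
        lines.append(f"{lo + i * step:6.3f} | {bar}")
    return "\n".join(lines)


# Sample size 40 as in the assignment; 400 is an assumed "increased" size.
for n in (40, 400):
    means = sample_means(n)
    print(f"sample size {n}: "
          f"mean of means = {statistics.fmean(means):.3f}, "
          f"sd of means = {statistics.stdev(means):.3f}")
    print(text_histogram(means))
```

With a rate of 1.0 the true mean is 1, so both histograms should be centered near 1, and the one for the larger sample size should be visibly narrower.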

When I first looked at the Shiny package and pondered how I could use it, I realized that it would be perfect for illustrating the LLN. You can see a link below. The last example in the app demonstrates a distribution which does not obey the LLN.
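To get a feel for what a non-compliant distribution looks like, here is a small Python sketch using the standard Cauchy distribution. That choice is an assumption on my part: it is the textbook example of a distribution without a mean, and may or may not be the one used in the app. The helper names are hypothetical. The sketch measures the spread of the sample means by their interquartile range, since the standard deviation is not meaningful here.

```python
import math
import random
import statistics


def cauchy_sample_means(sample_size, n_samples=1000, seed=0):
    """Means of n_samples standard Cauchy samples of the given size."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Inverse-CDF trick: tan(pi * (U - 0.5)) is standard Cauchy
        # when U is uniform on [0, 1).
        draws = [math.tan(math.pi * (rng.random() - 0.5))
                 for _ in range(sample_size)]
        means.append(statistics.fmean(draws))
    return means


def interquartile_range(values):
    """Distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


for n in (40, 400):
    iqr = interquartile_range(cauchy_sample_means(n))
    print(f"sample size {n}: IQR of sample means = {iqr:.2f}")
```

For a well-behaved distribution the spread of the means would shrink as the sample grows. For the Cauchy distribution the average of n draws has the same distribution as a single draw, so the spread should stay at roughly the same level for both sample sizes.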

A Shiny app for Law of Large Numbers


Thursday, June 3, 2021

