Thursday, February 28, 2019

Online Ad selection: Upper Confidence Bound method and Thompson Sampling method

The post is devoted to selection of the most popular ad to display on a webpage to gather the most clicks, or A/B test. The rate at which the webpage visitors click on an ad is called a conversion rate for the ad. The considered methods allow to make such selection in real time, so the most popular ads are showed as early as possible.

Assume that we have several ads and a place on a webpage to show one of them. We can display them one by one, record all the clicks, analyze the results afterwards and figure out the most popular. But an ad display may be pricey.  It would be more efficient to estimate rates in real time and to display the most popular one as soon as rates can be compared. Especially if an ad leads to a page for a visitor to buy something. There are couple of method for such estimations: Upper Confidence Bound method and Thompson Sampling method.

The first one is based on an confidence interval concept which is studied in a Statistics course and has a good intuitive explanation. Roughly speaking a confidence interval is a numeric interval were our value is supposed to lie with some probability, usually 95%. (The real statistical definition is more technical and means not quite this, but in practice the above explanation is close enough.) During our ad displaying we can compute average rates at each step with corresponding confidence intervals and pick up for next display an ad with a highest upper confidence bound. For the picked ad its mean an interval get recalculated and as result the confidence interval shrinks a little. You can see how it happens in the video below. The intervals are colored by ad legend, and their upper bounds are the right sides of corresponding rectangles. Calculated means are marked by vertical lines.


The method has some drawbacks. It does not take into account that our rates must be between 0 and 1, so initial confidence intervals usually wider than the [0, 1] segment. It means that we waste some time on getting realistic values for our intervals. The worst thing is that if after a start of the displaying we throw in an additional ad then the process takes a lot of time to recover.
Here is another method which is more efficient, Thompson Sampling Method. We assume Beta distributions for each ad rate and instead of computing averages draw a random number in accordance with each distribution for every step. Beta distributions are restricted to [0, 1] segment, so we do not have to wait for our numbers to fall into it. With each data point the beta distribution curve appears closer to the conversion rate mean. There is a picture how it goes for one ad, with a blue vertical line marking a mean and the red line for a random value:
As you see since a random value has more probability to appear where our curve is higher, then it get closer and closer to its mean at each step. With each step an interval where a distribution curve appears above horizontal axis shrinks. You might view the interval as a confidence interval analogue.

When we have several ads we compare the drawn random values at every step and pick up an ad to display with the greatest value. Here how it works for a few ads (I dropped means to make picture more clear):


Plus it accommodates an additional ad in the middle of the process more easily, taking less time to recover.
 Do not hesitate to ask questions and point our mistakes!