Confidence in retention analytics#

One of the game producers asked me to provide some measure of confidence in the retention analytics; he rightly questioned how reliable it was to interpret differences in, say, Day 7 retention from build to build, especially when patches were often available for only a short time. One way to mitigate this issue was to also provide analyses broken down by minor change, via a selector. Another way to provide confidence is to report confidence intervals for the estimators, which allow comparisons between samples of different sizes.

A confidence interval provides an estimated lower and upper bound for the reported estimate. Assuming the sample size is large enough for the Central Limit Theorem to apply (n > 30 is the usual rule of thumb), the interval is

\[ \overline X \pm z_{1-\alpha/2} \times SE \]

where \(\overline X\) denotes the estimator, \(z_{1-\alpha/2}\) is the quantile of the standard normal distribution evaluated at \(1 - \alpha/2\), with \(\alpha\) denoting the significance level, and \(SE\) the standard error of the estimator, which depends on the distribution and typically involves the sample standard deviation \(s\). The confidence level \(1 - \alpha\) is almost universally set to 95%, for better or worse, and in applied science there is rarely time to explore the methodological limitations of this convention.
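As a minimal sketch of this general calculation, assuming the estimate and its standard error have already been computed (the function name and arguments below are illustrative, not part of any particular library):

```python
# Minimal sketch: two-sided normal-approximation confidence interval,
# assuming `x_bar` (the estimate) and `se` (its standard error) are given.
from scipy.stats import norm


def confidence_interval(x_bar: float, se: float, confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) bounds of x_bar +/- z * se."""
    alpha = 1.0 - confidence
    z = norm.ppf(1.0 - alpha / 2.0)  # ~1.96 for 95% confidence
    return x_bar - z * se, x_bar + z * se
```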

For retention by build and retention over time, we will use two different estimators, each with its own standard error.

Confidence in retention by build#

The standard analysis of player retention in video games examines how many of a build's new players returned on Day 1, Day 3, and so on. It is common to report

\[ \frac{\text{number of the build's Day 0 users who return on Day d}} {\text{number of new users on Day 0 for the build}}, \]

However, when the number of Day 0 users varies from build to build, mainly because builds are available to install for different lengths of time, this proportion can be hard to compare across builds.

This is a proportion, so we model it with a binomial distribution: we estimate the retention rate from \(n_d\), the number of players who return on Day d, out of \(n_0\), the number of new players on Day 0; that is,

\[ \hat p := \frac{n_d}{n_0}. \]

The 95% confidence interval for the true proportion \(p\) is given by,

\[ p \approx \hat p \pm z_{0.975}\sqrt{\frac{\hat p(1 - \hat p)}{n_0}} \]

where \(z_{0.975} \approx 1.96\) is the quantile of the standard normal distribution evaluated at \(1 - \alpha/2 = 0.975\), which yields a two-sided 95% interval.
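A short sketch of this per-build calculation, assuming the Day 0 and Day d cohort counts are already available (the counts and variable names below are hypothetical):

```python
# Sketch: normal-approximation (Wald) interval for Day-d retention by build.
# `n_dayd` and `n_day0` are hypothetical cohort counts, not real data.
import math

from scipy.stats import norm


def retention_ci(n_dayd: int, n_day0: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for the Day-d retention proportion."""
    p_hat = n_dayd / n_day0
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_day0)
    z = norm.ppf(1.0 - (1.0 - confidence) / 2.0)
    return p_hat, p_hat - z * se, p_hat + z * se


# Example: 1,800 of 12,000 Day 0 players returned on Day 7.
print(retention_ci(1_800, 12_000))  # approximately (0.150, 0.144, 0.156)
```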

Confidence in retention over time#

Over time, however, it is necessary to weight both the estimator and the standard error, because each observation \(i\) in the time period may have its own sample size. When you have proportions from samples of unequal size, you can calculate a weighted average to get an overall estimate; the weights are typically the sample sizes, since larger samples give more precise estimates.

The weighted proportion \(\hat p_w\) is calculated as:

\[ \hat p_w = \frac{\sum_{i=1}^{k} n_{0i} \hat p_i}{\sum_{i=1}^{k} n_{0i}} \]

where \(n_{0i}\) is the Day 0 size of the \(i\)th sample and \(\hat p_i\) is the proportion of the \(i\)th sample.

The variance of the weighted proportion is:

\[ Var(\hat p_w) = \frac{\sum_{i=1}^{k} n_{0i} \hat p_i (1 - \hat p_i)}{(\sum_{i=1}^{k} n_{0i})^2} \]

The standard error is the square root of the variance:

\[ SE(\hat p_w) = \sqrt{Var(\hat p_w)} \]

The 95% confidence interval for the true proportion \(p\) is then given by:

\[ p \approx \hat p_w \pm z_{0.975} \, SE(\hat p_w) \]

where \(z_{0.975} \approx 1.96\) is, again, the standard normal quantile that yields a two-sided 95% interval.

This gives a weighted estimate of \(p\). Although the calculations are relatively simple, they get cumbersome to describe in words.
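They are more compact in code. Here is a minimal sketch of the weighted calculation, assuming per-day Day 0 cohort sizes and retention proportions are already computed (the array names are illustrative, not tied to any particular data pipeline):

```python
# Sketch: sample-size-weighted retention proportion with a normal-approximation CI.
# `n0` (per-day Day 0 cohort sizes) and `p_hat` (per-day proportions) are
# illustrative inputs, not real data.
import numpy as np
from scipy.stats import norm


def weighted_retention_ci(n0, p_hat, confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (p_w, lower, upper) for the weighted retention proportion."""
    n0 = np.asarray(n0, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    p_w = np.sum(n0 * p_hat) / np.sum(n0)                         # weighted proportion
    var_w = np.sum(n0 * p_hat * (1.0 - p_hat)) / np.sum(n0) ** 2  # its variance
    se_w = np.sqrt(var_w)
    z = norm.ppf(1.0 - (1.0 - confidence) / 2.0)
    return float(p_w), float(p_w - z * se_w), float(p_w + z * se_w)


# Example: three days in the window, each with its own Day 0 cohort size.
print(weighted_retention_ci(n0=[5_000, 12_000, 800], p_hat=[0.16, 0.15, 0.20]))
```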