Experiment results are updated once a week until the experiment ends. They detail how well each version of content performed with customers, along with the probability that the winning content is better. You can access your results by clicking the experiment name in the experiments dashboard.
Based on the data collected during the experiment, we calculate a range of possible impacts of publishing each version of the content. Results are aggregated across all ASINs enrolled in an experiment. We provide several kinds of results:
To project one-year impact, we calculate the winning content's average daily sales increase and multiply it by 365. This estimate does not account for seasonality, price changes, or other factors that would affect your business in the real world; it is provided for informational purposes only, and we cannot guarantee any incremental benefit.
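For instance, a winning version that sells an average of $60 more per day projects to about $21,900 over a year. Here is a minimal sketch of that arithmetic; the daily figure is an assumption for illustration, not real experiment data:

```python
# Hypothetical figure for illustration; the actual projection uses the
# average daily sales increase measured during your experiment.
avg_daily_sales_increase = 60.00  # assumed: winner sells $60/day more on average

projected_one_year_impact = avg_daily_sales_increase * 365
print(f"Projected one-year impact: {projected_one_year_impact:,.2f}")
# -> Projected one-year impact: 21,900.00
```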
The Likely column shows the median (the 50th percentile) of the range of possible outcomes we calculated. The Best Case and Worst Case columns show the upper and lower bounds of the 95% confidence interval of those outcomes (the 97.5th and 2.5th percentiles, respectively).
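As a rough illustration of how these columns relate, the sketch below draws simulated outcomes and reads off those percentiles; the distribution and its parameters are invented for the example and are not produced by the experiments dashboard:

```python
import numpy as np

# Assumed: 10,000 simulated one-year impact outcomes (illustrative random
# data standing in for the calculated range of possible outcomes).
rng = np.random.default_rng(seed=0)
outcomes = rng.normal(loc=21_900, scale=8_000, size=10_000)

worst_case, likely, best_case = np.percentile(outcomes, [2.5, 50, 97.5])
print(f"Worst Case: {worst_case:,.0f}")  # lower bound of the 95% interval
print(f"Likely: {likely:,.0f}")          # median (50th percentile)
print(f"Best Case: {best_case:,.0f}")    # upper bound of the 95% interval
```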
An experiment can end with results that are inconclusive, or with results showing low confidence that one version of content is better than the other. Even so, these results can still be valuable.
An experiment may have inconclusive results for a number of reasons.
Refer to your experiment hypothesis when trying to make sense of inconclusive results. For example, depending on what you changed, an inconclusive result can tell you that a certain type of content is not worth investing in because it does not affect customer behaviour. Or, it can tell you that two ways of merchandising your product are similarly effective. You can run additional experiments to confirm what you have learned from your earlier tests.
These notes on experiment methodology may help you understand how we choose an experiment winner and project impact; however, understanding them is not required to run an experiment.
Experiments are based on individual customer accounts. During an experiment, each customer account that sees your content is considered part of the experiment. Customers are randomly assigned to one version of the content and will see that version persistently for the duration of the experiment, regardless of device type or other factors, as long as the customer can be identified. Visits to your page where a customer cannot be identified are not included in the sample size. We may automatically remove certain types of data from the sample, such as statistical outliers, to improve the accuracy of results.
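The assignment mechanism itself is not specified here, but one common way to achieve this kind of persistent, device-independent assignment is to hash a stable customer identifier together with the experiment identifier, as in this hypothetical sketch:

```python
import hashlib

def assign_version(customer_id: str, experiment_id: str) -> str:
    """Deterministically assign a customer to version A or B.

    Hypothetical illustration: because the assignment is a pure function of
    the (experiment, customer) pair, the same signed-in customer always sees
    the same version on any device, with no per-customer state to store.
    """
    digest = hashlib.sha256(f"{experiment_id}:{customer_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_version("customer-123", "exp-42"))  # same result on every call
```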
We use a Bayesian approach to analyse experiment results. This means we construct a probability distribution based on a model as well as the actual results of the experiment. We report the mean effect size (in terms of change in units sold) and the 95% confidence interval (also known as a credible interval) of the posterior probability distribution, which is updated weekly during the experiment based on all data collected since the start. The confidence in a winning treatment is the percentage of outcomes in the probability distribution that show a positive unit-sales impact.
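To make those quantities concrete, here is a minimal sketch of the reporting step, assuming posterior samples of the daily unit-sales effect are already available; the model, priors, and numbers are illustrative and not the actual analysis pipeline:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Assumed: 100,000 posterior samples of the difference in daily units sold
# (treatment minus control). A real analysis would derive these from a
# model plus the observed data; here they are simulated for illustration.
effect_samples = rng.normal(loc=2.4, scale=1.5, size=100_000)

mean_effect = effect_samples.mean()                           # reported effect size
ci_low, ci_high = np.percentile(effect_samples, [2.5, 97.5])  # 95% credible interval
confidence = (effect_samples > 0).mean()                      # share of outcomes with positive impact

print(f"Mean effect: {mean_effect:+.2f} units/day")
print(f"95% credible interval: [{ci_low:+.2f}, {ci_high:+.2f}]")
print(f"Confidence the treatment wins: {confidence:.1%}")
```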
To project one-year impact, we compute the average difference in sales per day between the winning and losing treatments over the duration of the experiment so far and multiply it by 365. We provide a 95% confidence interval for the impact, based on the posterior probability distribution.
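Under the same illustrative assumptions as the sketch above, the projection and its interval can be read off the posterior samples by scaling each daily effect to a full year:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Assumed posterior samples of the daily unit-sales difference, as in the
# previous sketch (illustrative values, not real experiment data).
daily_effect_samples = rng.normal(loc=2.4, scale=1.5, size=100_000)

# Scale each daily effect to a full year to get a distribution of impacts.
yearly_samples = daily_effect_samples * 365
yearly_low, yearly_high = np.percentile(yearly_samples, [2.5, 97.5])
print(f"Projected one-year impact: {yearly_samples.mean():,.0f} units "
      f"(95% interval: {yearly_low:,.0f} to {yearly_high:,.0f})")
```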