Improving A/B Testing on the Basis of Possibilistic Reward Methods: A Numerical Analysis

Bibliographic Details
Main Authors: Miguel Martín, Antonio Jiménez-Martín, Alfonso Mateos, Josefa Z. Hernández
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: Symmetry
Subjects: A/B testing; multi-armed bandit; stopping criterion; numerical analyses
Online Access: https://www.mdpi.com/2073-8994/13/11/2175
collection DOAJ
description A/B testing is used in digital contexts both to offer a more personalized service and to optimize the e-commerce purchasing process. A personalized service gives customers the fastest possible access to the content they are most likely to use. An optimized e-commerce purchasing process reduces customer effort during online purchasing and ensures that the largest possible number of customers place their order. The most widespread A/B testing method is to implement the equivalent of a randomized controlled trial (RCT). Recently, however, some companies and solutions have addressed this experimentation process as a multi-armed bandit (MAB) problem, an approach known in the A/B testing market as dynamic traffic distribution. A complementary technique for optimizing the performance of A/B testing is to improve the experiment stopping criterion. In this paper, we propose an adaptation of A/B testing to account for possibilistic reward (PR) methods, together with a new stopping criterion, also based on PR methods, that can be used for both classical A/B testing and A/B testing based on MAB algorithms. A comparative numerical analysis based on the simulation of real scenarios is used to analyze the performance of the proposed adaptations in both Bernoulli and non-Bernoulli environments. The analysis shows that the possibilistic reward method PR3 produced the lowest mean cumulative regret in non-Bernoulli environments, with a high confidence level and high stability, as demonstrated by low standard deviations. PR3 behaves exactly the same as Thompson sampling in Bernoulli environments. The conclusion is that PR3 can be used efficiently in both environments, in combination with the value remaining stopping criterion in Bernoulli environments and the PR3 bounds stopping criterion in non-Bernoulli environments.
id doaj.art-b08d1030614046c292f5add6de5aed6a
institution Directory Open Access Journal
issn 2073-8994
spelling doaj.art-b08d1030614046c292f5add6de5aed6a; 2023-11-23T01:46:22Z; eng; MDPI AG; Symmetry; ISSN 2073-8994; 2021-11-01; vol. 13, no. 11, art. 2175; DOI 10.3390/sym13112175; Improving A/B Testing on the Basis of Possibilistic Reward Methods: A Numerical Analysis; Miguel Martín, Antonio Jiménez-Martín, Alfonso Mateos, Josefa Z. Hernández (all: Decision Analysis and Statistics Group, E.T.S.I. Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Spain); https://www.mdpi.com/2073-8994/13/11/2175; A/B testing; multi-armed bandit; stopping criterion; numerical analyses
topic A/B testing
multi-armed bandit
stopping criterion
numerical analyses
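The description above treats A/B testing as a multi-armed bandit with dynamic traffic distribution and names Thompson sampling and the value remaining stopping criterion for Bernoulli environments. The following is a minimal sketch of those two ideas only, not of the paper's PR3 method or its PR3 bounds criterion, neither of which is specified in this record. It assumes a uniform Beta(1, 1) prior for each variant. The conversion rates, the function names, the 95% quantile, and the 1% stopping threshold are all illustrative assumptions.

```python
import random


def thompson_sampling(true_rates, n_visitors, seed=0):
    """Simulate dynamic traffic distribution with Beta-Bernoulli Thompson sampling.

    true_rates holds hypothetical conversion rates, used here only to
    simulate visitors; a live test would record real conversions instead.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k  # conversions observed per variant
    failures = [0] * k   # non-conversions observed per variant
    for _ in range(n_visitors):
        # One draw per variant from its Beta(1 + s, 1 + f) posterior (uniform prior).
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = draws.index(max(draws))  # show the variant with the largest draw
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def value_remaining(successes, failures, n_draws=2000, quantile=0.95, seed=1):
    """Monte Carlo estimate of a quantile of the 'value remaining' in the test."""
    rng = random.Random(seed)
    k = len(successes)
    samples = [
        [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        for _ in range(n_draws)
    ]
    # Leading variant: the one that wins the most joint posterior draws.
    wins = [0] * k
    for draws in samples:
        wins[draws.index(max(draws))] += 1
    leader = wins.index(max(wins))
    # Relative improvement over the leader that each draw says is still possible.
    remaining = sorted((max(d) - d[leader]) / d[leader] for d in samples)
    return leader, remaining[int(quantile * (n_draws - 1))]


# Hypothetical two-variant test with assumed conversion rates of 20% and 50%.
successes, failures = thompson_sampling([0.2, 0.5], n_visitors=2000)
leader, vr = value_remaining(successes, failures)
stop_test = vr < 0.01  # assumed threshold: under 1% relative value remaining
```

In this sketch the experiment would end once the chosen quantile of the value remaining drops below the threshold. Because traffic shifts toward the leading variant as evidence accumulates, fewer visitors are sent to weaker variants than under a fixed even split.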