# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['bayesian_testing',
 'bayesian_testing.experiments',
 'bayesian_testing.metrics',
 'bayesian_testing.utilities']

package_data = \
{'': ['*']}

install_requires = \
['numpy>=1.19.0,<2.0.0']

setup_kwargs = {
    'name': 'bayesian-testing',
    'version': '0.2.2',
    'description': 'Bayesian A/B testing with simple probabilities.',
    'long_description': '[![Tests](https://github.com/Matt52/bayesian-testing/workflows/Tests/badge.svg)](https://github.com/Matt52/bayesian-testing/actions?workflow=Tests)\n[![Codecov](https://codecov.io/gh/Matt52/bayesian-testing/branch/main/graph/badge.svg)](https://codecov.io/gh/Matt52/bayesian-testing)\n[![PyPI](https://img.shields.io/pypi/v/bayesian-testing.svg)](https://pypi.org/project/bayesian-testing/)\n# Bayesian A/B testing\n`bayesian_testing` is a small package for quick evaluation of A/B (or A/B/C/...) tests using a Bayesian approach.\n\nThe package currently supports these data inputs:\n- **binary data** (`[0, 1, 0, ...]`) - convenient for conversion-like A/B testing\n- **normal data** with unknown variance - convenient for normal data A/B testing\n- **delta-lognormal data** (lognormal data with zeros) - convenient for revenue-like A/B testing\n- **discrete data** (categorical data with numerical categories) - convenient for discrete data A/B testing\n(e.g. dice rolls, star ratings, 1-10 ratings)\n\nThe core evaluation metric of the approach is `Probability of Being Best`\n(i.e. "being larger" from a data point of view),\nwhich is calculated using simulations from posterior distributions (considering the given data).\n\n\n## Installation\n`bayesian_testing` can be installed using pip:\n```console\npip install bayesian_testing\n```\nAlternatively, you can clone the repository and use `poetry` manually:\n```console\ncd bayesian_testing\npip install poetry\npoetry install\npoetry shell\n```\n\n## Basic Usage\nThe primary features are the following classes:\n- `BinaryDataTest`\n- `NormalDataTest`\n- `DeltaLognormalDataTest`\n- `DiscreteDataTest`\n\nIn all cases, there are two methods to insert data:\n- `add_variant_data` - adds raw data for a variant as a list of numbers (or a 1-D numpy array)\n- `add_variant_data_agg` - adds aggregated variant data (this can be practical for large data, as the\naggregation can be done at the database level)\n\nBoth methods allow the prior distribution to be specified through parameters with default values\n(see the respective docstrings for details). The default prior setup should be sufficient for most cases\n(e.g. cases with unknown priors or large amounts of data).\n\nTo get the results of the test, simply call the `evaluate` method, or `probabs_of_being_best`\nto return just the probabilities.\n\nProbabilities of being best are approximated using simulations, hence `evaluate` can return slightly different\nvalues for different runs. To stabilize the results, you can set the `sim_count` parameter of `evaluate` to a higher value\n(the default is 20K), or use the `seed` parameter to fix them completely.\n\n\n### BinaryDataTest\nClass for a Bayesian A/B test for binary-like data (e.g. conversions, successes, etc.).\n\n```python\nimport numpy as np\nfrom bayesian_testing.experiments import BinaryDataTest\n\n# generating some random data\nrng = np.random.default_rng(52)\n# random 1x1500 array of 0/1 data with 5.2% probability for 1:\ndata_a = rng.binomial(n=1, p=0.052, size=1500)\n# random 1x1200 array of 0/1 data with 6.7% probability for 1:\ndata_b = rng.binomial(n=1, p=0.067, size=1200)\n\n# initialize a test\ntest = BinaryDataTest()\n\n# add variant using raw data (arrays of zeros and ones):\ntest.add_variant_data("A", data_a)\ntest.add_variant_data("B", data_b)\n# priors can be specified like this (default for this test is a=b=1/2):\n# test.add_variant_data("B", data_b, a_prior=1, b_prior=20)\n\n# add variant using aggregated data (same as raw data with 950 zeros and 50 ones):\ntest.add_variant_data_agg("C", totals=1000, positives=50)\n\n# evaluate test\ntest.evaluate()\n```\n\n    [{\'variant\': \'A\',\n      \'totals\': 1500,\n      \'positives\': 80,\n      \'conv_rate\': 0.05333,\n      \'prob_being_best\': 0.06625},\n     {\'variant\': \'B\',\n      \'totals\': 1200,\n      \'positives\': 80,\n      \'conv_rate\': 0.06667,\n      \'prob_being_best\': 0.89005},\n     {\'variant\': \'C\',\n      \'totals\': 1000,\n      \'positives\': 50,\n      \'conv_rate\': 0.05,\n      \'prob_being_best\': 0.0437}]\n\n### NormalDataTest\nClass for a Bayesian A/B test for normal data.\n\n```python\nimport numpy as np\nfrom bayesian_testing.experiments import NormalDataTest\n\n# generating some random data\nrng = np.random.default_rng(21)\ndata_a = rng.normal(7.2, 2, 1000)\ndata_b = rng.normal(7.1, 2, 800)\ndata_c = rng.normal(7.0, 4, 500)\n\n# initialize a test\ntest = NormalDataTest()\n\n# add variant using raw data:\ntest.add_variant_data("A", data_a)\ntest.add_variant_data("B", data_b)\n# test.add_variant_data("C", data_c)\n\n# add variant using aggregated data:\ntest.add_variant_data_agg("C", len(data_c), sum(data_c), sum(np.square(data_c)))\n\n# evaluate test\ntest.evaluate(sim_count=20000, seed=52)\n```\n\n    [{\'variant\': \'A\',\n      \'totals\': 1000,\n      \'sum_values\': 7294.67901,\n      \'avg_values\': 7.29468,\n      \'prob_being_best\': 0.1707},\n     {\'variant\': \'B\',\n      \'totals\': 800,\n      \'sum_values\': 5685.86168,\n      \'avg_values\': 7.10733,\n      \'prob_being_best\': 0.00125},\n     {\'variant\': \'C\',\n      \'totals\': 500,\n      \'sum_values\': 3736.91581,\n      \'avg_values\': 7.47383,\n      \'prob_being_best\': 0.82805}]\n\n### DeltaLognormalDataTest\nClass for a Bayesian A/B test for delta-lognormal data (log-normal with zeros).\nDelta-lognormal data is typical of revenue-per-session data, where many sessions have 0 revenue\nbut non-zero values are positive numbers with a possibly log-normal distribution.\nTo handle this data, the calculation combines a binary Bayes model for zero vs non-zero\n"conversions" with a log-normal model for non-zero values.\n\n```python\nimport numpy as np\nfrom bayesian_testing.experiments import DeltaLognormalDataTest\n\ntest = DeltaLognormalDataTest()\n\ndata_a = [7.1, 0.3, 5.9, 0, 1.3, 0.3, 0, 0, 0, 0, 0, 1.5, 2.2, 0, 4.9, 0, 0, 0, 0, 0]\ndata_b = [4.0, 0, 3.3, 19.3, 18.5, 0, 0, 0, 12.9, 0, 0, 0, 0, 0, 0, 0, 0, 3.7, 0, 0]\n\n# adding variant using raw data\ntest.add_variant_data("A", data_a)\n\n# alternatively, a variant can also be added using aggregated data:\ntest.add_variant_data_agg(\n    name="B",\n    totals=len(data_b),\n    positives=sum(x > 0 for x in data_b),\n    sum_values=sum(data_b),\n    sum_logs=sum([np.log(x) for x in data_b if x > 0]),\n    sum_logs_2=sum([np.square(np.log(x)) for x in data_b if x > 0])\n)\n\ntest.evaluate(seed=21)\n```\n\n    [{\'variant\': \'A\',\n      \'totals\': 20,\n      \'positives\': 8,\n      \'sum_values\': 23.5,\n      \'avg_values\': 1.175,\n      \'avg_positive_values\': 2.9375,\n      \'prob_being_best\': 0.18915},\n     {\'variant\': \'B\',\n      \'totals\': 20,\n      \'positives\': 6,\n      \'sum_values\': 61.7,\n      \'avg_values\': 3.085,\n      \'avg_positive_values\': 10.28333,\n      \'prob_being_best\': 0.81085}]\n\n### DiscreteDataTest\nClass for a Bayesian A/B test for discrete data with a finite number of numerical categories (states),\neach representing some value.\nThis test can be used, for instance, for dice roll data (when looking for the "best" of multiple dice) or rating data\n(e.g. 1-5 stars or 1-10 scale).\n\n```python\nimport numpy as np\nfrom bayesian_testing.experiments import DiscreteDataTest\n\n# dice rolls data for 3 dice - A, B, C\ndata_a = [2, 5, 1, 4, 6, 2, 2, 6, 3, 2, 6, 3, 4, 6, 3, 1, 6, 3, 5, 6]\ndata_b = [1, 2, 2, 2, 2, 3, 2, 3, 4, 2]\ndata_c = [1, 3, 6, 5, 4]\n\n# initialize a test with all possible states (i.e. numerical categories):\ntest = DiscreteDataTest(states=[1, 2, 3, 4, 5, 6])\n\n# add variant using raw data:\ntest.add_variant_data("A", data_a)\ntest.add_variant_data("B", data_b)\ntest.add_variant_data("C", data_c)\n\n# add variant using aggregated data:\n# test.add_variant_data_agg("C", [1, 0, 1, 1, 1, 1]) # equivalent to rolls data_c\n\n# evaluate test\ntest.evaluate(sim_count=20000, seed=52)\n```\n\n    [{\'variant\': \'A\',\n      \'concentration\': {1: 2.0, 2: 4.0, 3: 4.0, 4: 2.0, 5: 2.0, 6: 6.0},\n      \'average_value\': 3.8,\n      \'prob_being_best\': 0.54685},\n     {\'variant\': \'B\',\n      \'concentration\': {1: 1.0, 2: 6.0, 3: 2.0, 4: 1.0, 5: 0.0, 6: 0.0},\n      \'average_value\': 2.3,\n      \'prob_being_best\': 0.008},\n     {\'variant\': \'C\',\n      \'concentration\': {1: 1.0, 2: 0.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0},\n      \'average_value\': 3.8,\n      \'prob_being_best\': 0.44515}]\n\n## Development\nTo set up a development environment, use [Poetry](https://python-poetry.org/) and [pre-commit](https://pre-commit.com):\n```console\npip install poetry\npoetry install\npoetry run pre-commit install\n```\n\n## Roadmap\n\nTest classes to be added:\n- `PoissonDataTest`\n- `ExponentialDataTest`\n\nMetrics to be added:\n- `Expected Loss`\n- `Potential Value Remaining`\n\n## References\n- The `bayesian_testing` package itself depends only on the [numpy](https://numpy.org) package.\n- Work on this package (including the selection of default priors) was inspired mainly by the Coursera\ncourse [Bayesian Statistics: From Concept to Data Analysis](https://www.coursera.org/learn/bayesian-statistics).\n',
    'long_description_content_type': 'text/markdown',
    'author': 'Matus Baniar',
    'author_email': None,
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/Matt52/bayesian-testing',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.7.1,<4.0.0',
}


setup(**setup_kwargs)
