# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['upliftml', 'upliftml.models']

package_data = \
{'': ['*']}

install_requires = \
['matplotlib>=3.4.0,<4.0.0',
 'numpy>=1.20.1,<2.0.0',
 'pandas>=1.2,<2.0',
 'pydantic>=1.8,<2.0',
 'scikit-learn>=0.24,<0.25',
 'seaborn>=0.11.1,<0.12.0']

setup_kwargs = {
    'name': 'upliftml',
    'version': '0.0.1',
    'description': 'A Python package for uplift modeling with PySpark and H2O',
    'long_description': "# UpliftML: A Python Package for Scalable Uplift Modeling\n**UpliftML** is a Python package for scalable unconstrained and constrained uplift modeling from experimental data. To accommodate working with big data, the package uses PySpark and H2O models as base learners for the uplift models. Evaluation functions expect a PySpark dataframe as input.\n\n**Uplift modeling** is a family of techniques for estimating the Conditional Average Treatment Effect (CATE) from experimental or observational data using machine learning. In particular, we are interested in estimating the causal effect of a treatment T on the outcome Y of an individual characterized by features X. In experimental data with binary treatments and binary outcomes, this is equivalent to estimating Pr(Y=1 | T=1, X=x) - Pr(Y=1 | T=0, X=x).\n\nIn many practical use cases the goal is to select which users to target in order to maximize the overall uplift without exceeding a specified **budget or ROI constraint**. In those cases, estimating uplift alone is not sufficient to make optimal decisions and we need to take into account the costs and monetary benefit incurred by the treatment.\n\nUplift modeling is an emerging tool for various personalization applications. Example use cases include marketing campaign personalization and optimization, personalized pricing in e-commerce, and clinical treatment personalization.\n\nThe **UpliftML** library includes PySpark/H2O implementations for the following:\n- 6 metalearner approaches for uplift modeling: T-learner[1], S-learner[1], X-learner[1], R-learner[2], class variable transformation[3], transformed outcome approach[4].\n- The Retrospective Estimation[5] technique for uplift modeling under ROI constraints.\n- Uplift and iROI-based evaluation and plotting functions with bootstrapped confidence intervals. Currently implemented: ATE, ROI, iROI, CATE per category/quantile, CATE lift, Qini/AUUC curves[6], Qini/AUUC score[6], cumulative iROI curves.\n\nFor detailed information about the package, read the [UpliftML documentation](https://upliftml.readthedocs.io/).\n\n# Installation\nInstall the latest release from PyPI:\n\n```\n$ pip install upliftml\n```\n\n# Quick Start\n\n```python\nfrom upliftml.models.pyspark import TLearnerEstimator\nfrom upliftml.evaluation import estimate_and_plot_qini\nfrom upliftml.datasets import simulate_randomized_trial\nfrom pyspark.ml.classification import LogisticRegression\n\n\n# Read/generate the dataset and convert it to Spark if needed (assumes an active SparkSession named spark)\ndf_pd = simulate_randomized_trial(n=2000, p=6, sigma=1.0, binary_outcome=True)\ndf_spark = spark.createDataFrame(df_pd)\n\n# Split the data into train, validation, and test sets\ndf_train, df_val, df_test = df_spark.randomSplit([0.5, 0.25, 0.25])\n\n# Preprocess the datasets (for implementation of get_features_vector, see the full example notebook)\nnum_features = [col for col in df_spark.columns if col.startswith('feature')]\ncat_features = []\ndf_train_assembled = get_features_vector(df_train, num_features, cat_features)\ndf_val_assembled = get_features_vector(df_val, num_features, cat_features)\ndf_test_assembled = get_features_vector(df_test, num_features, cat_features)\n\n# Build a two-model estimator\nmodel = TLearnerEstimator(base_model_class=LogisticRegression,\n                          base_model_params={'maxIter': 15},\n                          predictors_colname='features',\n                          target_colname='outcome',\n                          treatment_colname='treatment',\n                          treatment_value=1,\n                          control_value=0)\nmodel.fit(df_train_assembled, df_val_assembled)\n\n# Apply the model to test data\ndf_test_eval = model.predict(df_test_assembled)\n\n# Evaluate performance on the test set\nqini_values, ax = estimate_and_plot_qini(df_test_eval)\n```\n\nFor complete examples with more estimators and evaluation functions, see the demo notebooks in the ``examples`` folder.\n\n# Contributing\nIf you are interested in contributing to the package, get started by reading our [contributor guidelines](CONTRIBUTING.md).\n\n# License\nThe project is licensed under the [Apache 2.0 License](https://github.com/bookingcom/upliftml/blob/main/LICENSE).\n\n# Citation\nIf you use UpliftML, please cite it as follows:\n\nIrene Teinemaa, Javier Albert, Nam Pham. **UpliftML: A Python Package for Scalable Uplift Modeling.** https://github.com/bookingcom/upliftml, 2021. Version 0.0.1.\n\n```\n@misc{upliftml,\n  author={Irene Teinemaa and Javier Albert and Nam Pham},\n  title={{UpliftML}: {A Python Package for Scalable Uplift Modeling}},\n  howpublished={https://github.com/bookingcom/upliftml},\n  note={Version 0.0.1},\n  year={2021}\n}\n```\n\n\n# Resources\nDocumentation:\n* [UpliftML documentation](https://upliftml.readthedocs.io/)\n\nTutorials and blog posts:\n* [Retrospective Estimation (blog post)](https://booking.ai/free-lunch-40a963e12b0a)\n* [Uplift modeling tutorial at WebConf'2021](https://booking.ai/uplift-modeling-f9759e3fb51e)\n* [Personalization in Practice tutorial at WSDM'2021](https://booking.ai/personalization-in-practice-2bb4bc680eb3)\n\nRelated packages:\n* [CausalML](https://github.com/uber/causalml): a Python package for uplift modeling and causal inference with machine learning\n* [EconML](https://github.com/microsoft/EconML): a Python package for estimating heterogeneous treatment effects from observational data via machine learning\n\n# References\n\n1. Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 2019.\n2. Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. arXiv preprint arXiv:1712.04912, 2017.\n3. Maciej Jaskowski and Szymon Jaroszewicz. Uplift modeling for clinical trial data. ICML Workshop on Clinical Data Analysis, 2012.\n4. Susan Athey and Guido W. Imbens. Machine learning methods for estimating heterogeneous causal effects. stat, 1050(5), 2015.\n5. Dmitri Goldenberg, Javier Albert, Lucas Bernardi, Pablo Estevez Castillo. Free Lunch! Retrospective Uplift Modeling for Dynamic Promotions Recommendation within ROI Constraints. In Fourteenth ACM Conference on Recommender Systems (pp. 486-491), 2020.\n6. Nicholas J Radcliffe and Patrick D Surry. Real-world uplift modelling with significance based uplift trees. White Paper tr-2011-1, Stochastic Solutions, 2011.\n",
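    # The long description is Markdown; declare it so PyPI renders it correctly.
    'long_description_content_type': 'text/markdown',
    'author': 'Javier Albert',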
    'author_email': 'javier.albert@booking.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/bookingcom/upliftml',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.7.1,<4.0.0',
}


setup(**setup_kwargs)
