Metadata-Version: 2.1
Name: xgboost-distribution
Version: 0.2.0
Summary: XGBoost for probabilistic prediction.
Home-page: https://github.com/CDonnerer/xgboost-distribution/
Author: Christian Donnerer
Author-email: christian.donnerer@gmail.com
License: MIT
Project-URL: Documentation, https://xgboost-distribution.readthedocs.io/en/latest/?badge=latest
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Requires-Python: >=3.6
Description-Content-Type: text/x-rst; charset=UTF-8
Provides-Extra: testing
License-File: LICENSE.txt
License-File: AUTHORS.rst

.. image:: https://github.com/CDonnerer/xgboost-distribution/actions/workflows/test.yml/badge.svg?branch=main
  :target: https://github.com/CDonnerer/xgboost-distribution/actions/workflows/test.yml

.. image:: https://coveralls.io/repos/github/CDonnerer/xgboost-distribution/badge.svg?branch=main
  :target: https://coveralls.io/github/CDonnerer/xgboost-distribution?branch=main

.. image:: https://readthedocs.org/projects/xgboost-distribution/badge/?version=latest
  :target: https://xgboost-distribution.readthedocs.io/en/latest/?badge=latest
  :alt: Documentation Status

.. image:: https://img.shields.io/pypi/v/xgboost-distribution.svg
  :alt: PyPI-Server
  :target: https://pypi.org/project/xgboost-distribution/


====================
xgboost-distribution
====================

XGBoost for probabilistic prediction. Like `NGBoost`_, but `faster`_, and in the `XGBoost scikit-learn API`_.

.. image:: https://raw.githubusercontent.com/CDonnerer/xgboost-distribution/main/imgs/xgb_dist.png
    :align: center
    :width: 600px
    :alt: XGBDistribution example


Installation
============

.. code-block:: console

    $ pip install xgboost-distribution


Usage
=====

``XGBDistribution`` follows the `XGBoost scikit-learn API`_, with an additional keyword
argument specifying the distribution (see the `documentation`_ for a full list of
available distributions):

.. code-block:: python

      from sklearn.datasets import load_boston
      from sklearn.model_selection import train_test_split

      from xgboost_distribution import XGBDistribution


      data = load_boston()
      X, y = data.data, data.target
      X_train, X_test, y_train, y_test = train_test_split(X, y)

      model = XGBDistribution(distribution="normal", n_estimators=500)
      model.fit(
          X_train, y_train,
          eval_set=[(X_test, y_test)],
          early_stopping_rounds=10
      )

After fitting, we can predict the parameters of the distribution:

.. code-block:: python

      preds = model.predict(X_test)
      mean, std = preds.loc, preds.scale


Note that this returns a `namedtuple`_ of `numpy arrays`_, one for each parameter of
the distribution (we follow the `scipy stats`_ naming conventions for the parameters,
see e.g. `scipy.stats.norm`_ for the normal distribution).
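Because the parameters follow the ``scipy.stats`` conventions, they can be plugged
straight into scipy's distribution objects, e.g. to compute prediction intervals or
evaluate the negative log-likelihood. A minimal sketch (the arrays below are
hypothetical stand-ins for the output of ``model.predict``):

```python
import numpy as np
from scipy.stats import norm

# hypothetical predicted parameters, standing in for model.predict(X_test)
loc = np.array([22.5, 18.1, 30.2])
scale = np.array([2.3, 1.7, 3.1])

# pointwise 95% prediction intervals
lower = norm.ppf(0.025, loc=loc, scale=scale)
upper = norm.ppf(0.975, loc=loc, scale=scale)

# mean negative log-likelihood of the observed targets
y_test = np.array([21.0, 19.5, 28.4])
nll = -norm.logpdf(y_test, loc=loc, scale=scale).mean()
```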


NGBoost performance comparison
===============================

``XGBDistribution`` follows the method shown in the `NGBoost`_ library, using natural
gradients to estimate the parameters of the distribution.
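For a normal distribution parameterised by ``(loc, log_scale)``, the natural gradient
is the ordinary gradient of the negative log-likelihood pre-multiplied by the inverse
Fisher information matrix, which for this parameterisation is ``diag(1 / sigma^2, 2)``.
A minimal numpy illustration of the idea (not the package's internal implementation):

```python
import numpy as np


def normal_natural_gradient(y, loc, log_scale):
    """Natural gradient of the normal NLL w.r.t. (loc, log_scale).

    The Fisher information for this parameterisation is
    diag(1 / sigma^2, 2), so the natural gradient is the ordinary
    gradient multiplied by its inverse, diag(sigma^2, 1/2).
    """
    var = np.exp(2 * log_scale)

    # ordinary gradients of the negative log-likelihood
    grad_loc = -(y - loc) / var
    grad_log_scale = 1 - (y - loc) ** 2 / var

    # pre-multiply by the inverse Fisher information
    nat_grad_loc = grad_loc * var  # = -(y - loc)
    nat_grad_log_scale = grad_log_scale / 2
    return nat_grad_loc, nat_grad_log_scale
```

Unlike the ordinary gradient, the natural gradient of ``loc`` no longer shrinks as the
predicted variance grows, which is what makes joint boosting of both parameters stable.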

Below, we show a performance comparison of ``XGBDistribution`` with the `NGBoost`_
``NGBRegressor``, using the Boston Housing dataset, estimating normal distributions.
While the performance of the two models is essentially identical (measured by the
negative log-likelihood of a normal distribution and the RMSE), ``XGBDistribution``
is **30x faster** (timed across both the fit and predict steps):

.. image:: https://raw.githubusercontent.com/CDonnerer/xgboost-distribution/main/imgs/performance_comparison.png
          :align: center
          :width: 600px
          :alt: XGBDistribution vs NGBoost


Please see the `experiments page`_ in the documentation for detailed results across
various datasets.


Full XGBoost features
======================

``XGBDistribution`` offers the full set of XGBoost features available in the
`XGBoost scikit-learn API`_, allowing, for example, probabilistic regression
with `monotonic constraints`_:

.. image:: https://raw.githubusercontent.com/CDonnerer/xgboost-distribution/main/imgs/monotone_constraint.png
          :align: center
          :width: 600px
          :alt: XGBDistribution monotonic constraints


Acknowledgements
=================

This package would not exist without the excellent work from:

- `NGBoost`_ - Which demonstrated how gradient boosting with natural gradients
  can be used to estimate parameters of distributions. Much of the gradient
  calculation code was adapted from there.

- `XGBoost`_ - Which provides the gradient boosting algorithms used here; in
  particular, the ``sklearn`` APIs were taken as a blueprint.


.. _pyscaffold-notes:

Note
====

This project has been set up using PyScaffold 4.0.1. For details and usage
information on PyScaffold see https://pyscaffold.org/.


.. _ngboost: https://github.com/stanfordmlgroup/ngboost
.. _faster:  https://xgboost-distribution.readthedocs.io/en/latest/experiments.html
.. _xgboost scikit-learn api: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn
.. _monotonic constraints: https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html
.. _scipy.stats.norm: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
.. _xgboost: https://github.com/dmlc/xgboost
.. _documentation: https://xgboost-distribution.readthedocs.io/en/latest/api/xgboost_distribution.XGBDistribution.html#xgboost_distribution.XGBDistribution
.. _experiments page: https://xgboost-distribution.readthedocs.io/en/latest/experiments.html
.. _numpy arrays: https://numpy.org/doc/stable/reference/generated/numpy.array.html
.. _scipy stats: https://docs.scipy.org/doc/scipy/reference/stats.html
.. _namedtuple: https://docs.python.org/3/library/collections.html#collections.namedtuple


