Metadata-Version: 2.1
Name: presc
Version: 0.3.0
Summary: Performance and Robustness Evaluation for Statistical Classifiers
Home-page: https://github.com/mozilla/PRESC
Author: Mozilla Corporation
Author-email: dzeber@mozilla.com
License: MPL 2.0
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Web Environment :: Mozilla
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# PRESC: Performance and Robustness Evaluation for Statistical Classifiers

[![CircleCI](https://circleci.com/gh/mozilla/PRESC.svg?style=svg)](https://circleci.com/gh/mozilla/PRESC)
[![Join the chat at https://gitter.im/PRESC-outreachy/community](https://badges.gitter.im/PRESC-outreachy/community.svg)](https://gitter.im/PRESC-outreachy/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

PRESC is a toolkit for the evaluation of machine learning classification
models.
Its goal is to provide insights into model performance that extend beyond
standard scalar accuracy-based measures, into areas that tend to be
underexplored in practice, including:

- Generalizability of the model to unseen data for which the training set may
  not be representative
- Sensitivity to statistical error and methodological choices
- Performance evaluation localized to meaningful subsets of the feature space
  (see the sketch following this list)
- In-depth analysis of misclassifications and their distribution in the feature
  space
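As an illustration of what a localized evaluation can look like, the sketch
below computes accuracy within quartile bins of a single feature using plain
scikit-learn; the dataset, model, and binning choices are placeholder
assumptions and this is not the PRESC API.

```python
# Illustrative sketch only (not the PRESC API): accuracy evaluated within
# quartile bins of a single feature, using a placeholder dataset and model.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Localized evaluation: accuracy within quartile bins of one feature,
# rather than a single overall score.
bins = pd.qcut(X_test["mean radius"], q=4)
print((y_pred == y_test).groupby(bins).mean())
```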

More details about the specific features we are considering are presented in the
[project roadmap](./docs/ROADMAP.md).
We believe that these evaluations are essential for developing confidence in
the selection and tuning of machine learning models intended to address user
needs, and are important prerequisites towards building
[trustworthy AI](https://foundation.mozilla.org/en/internet-health/trustworthy-artificial-intelligence/).

It also includes a package for carrying out machine learning classifier
copies, in which a new model is trained to replicate the behaviour of an
existing classifier.
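A minimal sketch of this copying idea is shown below using plain scikit-learn
with placeholder sampling choices; it illustrates the general technique rather
than the PRESC copies API.

```python
# Minimal sketch of the classifier-copying idea using plain scikit-learn;
# the models and sampling choices are placeholder assumptions, not the
# PRESC copies API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# "Original" classifier whose behaviour we want to replicate.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
original = SVC().fit(X, y)

# Generate synthetic points covering the feature space and label them
# with the original model's predictions.
rng = np.random.default_rng(0)
X_synth = rng.uniform(X.min(axis=0), X.max(axis=0), size=(5000, X.shape[1]))
y_synth = original.predict(X_synth)

# Train the copy on the synthetic, original-labelled data and check how
# often it agrees with the original.
copy = DecisionTreeClassifier(random_state=0).fit(X_synth, y_synth)
agreement = (copy.predict(X) == original.predict(X)).mean()
print(f"Agreement with original: {agreement:.2%}")
```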

As a tool, PRESC is intended for use by ML engineers to assist in the
development and updating of models.
It is usable in the following ways:

- As a standalone tool which produces a graphical report evaluating a given
  model and dataset
- As a Python package/API which can be integrated into an existing pipeline
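As a rough illustration of the package-style usage, the sketch below wraps a
trained scikit-learn pipeline and a test set and hands them to a report
runner; the presc module paths, class names, and call signatures shown are
assumptions for illustration and should be checked against the package
documentation.

```python
# Hypothetical sketch of in-pipeline usage; the presc module paths, class
# names, and call signatures are assumptions for illustration only.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from presc.dataset import Dataset              # assumed module path
from presc.model import ClassificationModel    # assumed module path
from presc.report.runner import ReportRunner   # assumed module path

# Existing pipeline pieces: labelled train/test splits and an sklearn model.
df_train = pd.read_csv("train.csv")            # placeholder files with a "label" column
df_test = pd.read_csv("test.csv")
clf = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
clf.fit(df_train.drop(columns="label"), df_train["label"])

# Wrap them in PRESC's abstractions and generate the evaluation report.
train = Dataset(df_train, label_col="label")         # assumed constructor
test = Dataset(df_test, label_col="label")
model = ClassificationModel(clf, train)              # assumed constructor
ReportRunner().run(model=model, test_dataset=test)   # assumed signature
```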

A further goal is to use PRESC:

- As a step in a Continuous Integration workflow: evaluations run as part of
  CI, for example on regular model updates, and the step fails if any metric
  takes an unacceptable value.

For the time being, the following are considered __out of scope__:

- User-facing evaluations, e.g. explanations
- Evaluations which depend explicitly on domain context or value judgements of
  features, e.g. protected demographic attributes. A domain expert could use
  PRESC to study misclassifications across such protected groups, say, but the
  PRESC evaluations themselves should be agnostic to such determinations.
- Analyses which do not involve the model, e.g. class imbalance in the training
  data

There is a considerable body of recent academic research addressing these
topics, as well as a number of open-source projects solving related problems.
Where possible, we plan to offer integration with existing tools which align
with our vision and goals.

