Metadata-Version: 2.4
Name: nonconform
Version: 0.9.162
Summary: Conformal Anomaly Detection
Project-URL: Homepage, https://github.com/OliverHennhoefer/nonconform
Project-URL: Bugs, https://github.com/OliverHennhoefer/nonconform/issues
Author-email: Oliver Hennhoefer <oliver.hennhoefer@mail.de>
Maintainer-email: Oliver Hennhoefer <oliver.hennhoefer@mail.de>
License: BSD 3-Clause License
        
        Copyright (c) 2024, Oliver Hennhöfer
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE
Keywords: anomaly detection,conformal anomaly detection,conformal inference,false discovery rate,uncertainty quantification
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.12
Requires-Dist: numpy<2.3,>=2.2.0
Requires-Dist: pandas>=2.2.1
Requires-Dist: pyod==2.0.5
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: scipy>=1.13.0
Requires-Dist: tqdm>=4.66.2
Provides-Extra: all
Requires-Dist: black; extra == 'all'
Requires-Dist: build; extra == 'all'
Requires-Dist: furo; extra == 'all'
Requires-Dist: myst-parser; extra == 'all'
Requires-Dist: online-fdr>=0.0.3; extra == 'all'
Requires-Dist: pre-commit; extra == 'all'
Requires-Dist: pyarrow>=16.1.0; extra == 'all'
Requires-Dist: ruff; extra == 'all'
Requires-Dist: sphinx; extra == 'all'
Requires-Dist: sphinx-autoapi; extra == 'all'
Requires-Dist: torch>=2.7.0; extra == 'all'
Requires-Dist: twine; extra == 'all'
Provides-Extra: data
Requires-Dist: pyarrow>=16.1.0; extra == 'data'
Provides-Extra: deep
Requires-Dist: torch>=2.7.0; extra == 'deep'
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: build; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Provides-Extra: docs
Requires-Dist: furo; extra == 'docs'
Requires-Dist: myst-parser; extra == 'docs'
Requires-Dist: sphinx; extra == 'docs'
Requires-Dist: sphinx-autoapi; extra == 'docs'
Provides-Extra: fdr
Requires-Dist: online-fdr>=0.0.3; extra == 'fdr'
Description-Content-Type: text/markdown

**nonconform** is a Python library that enhances anomaly detection by providing uncertainty quantification. It acts as a wrapper around most detectors from the popular [*PyOD*](https://pyod.readthedocs.io/en/latest/) library (see [Supported Estimators](#supported-estimators)). By leveraging one-class classification principles and **conformal inference**, **nonconform** enables **statistically rigorous anomaly detection**.

# Key Features

*   **Uncertainty Quantification:** Go beyond simple anomaly scores; get statistically valid _p_-values.
*   **Error Control:** Reliably control metrics like the False Discovery Rate (FDR).
*   **Broad PyOD Compatibility:** Works with a wide range of PyOD estimators (see [Supported Estimators](#supported-estimators)).
*   **Flexible Strategies:** Implements various conformal strategies like Split-Conformal and Bootstrap-after-Jackknife+ (JaB+).
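
To make the idea of a conformal _p_-value concrete, here is a minimal, dependency-free sketch of the standard (unweighted) formula: a test score is ranked against anomaly scores from held-out normal calibration data. The function name `conformal_p_value` and the toy scores are hypothetical and only for illustration; this is not the library's internal implementation.

```python
def conformal_p_value(calib_scores, test_score):
    """Standard conformal p-value; a higher score means more anomalous."""
    n = len(calib_scores)
    # Count calibration scores at least as extreme as the test score.
    n_geq = sum(1 for s in calib_scores if s >= test_score)
    # The +1 terms account for the test point itself, so p is never zero.
    return (1 + n_geq) / (n + 1)


# Toy calibration scores from "normal" data (made up for illustration).
calib = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

p_typical = conformal_p_value(calib, 0.45)  # 5 scores >= 0.45, so 6/10
p_extreme = conformal_p_value(calib, 2.0)   # 0 scores >= 2.0, so 1/10
```

A small _p_-value means that few calibration scores are as extreme as the test score, which is evidence against the test point being normal.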

# Getting Started

```sh
pip install nonconform
```

_For additional features, you might need optional dependencies:_
- `pip install nonconform[data]` - Includes pyarrow for loading example data (via remote download)
- `pip install nonconform[deep]` - Includes deep learning dependencies (PyTorch)
- `pip install nonconform[fdr]` - Includes advanced FDR control methods (online-fdr)
- `pip install nonconform[dev]` - Includes development tools (black, ruff, pre-commit)
- `pip install nonconform[docs]` - Includes documentation building tools (sphinx, furo, etc.)
- `pip install nonconform[all]` - Includes all optional dependencies

_Please refer to the [pyproject.toml](https://github.com/OliverHennhoefer/nonconform/blob/main/pyproject.toml) for details._

## Split-Conformal (also _Inductive_) Approach

Using a _Gaussian Mixture Model_ on the _Shuttle_ dataset:

> **Note:** The examples below use the built-in datasets. Install with `pip install nonconform[data]` to run these examples.

```python
from pyod.models.gmm import GMM
from scipy.stats import false_discovery_control

from nonconform.strategy import Split
from nonconform.estimation import StandardConformalDetector
from nonconform.utils.data import load_shuttle
from nonconform.utils.stat import false_discovery_rate, statistical_power

x_train, x_test, y_test = load_shuttle(setup=True)

ce = StandardConformalDetector(
    detector=GMM(),
    strategy=Split(n_calib=1_000)
)

ce.fit(x_train)
estimates = ce.predict(x_test)

decisions = false_discovery_control(estimates, method='bh') <= 0.2

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=decisions)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=decisions)}")
```

Output:
```text
Empirical FDR: 0.108
Empirical Power: 0.99
```
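
The decision step in the example above compares Benjamini-Hochberg adjusted _p_-values against a nominal FDR level. The following dependency-free sketch shows the textbook version of that adjustment together with the usual definitions of empirical FDR and power. The helper names `bh_adjust`, `empirical_fdr`, and `empirical_power` and the toy data are hypothetical; they are not the implementations behind `false_discovery_control`, `false_discovery_rate`, or `statistical_power`.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (textbook step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def empirical_fdr(y, y_hat):
    """Share of flagged points that are actually normal (0 if none flagged)."""
    flagged = sum(y_hat)
    false_pos = sum(1 for t, d in zip(y, y_hat) if d and not t)
    return false_pos / flagged if flagged else 0.0


def empirical_power(y, y_hat):
    """Share of true anomalies that were flagged (0 if there are none)."""
    anomalies = sum(y)
    true_pos = sum(1 for t, d in zip(y, y_hat) if d and t)
    return true_pos / anomalies if anomalies else 0.0


# Toy p-values and labels (1 = anomaly), made up for illustration.
p_values = [0.01, 0.04, 0.03, 0.50, 0.90]
y_true = [1, 1, 0, 0, 0]

decisions = [p <= 0.2 for p in bh_adjust(p_values)]
fdr = empirical_fdr(y_true, decisions)
power = empirical_power(y_true, decisions)
```

In this toy case three points are flagged, one of which is normal, so the empirical FDR is 1/3 while both true anomalies are found.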

# Advanced Usage

## Bootstrap-after-Jackknife+ (JaB+)

The `Bootstrap()` strategy lets you set two of the three parameters `resampling_ratio`, `n_bootstraps`, and `n_calib`.
Whichever pair you choose, the remaining parameter is derived automatically. This gives exact control over the
calibration procedure when using a bootstrap strategy.

```python
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform.estimation import StandardConformalDetector
from nonconform.strategy import Bootstrap
from nonconform.utils.data import load_shuttle
from nonconform.utils.stat import false_discovery_rate, statistical_power

x_train, x_test, y_test = load_shuttle(setup=True)

ce = StandardConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Bootstrap(resampling_ratio=0.99, n_bootstraps=20, plus=True)
)

ce.fit(x_train)
estimates = ce.predict(x_test)

decisions = false_discovery_control(estimates, method='bh') <= 0.1

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=decisions)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=decisions)}")
```

Output:
```text
Empirical FDR: 0.067
Empirical Power: 0.98
```

## Weighted Conformal Anomaly Detection

The statistical validity of conformal anomaly detection depends on data *exchangeability* (a weaker assumption than i.i.d.). This assumption can be slightly relaxed by computing weighted conformal _p_-values.

```python
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform.utils.data import load_shuttle
from nonconform.estimation import WeightedConformalDetector
from nonconform.strategy import Split
from nonconform.utils.stat import false_discovery_rate, statistical_power

x_train, x_test, y_test = load_shuttle(setup=True)

model = IForest(behaviour="new")
strategy = Split(n_calib=1_000)

ce = WeightedConformalDetector(detector=model, strategy=strategy)
ce.fit(x_train)
estimates = ce.predict(x_test)

decisions = false_discovery_control(estimates, method='bh') <= 0.1

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=decisions)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=decisions)}")
```

Output:
```text
Empirical FDR: 0.077
Empirical Power: 0.96
```
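
As a rough illustration of the weighted _p_-values mentioned above, the sketch below uses the generic weighted form in which each calibration score contributes its weight instead of a count of one. The function name `weighted_conformal_p_value` and all weights are hypothetical; how `WeightedConformalDetector` actually estimates its weights is not shown here.

```python
def weighted_conformal_p_value(calib_scores, calib_weights, test_score, test_weight):
    """Weighted conformal p-value; a higher score means more anomalous."""
    # Weighted mass of calibration scores at least as extreme as the test score.
    mass_geq = sum(
        w for s, w in zip(calib_scores, calib_weights) if s >= test_score
    )
    # Normalize by the total weight, including the test point's own weight.
    total = sum(calib_weights) + test_weight
    return (mass_geq + test_weight) / total


calib_scores = [0.2, 0.4, 0.6, 0.8]

# Uniform weights recover the standard conformal p-value: (2 + 1) / (4 + 1).
uniform = weighted_conformal_p_value(calib_scores, [1.0] * 4, 0.5, 1.0)

# Made-up weights that emphasize the higher-scoring calibration points: (6 + 1) / (8 + 1).
shifted = weighted_conformal_p_value(calib_scores, [1.0, 1.0, 3.0, 3.0], 0.5, 1.0)
```

With uniform weights the result matches the unweighted formula, so the weighting only matters when calibration and test data are believed to differ in distribution.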

# Citation

If you find this repository useful for your research, please cite the following papers:

##### Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors
```text
@inproceedings{Hennhofer2024,
	title        = {{ Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors }},
	author       = {Hennhofer, Oliver and Preisach, Christine},
	year         = 2024,
	month        = {Dec},
	booktitle    = {2024 IEEE International Conference on Knowledge Graph (ICKG)},
	publisher    = {IEEE Computer Society},
	address      = {Los Alamitos, CA, USA},
	pages        = {110--119},
	doi          = {10.1109/ICKG63256.2024.00022},
	url          = {https://doi.ieeecomputersociety.org/10.1109/ICKG63256.2024.00022}
}
```

##### Testing for outliers with conformal p-values
```text
@article{Bates2023,
	title        = {Testing for outliers with conformal p-values},
	author       = {Bates, Stephen and Candès, Emmanuel and Lei, Lihua and Romano, Yaniv and Sesia, Matteo},
	year         = 2023,
	month        = feb,
	journal      = {The Annals of Statistics},
	publisher    = {Institute of Mathematical Statistics},
	volume       = 51,
	number       = 1,
	doi          = {10.1214/22-aos2244},
	issn         = {0090-5364},
	url          = {http://dx.doi.org/10.1214/22-AOS2244}
}
```

##### Model-free selective inference under covariate shift via weighted conformal p-values
```text
@inproceedings{Jin2023,
	title        = {Model-free selective inference under covariate shift via weighted conformal p-values},
	author       = {Ying Jin and Emmanuel J. Cand{\`e}s},
	year         = 2023,
	url          = {https://api.semanticscholar.org/CorpusID:259950903}
}
```

# Supported Estimators

The package only supports anomaly estimators that are suitable for unsupervised one-class classification. Because these
detectors are fitted exclusively on *normal* (or *non-anomalous*) data, parameters like *threshold* are internally
set to the smallest possible values.

Models that are **currently supported** include:

* Angle-Based Outlier Detection (**ABOD**)
* Autoencoder (**AE**)
* Cook's Distance (**CD**)
* Copula-based Outlier Detector (**COPOD**)
* Deep Isolation Forest (**DIF**)
* Empirical-Cumulative-distribution-based Outlier Detection (**ECOD**)
* Gaussian Mixture Model (**GMM**)
* Histogram-based Outlier Detection (**HBOS**)
* Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (**INNE**)
* Isolation Forest (**IForest**)
* Kernel Density Estimation (**KDE**)
* *k*-Nearest Neighbor (***k*NN**)
* Kernel Principal Component Analysis (**KPCA**)
* Linear Model Deviation-based Outlier Detection (**LMDD**)
* Local Outlier Factor (**LOF**)
* Local Correlation Integral (**LOCI**)
* Lightweight Online Detector of Anomalies (**LODA**)
* Locally Selective Combination of Parallel Outlier Ensembles (**LSCP**)
* GNN-based Anomaly Detection Method (**LUNAR**)
* Median Absolute Deviation (**MAD**)
* Minimum Covariance Determinant (**MCD**)
* One-Class SVM (**OCSVM**)
* Principal Component Analysis (**PCA**)
* Quasi-Monte Carlo Discrepancy Outlier Detection (**QMCD**)
* Rotation-based Outlier Detection (**ROD**)
* Subspace Outlier Detection (**SOD**)
* Scalable Unsupervised Outlier Detection (**SUOD**)

# Contact
**Bug reporting:** [https://github.com/OliverHennhoefer/nonconform/issues](https://github.com/OliverHennhoefer/nonconform/issues)
