Metadata-Version: 2.1
Name: significance-analysis
Version: 0.1.1
Summary: Significance Analysis for HPO-algorithms performing on multiple benchmarks
License: MIT
Keywords: Hyperparameter Optimization,AutoML
Author: Anton Merlin Geburek
Author-email: gebureka@cs.uni-freiburg.de
Requires-Python: >=3.8,<=3.11
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: pymer4 (>=0.8.0,<0.9.0)
Description-Content-Type: text/markdown

# Significance Analysis

[![PyPI version](https://img.shields.io/pypi/v/significance-analysis?color=informational)](https://pypi.org/project/significance-analysis/0.1.0/)
[![Python versions](https://img.shields.io/pypi/pyversions/significance-analysis)](https://pypi.org/project/significance-analysis/0.1.0/)
[![License](https://img.shields.io/pypi/l/significance-analysis?color=informational)](LICENSE)

This package runs a significance analysis on datasets that record how different HPO algorithms perform on multiple benchmarks.

## Note

As indicated by the `v0.x.x` version number, Significance Analysis is early-stage code and its APIs might change in the future.

## Documentation

Please have a look at our [example](sign_analysis_example/example_analysis.py).
The dataset should have the following format:

| system_id<br>(algorithm name) | input_id<br>(benchmark name) | metric<br>(mean/estimate) | optional: bin_id<br>(budget/training round) |
| ----------------------------- | ---------------------------- | ------------------------- | ------------------------------------------- |
| Algorithm1                    | Benchmark1                   | x.xxx                     | 1                                           |
| Algorithm1                    | Benchmark1                   | x.xxx                     | 2                                           |
| Algorithm1                    | Benchmark2                   | x.xxx                     | 1                                           |
| ...                           | ...                          | ...                       | ...                                         |
| Algorithm2                    | Benchmark2                   | x.xxx                     | 2                                           |

In this dataset, there are two different algorithms, each trained on two benchmarks for two iterations. The variable names (`system_id`, `input_id`, ...) can be customized, but they have to be consistent throughout the dataset, i.e. not "mean" for one benchmark and "estimate" for another. The `checkSignificance` function is then called with the dataset and the variable names as parameters.
Optionally, the dataset can be binned according to a fourth variable (`bin_id`), in which case the analysis is conducted on each bin separately, as shown in the linked example. To do this, provide the name of the `bin_id` variable, the bin intervals, and the labels for them.
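
As a rough illustration of this layout, the following sketch builds a small long-format pandas DataFrame and assigns each row to a bin with `pandas.cut`. The column names, metric values, bin edges, and labels are all made up for illustration. How the bin arguments are handed to the package is deliberately not shown, since that depends on the package's API.

```python
from itertools import product

import pandas as pd

# Hypothetical names; any consistent column names can be used.
algorithms = ["Algorithm1", "Algorithm2"]
benchmarks = ["Benchmark1", "Benchmark2"]
budgets = [1, 2]

# One row per (algorithm, benchmark, budget); the metric values are placeholders.
rows = [
    {"algorithm": a, "benchmark": b, "mean": 0.5 + 0.1 * t, "budget": t}
    for a, b, t in product(algorithms, benchmarks, budgets)
]
data = pd.DataFrame(rows)

# Illustrative binning of the optional fourth variable:
# budgets in (0, 1] are labelled "low", budgets in (1, 2] are labelled "high".
bin_edges = [0, 1, 2]
bin_labels = ["low", "high"]
data["budget_bin"] = pd.cut(data["budget"], bins=bin_edges, labels=bin_labels)

# Each bin can then be inspected on its own.
for label, group in data.groupby("budget_bin", observed=True):
    print(label, len(group))
```

The long format (one observation per row) keeps the algorithm, benchmark, and metric in separate columns, which is what lets them be referred to by name.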

## Installation

Install the Python package using pip:

```bash
pip install significance-analysis
```

R (>=4.0.0) must also be installed, together with the R packages `Matrix`, `emmeans`, and `lmerTest`.
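
The R packages listed above can be installed from within R using `install.packages`. As a convenience, the sketch below builds the corresponding `Rscript` command from Python. It assumes that `Rscript` is on the `PATH` and uses the CRAN cloud mirror; both are assumptions for illustration, not requirements stated by this package, and the helper names are hypothetical.

```python
import shutil
import subprocess

R_PACKAGES = ["Matrix", "emmeans", "lmerTest"]


def build_install_command(packages, repo="https://cloud.r-project.org"):
    """Return an Rscript command line that installs the given R packages."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    expression = f'install.packages(c({quoted}), repos="{repo}")'
    return ["Rscript", "-e", expression]


def install_r_packages(packages=R_PACKAGES):
    """Run the install command; raises if Rscript cannot be found."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found; install R (>=4.0.0) first.")
    subprocess.run(build_install_command(packages), check=True)


# Show the command without running it.
print(" ".join(build_install_command(R_PACKAGES)))
```

Calling `install_r_packages()` runs the command; printing it first makes it easy to check what will be executed.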

## Usage

1. Generate data from HPO algorithms on benchmarks and save it in the format described above.
1. Call the function `checkSignificance` on the dataset, specifying the variable names.

In code, the usage pattern can look like this:

```python
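import pandas as pd

from significance_analysis import checkSignificance

# 1. Generate/import the dataset (a pandas DataFrame in the format described above)
data = pd.read_pickle("./exampleDataset.pkl")

# 2. Analyse the dataset; the strings name the metric, algorithm, and benchmark columns
checkSignificance(data, "mean", "surrogate_aquisition", "benchmark")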
```

For more details and features please have a look at our [example](sign_analysis_example/example_analysis.py).
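
Because the column names are passed as plain strings, a misspelled name is easy to miss. The hypothetical helper below (`check_columns` is not part of this package) is a minimal sketch of a pre-check: it verifies that the named columns exist and that the metric column is numeric before the analysis is started. The example DataFrame and its column names are made up.

```python
import pandas as pd


def check_columns(data, metric, system_id, input_id, bin_id=None):
    """Raise ValueError if a named column is missing or the metric is not numeric."""
    required = [metric, system_id, input_id] + ([bin_id] if bin_id else [])
    missing = [name for name in required if name not in data.columns]
    if missing:
        raise ValueError(f"Columns not found in dataset: {missing}")
    if not pd.api.types.is_numeric_dtype(data[metric]):
        raise ValueError(f"Metric column '{metric}' must be numeric")
    return True


# Made-up example data in the expected long format.
example = pd.DataFrame(
    {
        "algorithm": ["Algorithm1", "Algorithm2"],
        "benchmark": ["Benchmark1", "Benchmark1"],
        "mean": [0.7, 0.8],
    }
)
check_columns(example, "mean", "algorithm", "benchmark")
```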

