Metadata-Version: 2.1
Name: ScandEval
Version: 12.3.0
Summary: Evaluation of pretrained language models on mono- or multilingual language tasks.
Home-page: https://scandeval.github.io
License: MIT
Author: Dan Saattrup Nielsen
Author-email: dan.nielsen@alexandra.dk
Maintainer: Dan Saattrup Nielsen
Maintainer-email: dan.nielsen@alexandra.dk
Requires-Python: >=3.10,<3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: all
Provides-Extra: generative
Provides-Extra: jax
Provides-Extra: olmo
Provides-Extra: openai
Requires-Dist: accelerate (>=0.26.0,<0.27.0)
Requires-Dist: ai2-olmo (>=0.2.4,<0.3.0) ; extra == "olmo" or extra == "all"
Requires-Dist: bert-score (>=0.3.13,<0.4.0) ; extra == "generative" or extra == "all"
Requires-Dist: bitsandbytes (>=0.42.0,<0.43.0) ; (sys_platform != "darwin" or platform_machine != "arm64") and (extra == "generative" or extra == "all")
Requires-Dist: boto3 (>=1.34.0,<2.0.0) ; extra == "olmo" or extra == "all"
Requires-Dist: click (>=8.1.3,<9.0.0)
Requires-Dist: datasets (>=2.15.0,<3.0.0)
Requires-Dist: demjson3 (>=3.0.6,<4.0.0) ; extra == "generative" or extra == "all"
Requires-Dist: evaluate (>=0.4.1,<0.5.0)
Requires-Dist: flax (>=0.8.1,<0.9.0) ; extra == "jax" or extra == "all"
Requires-Dist: huggingface-hub (>=0.19.0,<0.20.0) ; extra == "olmo" or extra == "all"
Requires-Dist: jax (>=0.4.24,<0.5.0) ; extra == "jax" or extra == "all"
Requires-Dist: jaxlib (>=0.4.24,<0.5.0) ; extra == "jax" or extra == "all"
Requires-Dist: levenshtein (>=0.24.0,<0.25.0) ; extra == "openai" or extra == "all"
Requires-Dist: numpy (>=1.23.0,<2.0.0)
Requires-Dist: openai (>=1.11.1,<2.0.0) ; extra == "openai" or extra == "all"
Requires-Dist: outlines (>=0.0.36,<0.0.37) ; extra == "generative" or extra == "all"
Requires-Dist: pandas (>=2.2.0,<3.0.0)
Requires-Dist: protobuf (>=3.20.0,<3.21.0)
Requires-Dist: pydantic (>=2.6.0,<3.0.0)
Requires-Dist: pyinfer (>=0.0.3,<0.0.4)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: rouge-score (>=0.1.2,<0.2.0) ; extra == "generative" or extra == "all"
Requires-Dist: sacremoses (>=0.1.1,<0.2.0)
Requires-Dist: sentencepiece (>=0.1.96,<0.2.0)
Requires-Dist: seqeval (>=1.2.2,<2.0.0)
Requires-Dist: termcolor (>=2.0.0,<3.0.0)
Requires-Dist: tiktoken (>=0.5.2,<0.6.0) ; extra == "openai" or extra == "all"
Requires-Dist: torch (>=2.1.1,<3.0.0)
Requires-Dist: transformers (>=4.38.1,<4.39.0)
Requires-Dist: vllm (>=0.3.3,<0.4.0) ; (sys_platform != "darwin") and (extra == "generative" or extra == "all")
Project-URL: Repository, https://github.com/ScandEval/ScandEval
Description-Content-Type: text/markdown

<div align='center'>
<img src="https://raw.githubusercontent.com/ScandEval/ScandEval/main/gfx/scandeval.png" width="517" height="217">
</div>

### Evaluation of pretrained language models on mono- or multilingual language tasks.

______________________________________________________________________
[![PyPI Status](https://badge.fury.io/py/scandeval.svg)](https://pypi.org/project/scandeval/)
[![Paper](https://img.shields.io/badge/arXiv-2304.00906-b31b1b.svg)](https://arxiv.org/abs/2304.00906)
[![License](https://img.shields.io/github/license/ScandEval/ScandEval)](https://github.com/ScandEval/ScandEval/blob/main/LICENSE)
[![LastCommit](https://img.shields.io/github/last-commit/ScandEval/ScandEval)](https://github.com/ScandEval/ScandEval/commits/main)
[![Code Coverage](https://img.shields.io/badge/Coverage-73%25-yellow.svg)](https://github.com/ScandEval/ScandEval/tree/main/tests)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](https://github.com/ScandEval/ScandEval/blob/main/CODE_OF_CONDUCT.md)


## Maintainers

- Dan Saattrup Nielsen (@saattrupdan, dan.nielsen@alexandra.dk)
- Kenneth Enevoldsen (@KennethEnevoldsen, kenneth.enevoldsen@cas.au.dk)


## Installation
To install the package, run the following command in your terminal:
```
$ pip install scandeval[all]
```

This will install the ScandEval package with all extras. You can also install the
minimal version by leaving out the `[all]`, in which case the package will let you know
when an evaluation requires a certain extra dependency, and how to install it.

## Quickstart
### Benchmarking from the Command Line
The easiest way to benchmark pretrained models is via the command line interface. After
having installed the package, you can benchmark your favorite model like so:
```
$ scandeval --model <model-id>
```

Here `<model-id>` is the Hugging Face model ID, which can be found on the [Hugging Face
Hub](https://huggingface.co/models). By default this will benchmark the model on all
the tasks available. If you want to benchmark on a particular task, then use the
`--task` argument:
```
$ scandeval --model <model-id> --task sentiment-classification
```

We can also narrow down which languages to benchmark on by setting the `--language`
argument. Here we benchmark the model on the Danish sentiment classification task:
```
$ scandeval --model <model-id> --task sentiment-classification --language da
```

Multiple models, datasets and/or languages can be specified by repeating the
corresponding argument. Here is an example with two models:
```
$ scandeval --model <model-id1> --model <model-id2>
```
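
If you launch the command line interface from Python, for instance in a batch job, the
repeated-argument convention above can be sketched as follows. The helper
`build_scandeval_command` is hypothetical and not part of ScandEval; it only assembles
an argument list from the `--model` and `--language` flags shown in this section:
```python
import shlex


def build_scandeval_command(models, languages=()):
    """Assemble a `scandeval` argument list, repeating each flag once per value.

    Hypothetical helper for illustration; it is not part of ScandEval.
    """
    command = ["scandeval"]
    for flag, values in (("--model", models), ("--language", languages)):
        for value in values:
            command += [flag, value]
    return command


command = build_scandeval_command(
    models=["<model-id1>", "<model-id2>"],
    languages=["da"],
)
# Print a shell-quoted version of the command for inspection
print(shlex.join(command))
```

Once the placeholders are replaced with real model IDs, the resulting list can be
passed to `subprocess.run(command, check=True)`.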

The specific model version/revision to use can also be appended to the model ID after an '@':
```
$ scandeval --model <model-id>@<commit>
```

The revision can be a branch name, a tag name, or a commit ID. It defaults to 'main',
which is the latest version.
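
To make the `<model-id>@<revision>` convention concrete, here is a minimal sketch of how
such a string splits into its two parts. The function `split_model_id` is a hypothetical
illustration and not necessarily how ScandEval parses the argument internally:
```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split '<model-id>@<revision>' into (model ID, revision).

    Hypothetical helper for illustration. The revision falls back to
    'main' when no '@' part is given, mirroring the default described above.
    """
    name, _, revision = model_id.partition("@")
    return name, revision or "main"


print(split_model_id("<model-id>@<commit>"))
print(split_model_id("<model-id>"))
```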

See all the arguments and options available for the `scandeval` command by running:
```
$ scandeval --help
```

### Benchmarking from a Script
In a script, the syntax is similar to the command line interface. You simply initialise
an object of the `Benchmarker` class, and call this benchmark object with your favorite
model:
```
>>> from scandeval import Benchmarker
>>> benchmark = Benchmarker()
>>> benchmark(model="<model>")
```

To benchmark on a specific task and/or language, you simply specify the `task` or
`language` arguments, shown here with the same example as above:
```
>>> benchmark(model="<model>", task="sentiment-classification", language="da")
```

If you want to benchmark a subset of all the models on the Hugging Face Hub, you can
simply leave out the `model` argument. In this example, we're benchmarking all Danish
models on the Danish sentiment classification task:
```
>>> benchmark(task="sentiment-classification", language="da")
```


## Citing ScandEval
If you want to cite the framework then feel free to use this:

```
@inproceedings{nielsen2023scandeval,
  author = {Nielsen, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}
```


## Remarks
The image used in the logo was created by the amazing [Scandinavia and the
World](https://satwcomic.com/) team. Go check them out!

