Metadata-Version: 2.1
Name: openllm
Version: 0.0.12
Summary: OpenLLM: REST/gRPC API server for running any open Large-Language Model - StableLM, Llama, Alpaca, Dolly, Flan-T5, Custom
Project-URL: Documentation, https://github.com/llmsys/openllm#readme
Project-URL: Issues, https://github.com/llmsys/openllm/issues
Project-URL: Source, https://github.com/llmsys/openllm
Author-email: Aaron Pham <aarnphm@bentoml.com>, BentoML Team <contact@bentoml.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: AI,Alpaca,BentoML,Generative AI,LLMOps,Large Language Model,MLOps,Model Deployment,Model Serving,PyTorch,Stable Diffusion,StableLM,Transformers
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.8
Requires-Dist: bentoml
Requires-Dist: black[jupyter]==23.3.0
Requires-Dist: filetype
Requires-Dist: grpcio
Requires-Dist: grpcio-health-checking
Requires-Dist: grpcio-reflection
Requires-Dist: httpx[http2]
Requires-Dist: inflection
Requires-Dist: opentelemetry-instrumentation-grpc==0.35b0
Requires-Dist: optimum
Requires-Dist: orjson
Requires-Dist: pillow
Requires-Dist: protobuf
Requires-Dist: transformers[accelerate,onnx,onnxruntime,tokenizers,torch]>=4.29.0
Provides-Extra: all
Requires-Dist: openllm[chatglm]; extra == 'all'
Requires-Dist: openllm[falcon]; extra == 'all'
Requires-Dist: openllm[fine-tune]; extra == 'all'
Requires-Dist: openllm[flan-t5]; extra == 'all'
Requires-Dist: openllm[starcoder]; extra == 'all'
Provides-Extra: chatglm
Requires-Dist: cpm-kernels; extra == 'chatglm'
Requires-Dist: sentencepiece; extra == 'chatglm'
Provides-Extra: falcon
Requires-Dist: einops; extra == 'falcon'
Provides-Extra: fine-tune
Requires-Dist: bitsandbytes; extra == 'fine-tune'
Requires-Dist: datasets; extra == 'fine-tune'
Requires-Dist: peft; extra == 'fine-tune'
Provides-Extra: flan-t5
Requires-Dist: flax; extra == 'flan-t5'
Requires-Dist: jax; extra == 'flan-t5'
Requires-Dist: jaxlib; extra == 'flan-t5'
Requires-Dist: tensorflow; extra == 'flan-t5'
Provides-Extra: starcoder
Requires-Dist: bitsandbytes; extra == 'starcoder'
Description-Content-Type: text/markdown

<div align="center">
    <h1 align="center">OpenLLM</h1>
    <br>
    <strong>REST/gRPC API server for running any Open Large-Language Model - StableLM, Llama, Alpaca, Dolly, Flan-T5, and more<br></strong>
    <i>Powered by BentoML 🍱 + HuggingFace 🤗</i>
    <br>
</div>

<br/>

To get started, simply install OpenLLM with pip:

```bash
pip install openllm
```

## 😌 tl;dr?

To start an LLM server, `openllm start` lets you launch any supported LLM
with a single command. For example, to start a `dolly-v2` server:

```bash
openllm start dolly-v2

# Starting LLM Server for 'dolly_v2'
#
# 2023-05-27T04:55:36-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
# 2023-05-27T04:55:36-0700 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service.py:svc" can be accessed at http://localhost:3000/metrics.
# 2023-05-27T04:55:36-0700 [INFO] [cli] Starting production HTTP BentoServer from "_service.py:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
```

To see a list of supported LLMs, run `openllm start --help`.

In a different terminal window, open an IPython session and create a client to
start interacting with the model:

```python
>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')
```

To package the LLM into a Bento, simply use `openllm build`:

```bash
openllm build dolly-v2
```

> NOTE: To build OpenLLM from the git source, set `OPENLLM_DEV_BUILD=True` to
> include the generated wheels in the bundle.

To fine-tune your own LLM, use `LLM.tuning()`:

```python
>>> import openllm
>>> flan_t5 = openllm.LLM.from_pretrained("flan-t5")
>>> def fine_tuning():
...     fine_tuned = flan_t5.tuning(method=openllm.tune.LORA | openllm.tune.P_TUNING, dataset='wikitext-2', ...)
...     fine_tuned.save_pretrained('./fine-tuned-flan-t5', version='wikitext')
...     return fine_tuned.path  # get the path of the fine-tuned model
>>> finetune_path = fine_tuning()
>>> fine_tuned_flan_t5 = openllm.LLM.from_pretrained('flan-t5', pretrained=finetune_path)
>>> fine_tuned_flan_t5.generate('Explain to me the difference between "further" and "farther"')
```

## 📚 Features

🚂 **SOTA LLMs**: One-click stop-and-go support for state-of-the-art LLMs,
including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.

📦 **Fine-tuning your own LLM**: Easily fine-tune any LLM with `LLM.tuning()`.

🔥 **BentoML 🤝 HuggingFace**: Built on top of BentoML and the HuggingFace
ecosystem (transformers, optimum, peft, accelerate, datasets), OpenLLM provides
familiar APIs for ease of use.

⛓️ **Interoperability**: First class support for LangChain and
[🤗 Hub](https://huggingface.co/) allows you to easily chain LLMs together.

🎯 **Streamlined production deployment**: Easily deploy any LLM via
`openllm build` with the following:

- [☁️ BentoML Cloud](https://l.bentoml.com/bento-cloud): the fastest way to
  deploy your Bento, simply and at scale
- [🦄️ Yatai](https://github.com/bentoml/yatai): Model Deployment at scale on
  Kubernetes
- [🚀 bentoctl](https://github.com/bentoml/bentoctl): Fast model deployment on
  AWS SageMaker, Lambda, ECS, GCP, Azure, Heroku, and more!

## 🍇 Telemetry

OpenLLM collects usage data that helps the team improve the product. Only
OpenLLM's internal API calls are reported. We strip out as much potentially
sensitive information as possible, and we never collect user code, model data,
or stack traces. Here's the [code](./src/openllm/utils/analytics.py) for usage
tracking. You can opt out of usage tracking with the `--do-not-track` CLI
option:

```bash
openllm [command] --do-not-track
```

Or by setting the environment variable `OPENLLM_DO_NOT_TRACK=True`:

```bash
export OPENLLM_DO_NOT_TRACK=True
```
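If you manage configuration from Python (for example, in a launcher script), the same opt-out can be applied programmatically; this is a minimal sketch, assuming the variable is set before `openllm` is imported:

```python
import os

# Opt out of OpenLLM usage tracking for this process and any subprocesses.
# This must happen before `import openllm` for the setting to take effect.
os.environ["OPENLLM_DO_NOT_TRACK"] = "True"
```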
