Metadata-Version: 2.1
Name: pyraisdk
Version: 0.3.0
Summary: 
Author: Xiaodong Yang
Author-email: xiaoyan@microsoft.com
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: azure-eventhub (>=5.10.1,<6.0.0)
Requires-Dist: azure-identity (>=1.7.0,<2.0.0)
Requires-Dist: dataclasses-json (>=0.5.7,<0.6.0)
Requires-Dist: flask
Requires-Dist: psutil (>=5.6.3,<6.0.0)
Description-Content-Type: text/markdown

# pyraisdk

AML models are meant to be deployed to GPU instances to provide inference service. If the code that operates the model uses the GPU for inferencing in each request separately,
the overall performance of the model will be quite inefficent. This SDK has APIs that can allocate batches of inference requests to run on the GPU in a separate thread, thereby considerably
improving the usage efficiency of GPU and making the model more performant. 

The SDK also collects telemetry data for each so the performance of the model can be evaluated and tracked and provides logging primitives that can be used
to produce additional troubleshooting information.

## Deploy In and Out of RAI

Deploy in RAI, CD pipeline will take care of all the environment variables. But it requires an additional `BatchingConfig` in `deployment-target-configs`:
```json
{
    "Name": "xxx",
    "Version": 1,
    "InstanceCount": 1,
    "InstanceType": "xxx",
    "BatchingConfig": {
        "MaxBatchSize": 12,
        "IdleBatchSize": 5,
        "MaxBatchInterval": 0.002
    }
}
```

Deploy out of RAI, it's required to set environment variables for `BatchingConfig` manually before object of `DynamicBatchModel` is created. Refer to [Batching Parameter](#batching-parameter-attention). 

To enable log publishing (to eventhub), there are another several environment variables need to be set, refer to following logging part. It's optional.


## Dynamic Batching Support

There are APIs you **must** implement in your model to support batching of inference requests for best model performance. Those APIs allow the SDK
to distribute load efficiently to the GPU instances. The APIs are:

* ```preprocess``` Modifies the input to the model, if necessary. For example, if your model needs the input in a special JSON format instead of as a list
of strings, you can do that modification in the *preprocess* method.
* ```predict``` Executes the model inference for a list of input strings

### Batching Parameter (**Attention**)

Batching parameters are mandatory and should be ready before DynamicBatchModel is created. They are set through environment variables:
- **PYRAISDK_MAX_BATCH_SIZE** (int): Max size of each processing batch.
- **PYRAISDK_IDLE_BATCH_SIZE** (int): If there's no more data in queue, a new batch will be launched when size reaches this value.
- **PYRAISDK_MAX_BATCH_INTERVAL** (float): Max interval in seconds to wait for items. When waiting time exceeds, will launch a batch immediately.


### Usage Examples

Build `YourModel` class inherited from `pyraisdk.dynbatch.BaseModel`.

```python
from typing import List
from pyraisdk.dynbatch import BaseModel

class YourModel(BaseModel):
    def predict(self, items: List[str]) -> List[int]:
        rs = []
        for item in items:
            rs.append(len(item))
        return rs
            
    def preprocess(self, items: List[str]) -> List[str]:
        rs = []
        for item in items:
            rs.append(f'[{item}]')
        return rs
```

Initialize a `pyraisdk.dynbatch.DynamicBatchModel` with `YourModel` instance, and call `predict / predict_one` for inferencing.

```python
from pyraisdk.dynbatch import DynamicBatchModel

# prepare model
simple_model = YourModel()
batch_model = DynamicBatchModel(simple_model)

# predict
items = ['abc', '123456', 'xyzcccffaffaaa']
predictions = batch_model.predict(items)
assert predictions == [5, 8, 16]

# predict_one
item = 'abc'
prediction = batch_model.predict_one(item)
assert prediction == 5
```

Concurrent requests to `predict / predict_one`, in different threads.

```python
from threading import Thread
from pyraisdk.dynbatch import DynamicBatchModel

# prepare model
simple_model = YourModel()
batch_model = DynamicBatchModel(simple_model)

# thread run function
def run(name, num):
    for step in range(num):
        item = f'{name}-{step}'
        prediction = batch_model.predict_one(item)
        assert prediction == len(item) + 2

# start concurrent inference
threads = [Thread(target=run, args=(f'{tid}', 100)) for tid in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

## Loging & Events

### Description
This module is for logging and event tracing.

### interface

```python
def initialize(
    eh_hostname: Optional[str] = None,
    client_id: Optional[str] = None,
    eh_conn_str: Optional[str] = None,
    eh_structured: Optional[str] = None,
    eh_unstructured: Optional[str] = None,
    role: Optional[str] = None,
    instance: Optional[str] = None,
    sys_metrics_enable: bool = True,
)
```

Parameter description for `initialize`:
- **eh_hostname**: Fully Qualified Namespace aka EH Endpoint URL (*.servicebus.windows.net). Default, read `${EVENTHUB_NAMESPACE}.servicebus.windows.net`
- **client_id**: client_id of service principal. Default, read $UAI_CLIENT_ID
- **eh_conn_str**: connection string of eventhub namespace. Default, read $EVENTHUB_CONN_STRING
- **eh_structured**: structured eventhub name. Default, read $EVENTHUB_AUX_STRUCTURED
- **eh_unstructured**: unstructured eventhub name. Default, read $EVENTHUB_AUX_UNSTRUCTURED
- **role**: role, Default: `RemoteModel`
- **instance**: instance, Default: `${MODEL_NAME}|${ENDPOINT_VERSION}|{hostname}` or `${MODEL_NAME}|${ENDPOINT_VERSION}|{_probably_unique_id()}`
- **sys_metrics_enable**: Whether to enable auto metrics reporting periodically for system info like gpu, memory and gpu. Default: True

```python
def event(self, key: str, code: str, numeric: float, detail: str='', corr_id: str='', elem: int=-1)
def infof(self, format: str, *args: Any)
def infocf(self, corr_id: str, elem: int, format: str, *args: Any)
def warnf(self, format: str, *args: Any)
def warncf(self, corr_id: str, elem: int, format: str, *args: Any)
def errorf(self, format: str, *args: Any)
def errorcf(self, corr_id: str, elem: int, ex: Optional[Exception], format: str, *args: Any)
def fatalf(self, format: str, *args: Any)
def fatalcf(self, corr_id: str, elem: int, ex: Optional[Exception], format: str, *args: Any)
```

### examples

```python
# export EVENTHUB_AUX_UNSTRUCTURED='ehunstruct'
# export EVENTHUB_AUX_STRUCTURED='ehstruct'
# export UAI_CLIENT_ID='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
# export EVENTHUB_NAMESPACE='raieusdev-eh-namespace'

from pyraisdk import rlog
rlog.initialize()

rlog.infof('this is a info message %s', 123)
rlog.event('LifetimeEvent', 'STOP_GRACEFUL_SIGNAL', 0, 'detail info')
```

```python
# export EVENTHUB_AUX_UNSTRUCTURED='ehunstruct'
# export EVENTHUB_AUX_STRUCTURED='ehstruct'
# export EVENTHUB_CONN_STRING='<connection string>'

from pyraisdk import rlog
rlog.initialize()

rlog.infocf('corrid', -1, 'this is a info message: %s', 123)
rlog.event('RequestDuration', '200', 0.01, 'this is duration in seconds')
```

```python
from pyraisdk import rlog
rlog.initialize(eh_structured='ehstruct', eh_unstructured='ehunstruct', eh_conn_str='<eventhub-conn-str>')

rlog.errorcf('corrid', -1, Exception('error msg'), 'error message: %s %s', 1,2)
rlog.event('CpuUsage', '', 0.314, detail='cpu usage', corr_id='corrid', elem=-1)
```
