Metadata-Version: 2.1
Name: persist_to_disk
Version: 0.0.1
Summary: Persist expensive operations on disk.
Home-page: https://github.com/zlin7/python-persist_to_disk
Author: Zhen Lin
Author-email: zhenlin4@illinois.edu
License: MIT
Keywords: Cache,Persist
Description-Content-Type: text/markdown
License-File: LICENSE


# Installation

`pip install .`

**By default, a folder called `.persist_to_disk` is created under your home directory, and will be used to store cache files.**
If you want to change it, see "Global Settings" below.

## Global Settings

To set global settings (for example, where the cache should go by default), please do the following:

```
import persist_to_disk as ptd
ptd.config.generate_config()
```
Then, you can edit the `persist_path` setting in the generated config file to change where caches are stored.


# Example

Using `persist_to_disk` is very easy.
```
import persist_to_disk as ptd

@ptd.persistf()
def train_a_model(dataset, model_cls, lr, epochs):
    ...
    return trained_model_or_key
```
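Conceptually, a persisted call is keyed by the function and its arguments, and the return value is stored on disk so repeated calls load the cached result instead of recomputing. The following stdlib-only sketch illustrates that general idea; it is not `persist_to_disk`'s actual implementation, and the cache directory and helper names here are made up for illustration:

```python
import functools
import hashlib
import os
import pickle
import tempfile

# Stand-in cache location for this sketch (the real library defaults to
# a `.persist_to_disk` folder under your home directory).
CACHE_DIR = os.path.join(tempfile.gettempdir(), "ptd_sketch_cache")

def persistf_sketch(func):
    """Cache func's return value on disk, keyed by its name and arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a stable key from the function name and (pickled) arguments.
        raw = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        key = hashlib.md5(raw).hexdigest()
        path = os.path.join(CACHE_DIR, f"{key}.pkl")
        if os.path.exists(path):  # cache hit: load instead of recomputing
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:  # cache miss: compute, then store
            pickle.dump(result, f)
        return result
    return wrapper

@persistf_sketch
def slow_add(a, b):
    return a + b
```

Calling `slow_add(2, 3)` twice computes the sum once and serves the second call from the pickle on disk.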

Note that `ptd.persistf` can be used with multiprocessing directly.
If the target function (e.g. `train_a_model`) will not be pickled by such pipelines, you could use `persist` instead:
```
@ptd.persist()
def _train_a_model(dataset, model_cls, lr, epochs):
    ...
    return trained_model_or_key

def train_a_model(*args, **kwargs):
    trained_model_or_key = _train_a_model(*args, **kwargs)
    ... # Do more stuff
    return trained_model_or_key
```
`persist` and `persistf` take the same arguments.
For example, if you want to group the cache folders by dataset (so they are easier to manage manually), and your function takes a dictionary as input (which is not hashable), you could do:
```
@ptd.persistf(groupby=['dataset'], expand_dict_kwargs=['model_kwargs'])
def train_a_model(dataset, model_cls, model_kwargs, lr, epochs):
    ...
```
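The reason dictionaries need the `expand_dict_kwargs` treatment is that they are unordered and unhashable, so they cannot directly serve as part of a cache key. One common way to canonicalize a dict into a hashable key is to recursively sort its items; the sketch below shows that general idea (the `freeze_dict` helper is hypothetical, not part of `persist_to_disk`'s API):

```python
def freeze_dict(d):
    """Turn a (possibly nested) dict into a hashable, order-independent tuple."""
    return tuple(
        sorted((k, freeze_dict(v) if isinstance(v, dict) else v)
               for k, v in d.items())
    )

# Two dicts with the same contents in different insertion order
# canonicalize to the same hashable key.
model_kwargs = {"hidden_dim": 128, "dropout": 0.1}
same_kwargs = {"dropout": 0.1, "hidden_dim": 128}
```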



