Metadata-Version: 2.1
Name: afar
Version: 0.6.0
Summary: Run code on a Dask cluster via a context manager or IPython magic
Home-page: https://github.com/eriknw/afar
Author: Erik Welch
Author-email: erik.n.welch@gmail.com
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# **Afar**
[![Python Version](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue)](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue)
[![Version](https://img.shields.io/pypi/v/afar.svg)](https://pypi.org/project/afar/)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/eriknw/afar/blob/main/LICENSE)
[![Build Status](https://github.com/eriknw/afar/workflows/Test/badge.svg)](https://github.com/eriknw/afar/actions)
[![Coverage Status](https://coveralls.io/repos/eriknw/afar/badge.svg?branch=main)](https://coveralls.io/r/eriknw/afar)
[![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

> **_One man's magic is another man's engineering_**<br>
> _Robert A. Heinlein_
<hr>

## Installation
`afar` may be installed with pip:
```bash
pip install afar
```
or with [conda](https://docs.conda.io/en/latest/):
```bash
conda install -c conda-forge afar
```

## What is it?
`afar` allows you to run code on a remote [Dask](https://dask.org/) [cluster](https://distributed.dask.org/en/latest/) using context managers and [IPython magics](#magic).  For example:
```python
import afar
from dask.distributed import Client
client = Client()

with afar.run, remotely:
    import dask_cudf
    df = dask_cudf.read_parquet("s3://...")
    result = df.sum().compute()
```
Outside the context, `result` is a [Dask Future](https://docs.dask.org/en/latest/futures.html) whose data resides on a worker.  `result.result()` is necessary to copy the data locally.

By default, only the last assignment is saved.  One can specify which variables to save:
```python
with afar.run("one", "two"), remotely:
    one = 1
    two = one + 1
```
`one` and `two` are now both Futures.  They can be used directly in other `afar.run` contexts:
```python
with afar.run as data, remotely:
    three = one + two

assert three.result() == 3
assert data["three"].result() == 3
```
`data` above is a dictionary that maps variable names to Futures.  At times you may need to retrieve results from this dictionary rather than from local variables.  Alternatively, you may pass a mapping to `afar.run` to use as the data:
```python
run = afar.run(data={"four": 4})
with run, remotely:
    seven = three + four
assert run.data["seven"].result() == 7
```
If you want to automatically gather the data locally (to avoid calling `.result()`), use `afar.get` instead of `afar.run`:
```python
with afar.get, remotely:
    five = two + three
assert five == 5
```
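The difference between getting a Future back and getting the value itself can be sketched without a cluster. The example below is only an analogy: it uses the standard library's `concurrent.futures` as a stand-in for Dask, and the `add` function is hypothetical. It does not use `afar`.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    # Hypothetical stand-in for work that would run on a worker.
    return x + y


with ThreadPoolExecutor(max_workers=1) as pool:
    # Comparable to `afar.run`: you hold a handle to the result.
    future = pool.submit(add, 2, 3)
    # Comparable to `afar.get`: you hold the value itself.
    value = pool.submit(add, 2, 3).result()

assert future.result() == 5
assert value == 5
```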
## Interactivity in Jupyter
`afar` offers several enhancements in Jupyter Notebook, JupyterLab, Qt console, or any other IPython-based frontend that supports rich display.

The rich repr of the final expression will be displayed if it's not an assignment:
```python
with afar.run, remotely:
    three + seven
# displays 10!
```

Printing is captured and displayed locally:
```python
with afar.run, remotely:
    import sys
    print(three)
    print(seven, file=sys.stderr)
# 3
# 7
```
Both are handled asynchronously using `ipywidgets`.

### Magic!
First, load the `afar` magic extension:
```python
%load_ext afar
```
Now you can use `afar` as line or cell magic.  `%%afar` is like `with afar.run, remotely:`.  It can optionally accept a list of variable names to save:
```python
%%afar x, y
x = 1
y = x + 1
```
The line magic `%afar` takes a single expression:
```python
z = %afar x + y
```
## Is this a good idea?

I don't know, but it sure is a joy to use 😃 !

For motivation, see https://github.com/dask/distributed/issues/4003

It's natural to be skeptical of unconventional syntax.  And magic.  `afar` is both unconventional and magical, yet it also works well and is surprisingly *fun*!  Why not give it a try to see what you think?

We're still exploring the usability of `afar` [and want to hear what you think](https://github.com/eriknw/afar/discussions).  As you're learning `afar`, please ask yourself questions such as:
- can we spell anything better?
- does this offer opportunities?
- what is surprising?
- what is lacking?

Here's an example of an opportunity:
```python
on_gpus = afar.remotely(resources={"GPU": 1})

with afar.run, on_gpus:
    ...
```
This now works!  Keyword arguments to `remotely` will be passed to [`client.submit`](https://distributed.dask.org/en/latest/api.html#distributed.Client.submit).

I don't know about you, but I think this is starting to look and feel kinda nice, and it could probably be even better :)

## Caveats and Gotchas

### Repeatedly copying data

`afar` automatically gets the data it needs--and only the data it needs--from the outer scope
and sends it to the Dask cluster to compute on.  Since we don't know whether local data has been modified
between calls to `afar`, we serialize and send local variables every time we use `run` or `get`.
This is generally fine: it works, it's safe, and it's usually fast enough.  However, if you do this
frequently with large-ish data, the performance could suffer, and you may be using
more memory on your local machine than necessary.
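
To get a rough feel for the per-call cost, the sketch below serializes the same object three times, as if it were sent on three separate calls. It uses the standard library's `pickle` as a stand-in; the serializer that `afar` and Dask actually use may differ, so treat the numbers as indicative only. The `data` list is hypothetical example data.

```python
import pickle
import time

# A stand-in for "large-ish" local data.
data = list(range(10**6))

# Simulate three separate remote calls, each re-serializing the same object.
sizes = []
start = time.perf_counter()
for _ in range(3):
    payload = pickle.dumps(data)
    sizes.append(len(payload))
elapsed = time.perf_counter() - start

print(f"{sizes[0] / 1e6:.1f} MB serialized per call, {elapsed:.3f} s for 3 calls")
```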

With Dask, a common pattern is to send data to the cluster with `scatter` and get a `Future` back.  This works:
```python
import numpy as np

A = np.arange(10**7)
A = client.scatter(A)
with afar.run, remotely:
    B = A + 1
# A and B are now both Futures; their data is on the cluster
```

Another option is to pass `data` to `run`:
```python
run = afar.run(data={"A": np.arange(10**7)})
with run, remotely:
    B = A + 1
# run.data["A"] and B are now both Futures; their data is on the cluster
```
Here's a nifty trick to use if you're in an IPython notebook: use `data=globals()`!
```python
run = afar.run(data=globals())
A = np.arange(10**7)
with run, remotely:
    B = A + 1
# A and B are now both Futures; their data is on the cluster
```
### Mutating remote data
As with any Dask workload, one should be careful not to modify remote data that may be reused.
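
A plain-Python sketch of the hazard, with no cluster involved: when two computations share one input object, an in-place update by the first changes what the second sees. The function names here are hypothetical. The same reasoning applies when the object behind a Future is reused by later tasks.

```python
def normalize_in_place(values):
    # Mutates its input: unsafe if anything else reuses the same object.
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v / total
    return values


def normalize_copy(values):
    # Builds a new list and leaves the input untouched.
    total = sum(values)
    return [v / total for v in values]


shared = [1.0, 1.0, 2.0]

safe = normalize_copy(shared)
assert shared == [1.0, 1.0, 2.0]  # input unchanged

normalize_in_place(shared)
assert shared == [0.25, 0.25, 0.5]  # input overwritten for every other user
```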

### Mutating local data
Similarly, code that runs remotely can't mutate local variables, because it operates on serialized copies sent to the cluster.  For example:
```python
d = {}
with afar.run, remotely:
    d['key'] = 'value'
# d == {}
```
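The behavior follows from serialization. As a minimal sketch, assuming a `pickle` round trip stands in for shipping `d` to a worker (the actual mechanism may differ), the remote side only ever sees a copy:

```python
import pickle

d = {}

# Stand-in for sending `d` to a worker: the remote side receives a copy.
remote_d = pickle.loads(pickle.dumps(d))
remote_d["key"] = "value"

assert remote_d == {"key": "value"}
assert d == {}  # the local object never sees the change
```

One possible workaround, not taken from the project's documentation, is to bind the updated value to a name inside the context and retrieve it afterwards with `.result()`.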
## *✨ This code is highly experimental and magical! ✨*



