Anaml Python SDK
================

This repository contains the Anaml Python SDK and some examples that use the
SDK.

Usage
-----

The Anaml Python SDK provides several sets of features:

1. Methods and data types to interact with the Anaml server RESTful API.

2. Methods to graph Anaml features in an interactive notebook.

3. Methods to load Anaml feature data in Spark and/or Pandas.

If you plan to use (2) or (3), you will need to install the optional
dependencies that implement the additional functionality. The available "extras"
are:

- `plotting` includes graphing libraries to support the `preview_feature()`
  method.

- `pandas` includes libraries to support loading feature data with Pandas.

- `spark` includes libraries to support loading feature data with Spark.

- `aws` includes additional libraries to support loading data from AWS data
  storage platforms like S3.

- `google` includes additional libraries to support loading data from Google
  Cloud data storage platforms like BigQuery and Google Cloud Storage.

You can install these extra dependencies when you install the Python SDK with
`pip`. Include one or more of the extras described above, as a comma-separated
list in square brackets, when you run `pip install`:

```shell
$ pip install "anaml-python-sdk[pandas,google]"
```

Note, however, that you should almost certainly install a full Spark
distribution, with the additional libraries and configuration your environment
requires. In that case you should not use the `[spark]` extra.
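To confirm which optional dependencies are actually importable in an
environment, a small check like the following can help. This is a minimal
sketch: the mapping from extra name to module name (`pandas`, `pyspark`) is an
assumption about what each extra provides, and `missing_extras` is a
hypothetical helper, not part of the SDK.

```python
import importlib.util

# ASSUMPTION: each extra is represented by one top-level module it is
# expected to make importable. Adjust the mapping for your environment.
ASSUMED_MODULES = {
    "pandas": "pandas",
    "spark": "pyspark",
}


def missing_extras(modules=ASSUMED_MODULES):
    """Return the sorted names of extras whose module cannot be found."""
    return sorted(
        extra
        for extra, module in modules.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_extras()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

If Spark comes from a full distribution rather than the `[spark]` extra,
`pyspark` may still be importable, so a clean result here does not tell you
which of the two installed it.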

Developing
----------

If you are working on recent versions of macOS, you will need to install Python
3.7 using Homebrew or some other tool.

Make sure you upgrade `pip` when warned. Newer versions of `pip` know about
binary compatibility between macOS versions. This allows it to download binary
wheel packages for large libraries (like scipy, numpy, and pandas) that would
otherwise require you to install FORTRAN and C++ compilers and libraries.

Docker Containers
-----------------

The Dockerfile allows you to build two Docker images:

1. An image containing the Anaml Python SDK; and
2. An image containing the Anaml Webhook Server.

```bash
$ docker build --target sdk --tag anaml-sdk .
$ docker build --tag anaml-webhook-server .
```

### Python SDK Image

The Anaml Python SDK image can be used as a base image or as a way to access a
Python interpreter with the SDK and all the libraries pre-installed.

```bash
$ docker run --rm -ti anaml-sdk
Python 3.9.6 (default, Jul 22 2021, 15:24:21)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import anaml_client
>>> client = anaml_client.Anaml(url="...", apikey="...", secret="...")
>>>
```

### Webhook Server Image

To deploy the webhook server image you will need to ensure that:

1. The `ANAML_URL`, `ANAML_APIKEY`, and `ANAML_SECRET` environment variables are
   set.

2. Appropriate Google Cloud Platform credentials are exposed in the container in
   such a way that the Data Catalog client library can find them.

Assuming you have the appropriate authentication and environment variables
already set up in your shell, a command similar to this should work for you:

```bash
$ docker run --rm -ti -p 8090:8090 \
  -e ANAML_URL -e ANAML_SECRET -e ANAML_APIKEY \
  -v ~/.config/gcloud:/root/.config/gcloud \
  anaml-webhook-server
```

See `examples/webhook-server/README.md` for more details.
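Before starting a container, it can be useful to verify that the three
environment variables listed above are set. The sketch below collects them into
keyword arguments matching the `anaml_client.Anaml(url=..., apikey=...,
secret=...)` call shown earlier. The helper name `anaml_settings` is
hypothetical, and whether the webhook server reads the variables in this way is
an assumption.

```python
import os

# The variable names come from the deployment requirements above.
REQUIRED_VARS = ("ANAML_URL", "ANAML_APIKEY", "ANAML_SECRET")


def anaml_settings(env=os.environ):
    """Return client keyword arguments, raising if a variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {
        "url": env["ANAML_URL"],
        "apikey": env["ANAML_APIKEY"],
        "secret": env["ANAML_SECRET"],
    }


# Usage, assuming the SDK is installed:
#   import anaml_client
#   client = anaml_client.Anaml(**anaml_settings())
```

Failing early with the names of the missing variables is usually easier to
debug than an authentication error raised later by the server.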
