Metadata-Version: 2.1
Name: instructure-dap-client
Version: 0.2.3
Summary: Data Access Platform client library
Author: Levente Hunyadi
Author-email: levente.hunyadi@instructure.com
Maintainer: Edina Tipter
Maintainer-email: edina.tipter@instructure.com
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Description-Content-Type: text/markdown
License-File: LICENSE

# Data Access Platform Client Library

Data Access Platform (DAP) acts as a single source of data for analytics at Instructure. It provides efficient access to data collected across various educational products in bulk with high fidelity and low latency, adhering to a canonical data model.

The outgoing interface for DAP is the [Query API](https://data-access-platform-api.s3.amazonaws.com/index.html), which is an HTTP REST service. Users initiate asynchronous queries to retrieve data associated with their account. This client library is a Python wrapper around the DAP API.

Each DAP user acts as a data administrator for the organization they represent. They have full read access to the top-level account and all descendant sub-accounts. For example, in Canvas, the top of the organization hierarchy is uniquely identified by a root account ID, and each data record is associated with a root account ID. A DAP user with Canvas access can query data that are assigned the user's root account ID.

DAP API requires authentication. Provided you have a valid API key, the client library handles authentication behind the scenes and passes the resulting token to each API operation it invokes. Refer to the documentation of Instructure [API Gateway Service](https://api-gateway.instructure.com/doc/) to learn more about the authentication process.

Under the hood, API users must first acquire a [JSON Web Token](https://datatracker.ietf.org/doc/html/rfc7519) (JWT) from the authentication endpoint of Instructure [API Gateway Service](https://api-gateway.instructure.com/doc/), and then pass the JWT to all subsequent calls to DAP API.

## Major features

* List the names of tables available for querying
* Download the JSON schema of a selected table
* Fetch a full table snapshot
* Fetch incremental updates since a specific point in time
* Save data in several output formats: CSV, TSV, JSON, Parquet
* Download output to a local directory

## Getting started

Accessing DAP API requires an endpoint URL and an API key. Once obtained, they can be set as environment variables (recommended) or passed as command-line arguments.

### Use environment variables for authentication

First, configure the environment with what you have in your setup instructions:
```sh
export DAP_API_URL=https://api-gateway.instructure.com
export DAP_API_KEY=aCBd3V...U1aaaa
```

With environment variables set, you can issue `dap` commands directly:
```sh
dap incremental --namespace canvas --table accounts --since 2022-07-13T09:30:00+02:00
```
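
If you drive `dap` from a Python script, it can help to check that both variables are set before invoking the tool. The following is a minimal sketch using only the standard library; the helper name `read_dap_credentials` is hypothetical and not part of the client library.

```python
import os
from typing import Mapping, Tuple


def read_dap_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (base_url, api_key) read from DAP_API_URL and DAP_API_KEY.

    Hypothetical helper for illustration; it only inspects the environment
    and does not contact DAP API.
    """
    # Collect the names of variables that are unset or empty.
    missing = [name for name in ("DAP_API_URL", "DAP_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return env["DAP_API_URL"], env["DAP_API_KEY"]
```

Calling `read_dap_credentials()` without arguments reads the current process environment; passing a dictionary instead makes the check easy to test.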

### Use command-line arguments for authentication

Unless you set environment variables, you need to pass the endpoint URL and API key to the `dap` command explicitly:
```sh
dap --base-url https://api-gateway.instructure.com --api-key aCBd3V...U1aaaa incremental --namespace canvas --table accounts --since 2022-07-13T09:30:00+02:00
```
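
When the command is assembled programmatically, building the argument list in one place avoids quoting mistakes and keeps the `--since` timestamp in ISO 8601 form with a UTC offset. The sketch below uses only the flags shown above; the function name `build_incremental_command` is hypothetical, and the code only constructs the argument list without running `dap`.

```python
import shlex
from datetime import datetime, timedelta, timezone
from typing import List


def build_incremental_command(
    base_url: str, api_key: str, namespace: str, table: str, since: datetime
) -> List[str]:
    """Build the argv list for a `dap incremental` call (illustrative helper)."""
    if since.tzinfo is None:
        # A naive datetime would be formatted without a UTC offset.
        raise ValueError("since must be a timezone-aware datetime")
    return [
        "dap",
        "--base-url", base_url,
        "--api-key", api_key,
        "incremental",
        "--namespace", namespace,
        "--table", table,
        "--since", since.isoformat(),
    ]


# Same values as in the shell example above.
since = datetime(2022, 7, 13, 9, 30, tzinfo=timezone(timedelta(hours=2)))
argv = build_incremental_command(
    "https://api-gateway.instructure.com", "aCBd3V...U1aaaa", "canvas", "accounts", since
)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` to execute the command.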

## Command-line usage

Invoking the command-line utility with `--help` shows usage, required and optional arguments:
```sh
dap --help
dap incremental --help
dap snapshot --help
dap list --help
dap schema --help
```

## Common use cases

### Chain a snapshot query with an incremental query

When you start using DAP, you will typically begin by downloading a snapshot of the table(s) you need. The snapshot query response body contains a field called `at`, which captures the point in time of the data lake state that the snapshot corresponds to. Copy this timestamp into the `since` field of an incremental query request. Doing so chains the two queries and ensures that you do not miss any data.

Note that if a table has not received updates for a while (e.g. user profiles have not changed over the weekend), the value of `at` might be well behind current time.
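
The hand-off from snapshot to incremental query can be sketched as follows. This is an illustration under assumptions: the response is represented as a plain dictionary with an `at` key, the timestamp is assumed to carry a UTC offset (or a trailing `Z`), and the helper names are hypothetical rather than part of the client library.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def since_from_snapshot(snapshot_response: Dict[str, Any]) -> str:
    """Return the value to use as `since` in the first incremental query."""
    at = snapshot_response.get("at")
    if not at:
        raise ValueError("snapshot response has no 'at' timestamp")
    # Pass the timestamp through unchanged so the two queries line up exactly.
    return at


def lag_behind(at: str, now: datetime) -> timedelta:
    """Return how far `at` trails `now`; both must be timezone-aware."""
    # datetime.fromisoformat() in older Python versions rejects a trailing "Z".
    if at.endswith("Z"):
        at = at[:-1] + "+00:00"
    return now - datetime.fromisoformat(at)


# Hypothetical response body, reduced to the one field used here.
snapshot_response = {"at": "2022-07-13T09:30:00+02:00"}
since = since_from_snapshot(snapshot_response)
lag = lag_behind(since, datetime(2022, 7, 13, 10, 0, tzinfo=timezone.utc))
```

A large lag is not an error in itself; it may simply mean the table has not changed recently.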

### Chain an incremental query with another

To fetch the most recent changes since a previous incremental query, chain the next request to the previous response using `since` and `until`. The `until` of the previous response becomes the `since` of the next request. The `until` of the next request should typically be omitted, because DAP API populates it automatically. This lets you fetch the most recent changes for a table. If a table has not received updates for a while, the timestamps you see in the response may lag behind the current time.

For example, suppose you submit an incremental query job `#82`, and receive a response whose `until` is `2021-07-28T19:00`. You can then pass `2021-07-28T19:00` as the value for `since` in your next incremental query job `#83`. Job `#83` would then return `2021-07-28T19:00` as the value of `since` (the exact value you submitted), and might return `2021-07-28T21:00` as `until` (the latest point in time for which data is available).

If you choose to fill in `until` in a request (which is not necessary in most cases), its value must be in the time range DAP has data for. Otherwise, your request is rejected.
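
The chaining rule can be expressed in a few lines. The sketch below models requests and responses as plain dictionaries holding only `since` and `until`, which is a simplification for illustration; the function name `next_incremental_request` is hypothetical.

```python
from typing import Dict, Optional


def next_incremental_request(
    previous_response: Dict[str, str], until: Optional[str] = None
) -> Dict[str, str]:
    """Derive the next request's time range from the previous response."""
    # The previous `until` becomes the next `since`.
    request = {"since": previous_response["until"]}
    if until is not None:
        # Usually omitted; if given, it must lie within the range DAP has data for.
        request["until"] = until
    return request


# Values taken from the job #82 / #83 example above.
job_82_response = {"until": "2021-07-28T19:00"}
job_83_request = next_incremental_request(job_82_response)

job_83_response = {"since": "2021-07-28T19:00", "until": "2021-07-28T21:00"}
job_84_request = next_incremental_request(job_83_response)
```

Persisting the last `until` value between runs is enough to resume the chain later.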


### Get the list of tables available for querying

The `list` command returns the names of all tables in a given namespace.

### Download the latest schema for a table

The schema endpoint returns the latest schema of a table as a [JSON Schema](https://json-schema.org/) document. The `schema` command enables you to download the schema of a specified table as a JSON file.
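
Once a schema file has been downloaded, it can be inspected with the standard library. The sketch below lists the top-level properties of a JSON Schema object. The sample document is hypothetical and much simpler than a real table schema, and the layout of the file that `dap schema` writes may nest the schema differently, so treat the structure as an assumption.

```python
import json
from typing import Any, Dict, List, Tuple


def list_properties(schema: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (name, type) pairs for the top-level properties, sorted by name."""
    result = []
    for name, spec in sorted(schema.get("properties", {}).items()):
        # A property definition is normally an object; anything else has no "type".
        type_name = spec.get("type", "unspecified") if isinstance(spec, dict) else "unspecified"
        result.append((name, str(type_name)))
    return result


# Hypothetical schema for illustration; not an actual DAP table schema.
sample_schema = json.loads(
    """
    {
      "type": "object",
      "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"}
      },
      "required": ["id"]
    }
    """
)
properties = list_properties(sample_schema)
```

For a file on disk, replace the inline sample with `json.load()` on the opened file.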
