Metadata-Version: 2.1
Name: vdk-lineage
Version: 0.3.826474487
Summary: VDK Lineage plugin collects lineage (input -> job -> output) information and sends it to a pre-configured destination.
Home-page: https://github.com/vmware/versatile-data-kit
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Description-Content-Type: text/markdown

# VDK Lineage

VDK Lineage plugin collects lineage information (input data -> job -> output data) and sends it to a pre-configured
destination. The lineage data is sent using the [OpenLineage standard](https://openlineage.io).

![](vdk-lineage.png)

The plugin is currently at proof-of-concept (POC) level.

Currently, lineage data is collected as follows:
 - For each data job run (execution), both start and end events are collected, including the status of the job (failed/succeeded).
 - For each executed query, the input and output tables are collected.
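The start and end events above map naturally onto OpenLineage `RunEvent`s. The sketch below builds them as plain dictionaries. It is a minimal illustration assuming the OpenLineage RunEvent fields (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`), not the plugin's actual code; the helper names, the `vdk` namespace, the producer URI and the schema version are hypothetical.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI; OpenLineage expects a URI identifying the emitter.
PRODUCER = "https://github.com/vmware/versatile-data-kit"
# The spec version in this URL is an assumption.
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"


def new_run_id():
    """OpenLineage run IDs are UUIDs; all events of one run share the same ID."""
    return str(uuid.uuid4())


def dataset(namespace, name):
    """Minimal dataset reference (no facets), e.g. a database table."""
    return {"namespace": namespace, "name": name}


def end_event_type(succeeded):
    """Map the job outcome to a terminal OpenLineage event type."""
    return "COMPLETE" if succeeded else "FAIL"


def run_event(event_type, run_id, job_namespace, job_name, inputs=(), outputs=()):
    """Build a minimal OpenLineage RunEvent ("START", "COMPLETE", "FAIL", ...)."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": list(inputs),
        "outputs": list(outputs),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Example: a failed run of "some-job" that read db.src and wrote db.dst
# ("vdk" and "db" are hypothetical namespaces).
run_id = new_run_id()
start = run_event("START", run_id, "vdk", "some-job")
end = run_event(
    end_event_type(succeeded=False),
    run_id,
    "vdk",
    "some-job",
    inputs=[dataset("db", "src")],
    outputs=[dataset("db", "dst")],
)
```

Both events of one job run carry the same `runId`, which is how an OpenLineage backend such as Marquez ties the start and end of a run together.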

TODOs:
 - Collect the status of each SQL query (failed, succeeded)
 - Create a parent/child relationship between SQL query events and the job run event to track them better (a single job can run multiple queries)
 - Non-SQL lineage (ingest, load data, etc.)
 - Extend support to all query types
 - Provide more information using facets (op id, job version)
 - Figure out how to visualize parent/child relationships in Marquez
 - Explore `openlineage.sqlparser` as an alternative to the `sqllineage` library
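Judging by the last TODO, the input and output tables of each query are currently extracted with the `sqllineage` library. Purely to illustrate the idea without that dependency, here is a toy regex-based extractor. It is a hypothetical stand-in, not what the plugin uses, and it ignores CTEs, quoted identifiers, comments and most statement types.

```python
import re

# A (possibly schema-qualified) identifier such as "db.table".
_IDENT = r"([A-Za-z_][\w.]*)"
# Tables read by the query: whatever follows FROM or JOIN.
_INPUT_RE = re.compile(r"\b(?:FROM|JOIN)\s+" + _IDENT, re.IGNORECASE)
# Tables written by the query: INSERT INTO / INSERT OVERWRITE TABLE / CREATE TABLE targets.
_OUTPUT_RE = re.compile(
    r"\b(?:INSERT\s+(?:INTO|OVERWRITE\s+TABLE)|CREATE\s+TABLE)\s+" + _IDENT,
    re.IGNORECASE,
)


def extract_tables(sql):
    """Return (input_tables, output_tables) as sorted, lower-cased lists."""
    inputs = {name.lower() for name in _INPUT_RE.findall(sql)}
    outputs = {name.lower() for name in _OUTPUT_RE.findall(sql)}
    return sorted(inputs), sorted(outputs)
```

A real SQL parser is needed for anything beyond simple statements, which is why the plugin delegates this step to a dedicated library.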


## Usage

```
pip install vdk-lineage
```

Once installed, the plugin starts collecting lineage from data jobs and SQL queries.
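For example, a data job step such as the hypothetical one below runs a single query through `job_input.execute_query()`. Assuming the plugin parses it as described in the overview, the expected lineage would be `staging.orders` and `staging.customers` as inputs and `dw.sales` as output. The file name and tables are illustrative; real steps usually annotate the parameter as `vdk.api.job_input.IJobInput`, which is omitted here so the snippet runs without VDK installed.

```python
# 20_populate_sales.py (hypothetical step file of a data job).
# VDK calls run() with a job_input object; execute_query() runs the SQL
# against the job's configured database, where the plugin is expected to observe it.


def run(job_input):
    job_input.execute_query(
        """
        INSERT INTO dw.sales
        SELECT o.id, o.amount, c.region
        FROM staging.orders o
        JOIN staging.customers c ON o.customer_id = c.id
        """
    )
```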

To send data using OpenLineage, set `VDK_OPENLINEAGE_URL`. For example:
```
export VDK_OPENLINEAGE_URL=http://localhost:5002
vdk marquez-server --start
vdk run some-job
# check the Marquez UI for lineage
# stopping the server deletes all collected lineage data
vdk marquez-server --stop
```
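Conceptually, each collected event ends up as an HTTP POST to the configured URL. The sketch below assumes the destination exposes the OpenLineage HTTP endpoint `/api/v1/lineage` (as Marquez does); the helper names are hypothetical and this is not the plugin's own transport code.

```python
import json
import os
import urllib.request


def build_lineage_request(event, base_url=None):
    """Prepare a POST of one OpenLineage event (a dict) as JSON.

    Falls back to the VDK_OPENLINEAGE_URL environment variable when base_url is not given.
    """
    base_url = base_url or os.environ.get("VDK_OPENLINEAGE_URL")
    if not base_url:
        raise ValueError("VDK_OPENLINEAGE_URL is not set")
    return urllib.request.Request(
        # Assumed endpoint path of the OpenLineage/Marquez HTTP API.
        url=base_url.rstrip("/") + "/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_lineage_event(event, base_url=None, timeout=5.0):
    """Send the event and return the HTTP status code.

    Raises urllib.error.URLError (or its subclass HTTPError) on failure.
    """
    request = build_lineage_request(event, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With `vdk marquez-server --start` running, `send_lineage_event(event)` would post `event` to Marquez. Keeping request construction separate from sending lets the former be checked without a live server.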

## Build and testing

To build and test the plugin, go to the plugin directory and run the `../build-plugin.sh` script.
