Metadata-Version: 2.1
Name: odd-dbt
Version: 0.1.9
Summary: OpenDataDiscovery Action for dbt
License: Apache-2.0
Keywords: Open Data Discovery,dbt,Metadata,Data Discovery,Data Observability
Author: Mateusz Kulas
Author-email: mkulas@provectus.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: dbt-core (>=1.4.5,<2.0.0)
Requires-Dist: dbt-postgres (>=1.4.5,<2.0.0)
Requires-Dist: dbt-redshift (>=1.4.0,<2.0.0)
Requires-Dist: dbt-snowflake (>=1.4.1,<2.0.0)
Requires-Dist: funcy (>=1.17,<2.0)
Requires-Dist: loguru (>=0.6.0,<0.7.0)
Requires-Dist: odd-models (>=2.0.23,<3.0.0)
Requires-Dist: oddrn-generator (>=0.1.70,<0.2.0)
Requires-Dist: psycopg2 (>=2.9.5,<3.0.0)
Requires-Dist: sqlalchemy (>=1.4.46,<2.0.0)
Requires-Dist: typer[all] (>=0.7.0,<0.8.0)
Description-Content-Type: text/markdown

# OpenDataDiscovery dbt test metadata collection
[![PyPI version](https://badge.fury.io/py/odd-dbt.svg)](https://badge.fury.io/py/odd-dbt)

A library for running dbt tests and injecting them as entities into the ODD platform.

## Supported data sources
| Source    |
|-----------| 
| Snowflake | 
| Redshift  |
| Postgres  |

## Requirements
To inject data quality test entities, the library requires the corresponding dataset entities to already exist in the platform.  
For example, if you want to inject a data quality test for a Snowflake table, the entity for that table must already be present in the platform.

## Supported tests
The library supports the four basic tests provided by dbt:
- `unique`: values in the column should be unique
- `not_null`: values in the column should not contain null values
- `accepted_values`: column should only contain values from list specified in the test config
- `relationships`: each value in the selected column of the model should exist in the specified field of the referenced table (also known as referential integrity)
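
The four checks listed above can be illustrated in plain Python. This is only a sketch of what each test asserts, not how dbt or this library implements them (dbt compiles tests to SQL). The function names and sample rows are hypothetical, and `None` values are skipped in every check except `not_null`, on the assumption that NULLs are left to `not_null`.

```python
from typing import Any

Row = dict[str, Any]


def unique(rows: list[Row], column: str) -> bool:
    """True if no non-null value appears more than once in the column."""
    values = [row[column] for row in rows if row[column] is not None]
    return len(values) == len(set(values))


def not_null(rows: list[Row], column: str) -> bool:
    """True if the column contains no null values."""
    return all(row[column] is not None for row in rows)


def accepted_values(rows: list[Row], column: str, values: list[Any]) -> bool:
    """True if every non-null value in the column is in the allowed list."""
    allowed = set(values)
    return all(row[column] in allowed for row in rows if row[column] is not None)


def relationships(
    rows: list[Row], column: str, ref_rows: list[Row], ref_field: str
) -> bool:
    """True if every non-null value in the column exists in the referenced field."""
    known = {ref_row[ref_field] for ref_row in ref_rows}
    return all(row[column] in known for row in rows if row[column] is not None)


# Hypothetical sample data: the third order has an unexpected status
# and points at a customer that does not exist.
orders = [
    {"id": 1, "status": "placed", "customer_id": 10},
    {"id": 2, "status": "shipped", "customer_id": 11},
    {"id": 3, "status": "lost", "customer_id": 99},
]
customers = [{"id": 10}, {"id": 11}]
```

With this sample data, `unique(orders, "id")` and `not_null(orders, "status")` pass, while `accepted_values(orders, "status", ["placed", "shipped"])` and `relationships(orders, "customer_id", customers, "id")` fail because of the third row.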

## ODDRN generation for datasets
The `host_settings` values that ODDRN generators need for source datasets are loaded from `.dbt/profiles.yml`.  
Profiles in this file look different for each type of data source.  
**Snowflake**: `host_settings` is created from the `account` field. The field value should be `<account_identifier>`.  
For reference, the URL for an account uses the following format: `<account_identifier>.snowflakecomputing.com`  
Example Snowflake account identifier: `hj1234.eu-central-1`.  
**Redshift** and **Postgres**: `host_settings` is loaded from the `host` field.  
Example Redshift host: `redshift-cluster-example.123456789.eu-central-1.redshift.amazonaws.com`  
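
The field mapping described above can be sketched in Python as follows. This is a minimal illustration, not the library's actual code: the function name `host_settings_from_output` is hypothetical, the input is assumed to be one already-parsed output (target) block of a dbt profile, and the field value is returned unchanged because any further processing is not described here.

```python
from typing import Any


def host_settings_from_output(output: dict[str, Any]) -> str:
    """Pick the profile field used as host_settings for a given adapter type.

    `output` is assumed to be a single target block from profiles.yml,
    already parsed into a dict (hypothetical helper, for illustration only).
    """
    adapter = output["type"]
    if adapter == "snowflake":
        # Snowflake: the account identifier, e.g. "hj1234.eu-central-1"
        return output["account"]
    if adapter in ("redshift", "postgres"):
        # Redshift and Postgres: the host name of the cluster or server
        return output["host"]
    raise ValueError(f"Unsupported adapter type: {adapter!r}")


# Values taken from the examples in this section.
snowflake_output = {"type": "snowflake", "account": "hj1234.eu-central-1"}
redshift_output = {
    "type": "redshift",
    "host": "redshift-cluster-example.123456789.eu-central-1.redshift.amazonaws.com",
}
```

Calling the helper on `snowflake_output` yields the account identifier, and calling it on `redshift_output` yields the cluster host name; any other adapter type raises a `ValueError`.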
