Metadata-Version: 2.1
Name: dkist-processing-core
Version: 0.2.7
Summary: Abstraction layer that is used by the DKIST Science Data Processing pipelines to process DKIST data using Apache Airflow.
Home-page: https://bitbucket.org/dkistdc/dkist-processing-core/src/main/
Author: NSO / AURA
Author-email: "fwatson@nso.edu"
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.8
Provides-Extra: test

dkist-processing-core
=====================

Overview
--------
The dkist-processing-core package provides an abstraction layer between the DKIST data processing code, the workflow
engine that supports it (Apache Airflow), and the logging infrastructure. By confining the Airflow-specific details to
this abstraction layer, a versioning system for workflows can be implemented on top of it.

.. image:: docs/auto_proc_brick.png
  :width: 600
  :alt: Core, Common, and Instrument Brick Diagram

Three main entities implement the abstraction:

*Task* : The Task provides a definition of the DKIST data processing task interface.
It additionally implements some methods that are global to all DKIST processing tasks, and it provides an API
to the application performance monitoring infrastructure.

*Workflow* : The Workflow defines an API that is independent of the workflow engine it abstracts.  It also implements the
translation to engine-specific workflow definitions; in the case of Airflow, this is a DAG.

*Node* : The Node is used by the Workflow to translate a Task into the engine-specific implementation of that Task, which runs inside a Python virtual environment.
The virtual environment enables loading only that task's dependencies.
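The per-task isolation idea can be sketched without the library: create a dedicated virtual environment and run the task's code with that environment's interpreter, so only packages installed into it are importable. This is an illustrative, library-free sketch, not the actual Node implementation; the task name is taken from the usage example below.

.. code-block:: python

    import subprocess
    import tempfile
    import venv

    # Build an isolated environment for a single task (location is illustrative)
    env_dir = tempfile.mkdtemp(prefix="mytask1-env-")
    venv.EnvBuilder(with_pip=False).create(env_dir)

    # Execute the task with that environment's interpreter; only packages
    # installed into env_dir would be importable by the task's code
    result = subprocess.run(
        [f"{env_dir}/bin/python", "-c", "print('running MyTask1 in isolation')"],
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip())  # → running MyTask1 in isolation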

Additional support functions are provided in ``build_utils``.

Usage
-----
The Workflow and Task are the primary objects used by client libraries.
The Task is used as a base class, and the subclass must at a minimum implement ``run``.
A Workflow is used to give the tasks an order of execution and a name for the flow.

.. code-block:: python

    from dkist_processing_core import TaskBase
    from dkist_processing_core import Workflow

    # Task definitions
    class MyTask1(TaskBase):
        def run(self):
            print("Running MyTask1")


    class MyTask2(TaskBase):
        def run(self):
            print("Running MyTask2")

    # Workflow definition
    # MyTask1 -> MyTask2
    w = Workflow(process_category="My", process_name="Workflow", workflow_package=__package__, workflow_version="dev")
    w.add_node(MyTask1, upstreams=None)
    w.add_node(MyTask2, upstreams=MyTask1)
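The ``add_node`` calls above establish a dependency graph, and the engine-specific translation (an Airflow DAG) executes tasks in an order consistent with it. Here is a library-free sketch of that ordering using only the standard library; the task names mirror the example above, and this is not the actual translation code:

.. code-block:: python

    from graphlib import TopologicalSorter

    # Map each task to its upstream tasks, mirroring the add_node calls above
    upstreams = {
        "MyTask1": set(),        # add_node(MyTask1, upstreams=None)
        "MyTask2": {"MyTask1"},  # add_node(MyTask2, upstreams=MyTask1)
    }

    # Any valid execution order must run MyTask1 before MyTask2
    execution_order = list(TopologicalSorter(upstreams).static_order())
    print(execution_order)  # → ['MyTask1', 'MyTask2']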


Using dkist-processing-core for a data processing library/repo with Airflow involves a project structure and
build process that results in code artifacts deployed to `PyPI <https://pypi.org/project/dkist/>`__ and a
zip of workflow artifacts deployed to Artifactory.

.. image:: docs/auto-proc-concept-model.png
  :width: 600
  :alt: Build Artifacts Diagram

The client DKIST data processing libraries should implement a structure and build pipeline using `dkist-processing-test <https://bitbucket.org/dkistdc/dkist-processing-test/src/main/>`__
as an example.  The build pipelines for a client repo can leverage the `build_utils <dkist_processing_core/build_utils.py>`__ for test and export.

Specifically for Airflow, the resulting deployment has the versioned workflow artifacts all available to the scheduler
and the versioned code artifacts available to workers for task execution.

.. image:: docs/automated-processing-deployed.png
  :width: 800
  :alt: Airflow Deployment Diagram

Build
-----
dkist-processing-core is built using `bitbucket-pipelines <bitbucket-pipelines.yml>`__.

Deployment
----------
dkist-processing-core is deployed to `PyPI <https://pypi.org/project/dkist-processing-core/>`__.

Environment Variables
---------------------

+---------------+----------------------------------------------------------------------------------------------------------------------------+------+---------+
| *VARIABLE*    | *Description*                                                                                                              |*Type*|*default*|
+===============+============================================================================================================================+======+=========+
| BUILD_VERSION | Build/Export pipelines only.  This is the value that will be appended to all artifacts and represents their unique version | STR  | dev     |
+---------------+----------------------------------------------------------------------------------------------------------------------------+------+---------+
| MESH_CONFIG   | Provides the dkistdc cloud mesh configuration.  Specifically the location of the message broker                            | JSON |         |
+---------------+----------------------------------------------------------------------------------------------------------------------------+------+---------+
| ISB_USERNAME  | Message broker user name                                                                                                   | STR  |         |
+---------------+----------------------------------------------------------------------------------------------------------------------------+------+---------+
| ISB_PASSWORD  | Message broker password                                                                                                    | STR  |         |
+---------------+----------------------------------------------------------------------------------------------------------------------------+------+---------+
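As an illustration of the table above, a small helper can read these variables, applying the documented default for ``BUILD_VERSION`` and parsing ``MESH_CONFIG`` as JSON. The helper itself is hypothetical; how dkist-processing-core consumes these values internally is not shown here.

.. code-block:: python

    import json
    import os

    def read_core_settings(environ=None):
        """Read the environment variables listed in the table above."""
        env = os.environ if environ is None else environ
        return {
            "build_version": env.get("BUILD_VERSION", "dev"),         # default per table
            "mesh_config": json.loads(env.get("MESH_CONFIG", "{}")),  # JSON string
            "isb_username": env.get("ISB_USERNAME"),
            "isb_password": env.get("ISB_PASSWORD"),
        }

    # Example with an explicit mapping instead of the real environment
    settings = read_core_settings({"MESH_CONFIG": '{"broker": "localhost:5672"}'})
    print(settings["build_version"], settings["mesh_config"]["broker"])
    # → dev localhost:5672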


Development
-----------
.. code-block:: bash

    git clone git@bitbucket.org:dkistdc/dkist-processing-core.git
    cd dkist-processing-core
    pre-commit install
    pip install -e .[test]
    pytest -v --cov dkist_processing_core


