Metadata-Version: 2.1
Name: databricks-cicd
Version: 0.1.9
Summary: CICD tool for testing and deploying to Databricks
Home-page: https://github.com/man40/databricks-cicd
Author: Manol Manolov
Author-email: man40dev@gmail.com
License: Apache License 2.0
Project-URL: Source, https://github.com/man40/databricks-cicd
Description: # Databricks CI/CD
        [![PyPI Latest Release](https://img.shields.io/pypi/v/databricks-cicd.svg)](https://pypi.org/project/databricks-cicd/)
        
        This is a tool for building CI/CD pipelines for Databricks. It is a Python package that
        works in conjunction with a custom Git repository (or a simple file structure) to validate
        and deploy content to Databricks. Currently, it can handle the following content:
        * **Workspace** - a collection of notebooks written in Scala, Python, R or SQL
        * **Jobs** - a list of Databricks jobs
        * **Clusters**
        * **Instance Pools**
        * **DBFS** - an arbitrary collection of files that may be deployed on a Databricks workspace
        
        # Installation
        `pip install databricks-cicd`
        
        # Requirements
        To use this tool, you need a source directory (preferably a private Git repository)
        with the following structure:
        ```
        any_local_folder_or_git_repo/
        ├── workspace/
        │   ├── some_notebooks_subdir
        │   │   └── Notebook 1.py
        │   ├── Notebook 2.sql
        │   ├── Notebook 3.r
        │   └── Notebook 4.scala
        ├── jobs/
        │   ├── My first job.json
        │   └── Side gig.json
        ├── clusters/
        │   ├── orion.json
        │   └── Another cluster.json
        ├── instance_pools/
        │   ├── Pool 1.json
        │   └── Pool 2.json
        └── dbfs/
            ├── strawbery_jam.jar
            ├── subdir
            │   └── some_other.jar
            ├── some_python.egg
            └── Ice cream.jpeg
        ```
        
        **_Note:_** All folder names represent the default and can be configured. This is just a sample.
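        
        To start from an empty directory, the default layout above could be scaffolded with a few lines
        of Python. This is a minimal sketch and not part of `databricks-cicd`: `create_skeleton` and
        `DEFAULT_FOLDERS` are hypothetical names, and the folder names assume the defaults have not
        been reconfigured.
        
        ```python
        from pathlib import Path
        
        # Default folder names from the sample layout above (assumed unchanged).
        DEFAULT_FOLDERS = ("workspace", "jobs", "clusters", "instance_pools", "dbfs")
        
        
        def create_skeleton(root):
            """Create the default content folders under root and return their paths."""
            root = Path(root).expanduser()
            created = []
            for name in DEFAULT_FOLDERS:
                folder = root / name
                folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
                created.append(folder)
            return created
        ```
        
        Calling `create_skeleton("~/git/my-private-repo")` would create the five empty folders, ready to
        receive exported notebooks, JSON definitions, and DBFS files.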
        
        # Usage
        For the latest options and commands run:
        ```
        cicd -h
        ```
        A sample command could be:
        ```shell
        cicd deploy \
           -w sample_12432.7.azuredatabricks.net \
           -u john.smith@domain.com \
           -t dapi_sample_token_0d5-2 \
           -lp '~/git/my-private-repo' \
           -tp /blabla \
           -c DEV.ini \
           --verbose
        ```
        **_Note:_** On Windows, paths need to be enclosed in double quotes.
        
        The default configuration is defined in [default.ini](databricks_cicd/conf/default.ini) and can be overridden with a
        custom ini file using the `-c` option, usually one config file per target environment ([sample](config_sample.ini)).
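        
        To illustrate how a per-environment file can override defaults, here is a minimal sketch using
        Python's standard `configparser`. Whether `databricks-cicd` layers its files exactly this way is
        an assumption, and the section and option names below are placeholders, not the tool's real
        settings; see `default.ini` for the actual ones.
        
        ```python
        import configparser
        
        # Placeholder contents; section and option names are NOT the tool's real ones.
        DEFAULT_INI = """
        [example_section]
        folder = workspace
        dry_run = false
        """
        
        DEV_INI = """
        [example_section]
        dry_run = true
        """
        
        
        def layered_config(*sources):
            """Parse ini strings in order; later sources override earlier ones."""
            parser = configparser.ConfigParser()
            for text in sources:
                parser.read_string(text)
            return parser
        
        
        config = layered_config(DEFAULT_INI, DEV_INI)
        print(config["example_section"]["folder"])   # value kept from the defaults
        print(config["example_section"]["dry_run"])  # value overridden by the custom file
        ```
        
        Only the options that differ per environment need to appear in the custom file; everything else
        falls back to the defaults.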
        
        # Create content
        
        #### Notebooks:
        1. Add a notebook to source
           1. In the Databricks UI, go to your notebook.
           1. Click on `File -> Export -> Source file`. 
           1. Add that file to the `workspace` folder of this repo **without changing the file name**.
        
        #### Jobs:
        1. Add a job to source
           1. Get the source of the job and write it to a file. You need to have the
              [Databricks CLI](https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html#install-the-cli) 
              and [jq](https://stedolan.github.io/jq/download/) installed.
              On Windows, it is easier to rename `jq-win64.exe` to `jq.exe` and place it
              in the `c:\Windows\System32` folder. Then, on Windows, Linux, or macOS, run:
              ```
              databricks jobs get --job-id 74 | jq .settings > Job_Name.json
              ```
              This downloads the source JSON of the job from the Databricks server, extracts only the settings,
              and writes them to a file.
              
              **_Note:_** The file name should be the same as the job name within the JSON file. Please avoid spaces
              in names.
           1. Add that file to the `jobs` folder
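        
        The naming rule from the note above can be checked before deploying. The following is a minimal
        sketch, not part of `databricks-cicd`: `name_mismatches` is a hypothetical helper, and it assumes
        that the exported job settings store the job name under the `name` key.
        
        ```python
        import json
        from pathlib import Path
        
        
        def name_mismatches(folder, name_key="name"):
            """Return (file name, declared name) pairs where the two differ."""
            mismatches = []
            for path in sorted(Path(folder).glob("*.json")):
                with path.open(encoding="utf-8") as handle:
                    content = json.load(handle)
                declared = content.get(name_key)  # None if the key is missing
                if declared != path.stem:
                    mismatches.append((path.name, declared))
            return mismatches
        ```
        
        Running `name_mismatches("jobs")` returns an empty list when every file is named after its job.
        The same check could be applied to the `clusters` folder by passing `name_key="cluster_name"`,
        assuming the exported cluster JSON uses that key.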
           
        #### Clusters:
        1. Add a cluster to source
           1. Get the source of the cluster and write it to a file. 
              ```
              databricks clusters get --cluster-name orion > orion.json
              ```
              **_Note:_** The file name should be the same as the cluster name within the JSON file. Please avoid spaces
              in names.
           1. Add that file to the `clusters` folder
           
        #### Instance pools:
        1. Add an instance pool to source
           1. Follow the same steps as for clusters, using `instance-pools` instead of `clusters`.
           
        #### DBFS:
        1. Add a file to dbfs
           1. Just add a file to the `dbfs` folder.
           
        # TODO
        * Improve validation. It is still a baby.
        
Keywords: databricks cicd
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3.6
Description-Content-Type: text/markdown
