Metadata-Version: 1.2
Name: datacatalog-util
Version: 0.1.0
Summary: A package to manage Google Cloud Data Catalog helper commands and scripts
Home-page: https://github.com/mesmacosta/datacatalog-util
Author: Marcelo Miranda
Author-email: mesmacosta@gmail.com
License: MIT license
Description: datacatalog-util
        ================
        
        A Python package to manage Google Cloud Data Catalog helper commands and
        scripts.
        
        **Disclaimer: This is not an officially supported Google product.**
        
        1. Environment setup
        --------------------
        
        1.1. Python + virtualenv
        ~~~~~~~~~~~~~~~~~~~~~~~~
        
        Using `virtualenv <https://virtualenv.pypa.io/en/latest/>`__ is
        optional, but strongly recommended unless you use
        `Docker <#12-docker>`__.
        
        1.1.1. Install Python 3.6+
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        1.1.2. Create a folder
        ^^^^^^^^^^^^^^^^^^^^^^
        
        This is recommended so that all related files reside in the same place,
        making it easier to follow the instructions below.
        
        .. code:: bash
        
           mkdir ./datacatalog-util
           cd ./datacatalog-util
        
        *All paths starting with ``./`` in the next steps are relative to the
        ``datacatalog-util`` folder.*
        
        1.1.3. Create and activate an isolated Python environment
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        .. code:: bash
        
           pip install --upgrade virtualenv
           python3 -m virtualenv --python python3 env
           source ./env/bin/activate
        
        1.1.4. Install the package
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        .. code:: bash
        
           pip install --upgrade .
        
        1.2. Docker
        ~~~~~~~~~~~
        
        Docker may be used as an alternative to run the script. In this case,
        please disregard the `Virtualenv <#11-python--virtualenv>`__ setup
        instructions.
        
        1.2.1. Get the source code
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        .. code:: bash
        
           git clone https://github.com/mesmacosta/datacatalog-util
           cd ./datacatalog-util
        
        1.3. Auth credentials
        ~~~~~~~~~~~~~~~~~~~~~
        
        1.3.1. Create a service account and grant it below roles
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        -  BigQuery Metadata Viewer
        -  Data Catalog Admin
        -  A custom role with ``bigquery.datasets.updateTag`` and
           ``bigquery.tables.updateTag`` permissions
        
        1.3.2. Download a JSON key and save it as
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        -  ``./credentials/datacatalog-util.json``
        
        1.3.3. Set the environment variables
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        *This step may be skipped if you’re using*\ `Docker <#12-docker>`__\ *.*
        
        .. code:: bash
        
           export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-util.json
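        
        Before running any command, it can help to confirm that the variable
        points at a usable key file. The sketch below is an illustrative
        pre-flight check, not part of ``datacatalog-util``: the
        ``check_credentials`` helper is a hypothetical name, and it relies only
        on the fact that service account key files are JSON documents whose
        ``type`` field is ``service_account``.
        
        .. code:: python
        
           import json
           import os
        
        
           def check_credentials(env=os.environ):
               """Return the key file path if it looks like a service account key."""
               path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
               if not path:
                   raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
               path = os.path.expanduser(path)
               if not os.path.isfile(path):
                   raise RuntimeError(f"Key file not found: {path}")
               with open(path, encoding="utf-8") as key_file:
                   key = json.load(key_file)
               # Service account keys carry "type": "service_account".
               if key.get("type") != "service_account":
                   raise RuntimeError(f"Not a service account key: {path}")
               return path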
        
        2. Load Tags from CSV file
        --------------------------
        
        2.1. Create a CSV file representing the Tags to be created
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Tags are composed of as many lines as required to represent all of their
        fields. The columns are described as follows:
        
        +---------------------+----------------------------------+-----------+
        | Column              | Description                      | Mandatory |
        +=====================+==================================+===========+
        | **linked_resource** | Full name of the asset the Entry | Y         |
        |                     | refers to.                       |           |
        +---------------------+----------------------------------+-----------+
        | **template_name**   | Resource name of the Tag         | Y         |
        |                     | Template for the Tag.            |           |
        +---------------------+----------------------------------+-----------+
        | **column**          | Attach Tags to a column          | N         |
        |                     | belonging to the Entry schema.   |           |
        +---------------------+----------------------------------+-----------+
        | **field_id**        | Id of the Tag field.             | Y         |
        +---------------------+----------------------------------+-----------+
        | **field_value**     | Value of the Tag field.          | Y         |
        +---------------------+----------------------------------+-----------+
        
        *TIPS*
        
        -  See `sample-input/create-tags <https://github.com/mesmacosta/datacatalog-util/tree/master/sample-input/create-tags>`__
           for reference;
        -  `Data Catalog Sample
           Tags <https://docs.google.com/spreadsheets/d/1bqeAXjLHUq0bydRZj9YBhdlDtuu863nwirx8t4EP_CQ>`__
           (Google Sheets) may help to create/export the CSV.
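        
        As an illustration of the layout described in the table, the sketch
        below writes a two-field Tag to a CSV file with Python's standard
        ``csv`` module. The project, dataset, table, template, and field names
        are hypothetical placeholders, and repeating ``linked_resource`` and
        ``template_name`` on every row is an assumption; the sample input
        remains the reference for the exact values the script accepts.
        
        .. code:: python
        
           import csv
        
           COLUMNS = ["linked_resource", "template_name", "column", "field_id", "field_value"]
        
           # Hypothetical placeholders; replace with your own resource names.
           LINKED_RESOURCE = (
               "//bigquery.googleapis.com/projects/my-project"
               "/datasets/my_dataset/tables/my_table"
           )
           TEMPLATE_NAME = "projects/my-project/locations/us-central1/tagTemplates/my_template"
           FIELDS = [("owner", "data-team"), ("has_pii", "false")]
        
        
           def build_rows():
               """Build one row per Tag field (repeating the first columns is an assumption)."""
               return [
                   {
                       "linked_resource": LINKED_RESOURCE,
                       "template_name": TEMPLATE_NAME,
                       "column": "",  # optional column, left empty here
                       "field_id": field_id,
                       "field_value": field_value,
                   }
                   for field_id, field_value in FIELDS
               ]
        
        
           def write_tags_csv(path):
               """Write the header and rows to a CSV file at the given path."""
               with open(path, "w", newline="", encoding="utf-8") as csv_file:
                   writer = csv.DictWriter(csv_file, fieldnames=COLUMNS)
                   writer.writeheader()
                   writer.writerows(build_rows())
        
        Once the placeholders are replaced with real values, the resulting file
        is the kind of input expected as ``CSV_FILE_PATH``.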
        
        2.2. Run the datacatalog-util script
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        -  Python + virtualenv
        
        .. code:: bash
        
           datacatalog-util create-tags --csv-file CSV_FILE_PATH
        
        -  Docker
        
        .. code:: bash
        
           docker build --rm --tag datacatalog-util .
           docker run --rm --tty \
             --volume CREDENTIALS_FILE_FOLDER:/credentials --volume CSV_FILE_FOLDER:/data \
             datacatalog-util create-tags --csv-file /data/CSV_FILE_NAME
        
        3. Export Tags to CSV file
        --------------------------
        
        3.1. A list of CSV files, each representing one Template, will be created
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        A summary file with statistics about each template will also be
        created in the same directory.
        
        The columns for the summary file are described as follows:
        
        +-----------------------------------+-----------------------------------+
        | Column                            | Description                       |
        +===================================+===================================+
        | **template_name**                 | Resource name of the Tag Template |
        |                                   | for the Tag.                      |
        +-----------------------------------+-----------------------------------+
        | **tags_count**                    | Number of tags found from the     |
        |                                   | template.                         |
        +-----------------------------------+-----------------------------------+
        | **tagged_entries_count**          | Number of tagged entries with the |
        |                                   | template.                         |
        +-----------------------------------+-----------------------------------+
        | **tagged_columns_count**          | Number of tagged columns with the |
        |                                   | template.                         |
        +-----------------------------------+-----------------------------------+
        | **tag_string_fields_count**       | Number of used String fields on   |
        |                                   | tags of the template.             |
        +-----------------------------------+-----------------------------------+
        | **tag_bool_fields_count**         | Number of used Bool fields on     |
        |                                   | tags of the template.             |
        +-----------------------------------+-----------------------------------+
        | **tag_double_fields_count**       | Number of used Double fields on   |
        |                                   | tags of the template.             |
        +-----------------------------------+-----------------------------------+
        | **tag_timestamp_fields_count**    | Number of used Timestamp fields   |
        |                                   | on tags of the template.          |
        +-----------------------------------+-----------------------------------+
        | **tag_enum_fields_count**         | Number of used Enum fields on     |
        |                                   | tags of the template.             |
        +-----------------------------------+-----------------------------------+
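        
        The summary file can be inspected with a few lines of Python. The
        sketch below totals the ``tags_count`` column across all templates. The
        ``total_tags`` helper and the path passed to it are hypothetical, since
        the name of the generated summary file is not stated here, and the code
        assumes the file starts with a header row holding the column names.
        
        .. code:: python
        
           import csv
        
        
           def total_tags(summary_path):
               """Sum the tags_count column of an exported summary file."""
               with open(summary_path, newline="", encoding="utf-8") as csv_file:
                   return sum(int(row["tags_count"]) for row in csv.DictReader(csv_file))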
        
        The columns for each template file are described as follows:
        
        +-----------------------------------+-----------------------------------+
        | Column                            | Description                       |
        +===================================+===================================+
        | **relative_resource_name**        | Full resource name of the asset   |
        |                                   | the Entry refers to.              |
        +-----------------------------------+-----------------------------------+
        | **linked_resource**               | Full name of the asset the Entry  |
        |                                   | refers to.                        |
        +-----------------------------------+-----------------------------------+
        | **template_name**                 | Resource name of the Tag Template |
        |                                   | for the Tag.                      |
        +-----------------------------------+-----------------------------------+
        | **tag_name**                      | Resource name of the Tag.         |
        +-----------------------------------+-----------------------------------+
        | **column**                        | Attach Tags to a column belonging |
        |                                   | to the Entry schema.              |
        +-----------------------------------+-----------------------------------+
        | **field_id**                      | Id of the Tag field.              |
        +-----------------------------------+-----------------------------------+
        | **field_type**                    | Type of the Tag field.            |
        +-----------------------------------+-----------------------------------+
        | **field_value**                   | Value of the Tag field.           |
        +-----------------------------------+-----------------------------------+
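        
        Assuming exported Tags follow the same one-row-per-field layout used
        for loading, a reader usually needs to regroup the rows. The sketch
        below does this with ``csv.DictReader``, keyed on ``tag_name``. The
        ``load_tags`` helper is a hypothetical name, and the code assumes the
        file starts with a header row holding the column names listed in the
        table and that every row carries its ``tag_name``.
        
        .. code:: python
        
           import csv
           from collections import defaultdict
        
        
           def load_tags(template_csv_path):
               """Map each tag_name to a {field_id: field_value} dictionary."""
               tags = defaultdict(dict)
               with open(template_csv_path, newline="", encoding="utf-8") as csv_file:
                   for row in csv.DictReader(csv_file):
                       tags[row["tag_name"]][row["field_id"]] = row["field_value"]
               return dict(tags)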
        
        .. _run-the-datacatalog-util-script-1:
        
        3.2. Run the datacatalog-util script
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        -  Python + virtualenv
        
        .. code:: bash
        
           datacatalog-util export-tags --project-ids my-project --dir-path DIR_PATH
        
        4. Load Templates from CSV file
        -------------------------------
        
        4.1. Create a CSV file representing the Templates to be created
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Templates are composed of as many lines as required to represent all of
        their fields. The columns are described as follows:
        
        +------------------------+---------------------------+-----------+
        | Column                 | Description               | Mandatory |
        +========================+===========================+===========+
        | **template_name**      | Resource name of the Tag  | Y         |
        |                        | Template for the Tag.     |           |
        +------------------------+---------------------------+-----------+
        | **display_name**       | Display name of the Tag   | Y         |
        |                        | Template.                 |           |
        +------------------------+---------------------------+-----------+
        | **field_id**           | Id of the Tag Template    | Y         |
        |                        | field.                    |           |
        +------------------------+---------------------------+-----------+
        | **field_display_name** | Display name of the Tag   | Y         |
        |                        | Template field.           |           |
        +------------------------+---------------------------+-----------+
        | **field_type**         | Type of the Tag Template  | Y         |
        |                        | field.                    |           |
        +------------------------+---------------------------+-----------+
        | **enum_values**        | Values for the Enum       | N         |
        |                        | field.                    |           |
        +------------------------+---------------------------+-----------+
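        
        A quick check of the mandatory columns can catch incomplete rows before
        the file is passed to the script. The sketch below is an illustrative
        validator built on the table above. ``find_incomplete_rows`` is a
        hypothetical helper, not part of ``datacatalog-util``, and it assumes
        the CSV has a header row and that every row must fill every mandatory
        column.
        
        .. code:: python
        
           import csv
        
           MANDATORY = [
               "template_name",
               "display_name",
               "field_id",
               "field_display_name",
               "field_type",
           ]
        
        
           def find_incomplete_rows(csv_path):
               """Return (line_number, missing_columns) for rows lacking mandatory values."""
               problems = []
               with open(csv_path, newline="", encoding="utf-8") as csv_file:
                   reader = csv.DictReader(csv_file)
                   for row in reader:
                       # Short rows yield None for absent columns, so normalize first.
                       missing = [
                           name for name in MANDATORY
                           if not (row.get(name) or "").strip()
                       ]
                       if missing:
                           problems.append((reader.line_num, missing))
               return problems
        
        An empty list means every row fills all mandatory columns.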
        
        4.2. Run the datacatalog-util script - Create the Tag Templates
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        -  Python + virtualenv
        
        .. code:: bash
        
           datacatalog-util create-tag-templates --csv-file CSV_FILE_PATH
        
        4.3. Run the datacatalog-util script - Delete the Tag Templates
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        -  Python + virtualenv
        
        .. code:: bash
        
           datacatalog-util delete-tag-templates --csv-file CSV_FILE_PATH
        
        *TIPS*
        
        -  See `sample-input/create-tag-templates <https://github.com/mesmacosta/datacatalog-util/tree/master/sample-input/create-tag-templates>`__
           for reference.
        
        5. Export Templates to CSV file
        -------------------------------
        
        5.1. A CSV file representing the Templates will be created
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Templates are composed of as many lines as required to represent all of
        their fields. The columns are described as follows:
        
        ====================== ==============================================
        Column                 Description
        ====================== ==============================================
        **template_name**      Resource name of the Tag Template for the Tag.
        **display_name**       Display name of the Tag Template.
        **field_id**           Id of the Tag Template field.
        **field_display_name** Display name of the Tag Template field.
        **field_type**         Type of the Tag Template field.
        **enum_values**        Values for the Enum field.
        ====================== ==============================================
        
        .. _run-the-datacatalog-util-script-2:
        
        5.2. Run the datacatalog-util script
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        -  Python + virtualenv
        
        .. code:: bash
        
           datacatalog-util export-tag-templates --project-ids my-project --file-path CSV_FILE_PATH
        
Platform: Posix; MacOS X; Windows
Classifier: Development Status :: 3 - Alpha
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.6
