Metadata-Version: 1.2
Name: dingDONG
Version: 1.0.16
Summary: Data modeling, managing, and transforming data
Home-page: https://github.com/biskilled/dingDONG
Author: Tal Shany
Author-email: Tal@BiSkilled.com
License: GPLv3+
Description: |PyPI version| |Docs badge| |License|
        
        ******************************
        Ding Dong data preparation ETL
        ******************************
        
        dingDONG is a metadata management infrastructure for data preparation, ETL, and machine learning models,
        with complete version-control support for all data preparation, loading, and ML experiments.
        
        Using dingDONG's propagation mechanism helps you design, develop, maintain, and scale complex batch and real-time data projects in a fraction of the time.
        
        See the `Ding-Dong documentation <https://dingdong.readthedocs.io/en/latest>`_ for installation and developer documentation.
        More information can be found at http://www.biSkilled.com
        
        Why Ding Dong?
        ==============
        **Simple: it is fast and easy to implement!**
        
        Steps for creating a data project:
        
        - Source-to-target modeling - mostly done in Excel or with external ERD mapping tools
        - Implementation - done mainly with external ETL tools
        - Improvements, extensions, maintenance, and deployment between environments - usually an iterative process that consumes time and effort. Metadata is constantly changing, and if business logic involves ML/DL models, versioning must be set up to track overall result accuracy
        
        **dingDONG was developed to maintain and speed up overall data project implementation**
        
        - Source-to-target modeling - dingDONG tracks metadata changes such as adding, removing, or updating DB columns. Converting data types between diverse connections or adding key sequences is done automatically. Modeling can be managed in a simple Excel file; dingDONG tracks Excel updates and updates all objects by using a propagation mechanism.
        - Implementing E(T)L (extract, transform, load) - dingDONG's multi-threading and data-fetching optimizations provide a powerful tool for massive bulk data loading
        - Transform - it is possible to add calculated columns or manipulate existing columns by adding Python functions. A function such as scaling or normalization can be added as easily as a current-date or concatenation function (see the plain-Python sketch after this list)
        - Business logic - business logic in dingDONG is maintained directly in SQL files that can be executed from the development IDE. dingDONG's **comments** syntax extracts and executes complex queries from SQL files, which allows managing all business logic in separate files
        - Version control - dingDONG's version control keeps track of all metadata changes and stores them in a proprietary dingDONG DB (MongoDB). dingDONG supports the **GIT** version-control mechanism for managing ML/DL projects and for versioning and tracking experiment results, with full **GITHUB** support for distributed versioning
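
        As a plain-Python illustration of the kind of column function the Transform step can wire in (a hypothetical sketch; these helpers are not part of the dingDONG API, whose actual mapping syntax is covered in the documentation)::

            # Hypothetical column functions of the kind mentioned above:
            # scaling and concatenation. Plain Python, no dingDONG API involved.
            def min_max_scale(value, lo=0.0, hi=100.0):
                """Scale a numeric column value into the [0, 1] range."""
                return (float(value) - lo) / (hi - lo)

            def concat_cols(*values):
                """Concatenate several column values into one calculated field."""
                return " ".join(str(v) for v in values)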
        
        dingDONG modules:
        
        - DING - creates and manages the overall metadata structure for all listed objects in the work-flow
            - creating new objects
            - modifying existing objects by using a back-propagation mechanism
            - updating data into the new object
            - saving old structures
            - CI source control
        
        - DONG - extracts and loads data from diverse connectors
            - extract data - supports multi-threading for extracting massive data volumes
            - transform    - add custom functions on existing columns or new calculated fields
            - merge        - merge the source with the target object (supported only for objects on the same connector)
            - exec         - execute PL/SQL / stored-procedure commands
        
        - VERSION CONTROL - tracks metadata and code updates
            - version numbers are managed automatically (e.g. <major-release>.<minor-release>.<fixId>)
            - any change to an object's metadata is automatically stored in the dingDONG repository
            - supports managed GIT versioning of ML / DL experiments:
               - code update - a new version is created
               - experiment results are stored in the dingDONG repository for future analytics
               - roll back through revision history by version number
               - define measures such as counting total input or total output rows
               - store execution results to compare experiment performance
        
        .. image:: https://github.com/biskilled/dingDong/blob/master/docs/_static/dingDongOneSlide.jpg
           :alt: dingDONG architecture
        
        
        Installation
        ============
        
        `Download from GitHub <https://github.com/biskilled/dingDong>`_ or install with ``pip``::
        
            pip install dingDong
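
        Once installed, the imports used throughout the samples below should resolve::

            # Quick sanity check that the package is importable
            from dingDong import DingDong, Config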
        
        MongoDB support for the ``pip`` installation is still being cooked; installation instructions will come soon.
        Docker support is still in the oven as well and will come soon.
        
        Samples
        =======
        
        Samples can be found under `Ding-Dong sample documentation <https://dingdong.readthedocs.io/en/latest/rst/samples.html>`_
        
        The sample below loads 3 CSV files into SQLite; we then use SQL to merge these CSVs and extract the data into a report table and a CSV file.
        
        Download the ZIP file with the 3 sample CSV files DATAELEMENTDESCRIPTION.csv, DEMOGRAPHICS.csv, and MEASURESOFBIRTHANDDEATH.csv,
        located at `samples/sampleHealthCare/csvData.zip <https://github.com/biskilled/dingDong/raw/master/samples/sampleHealthCare/csvData.zip>`_.
        In this sample we use **C:\\dingDONG** as the main folder for all source CSV files and dingDONG logs.
        
        The full code sample **extractCsvToSqlLite.py** is located in the `samples/sampleHealthCare/ <https://github.com/biskilled/dingDong/tree/master/samples/sampleHealthCare/extractCsvToSqlLite.py>`_ folder.
        
        The sample demonstrates how to load 3 CSV files into SQLite, create a simple query based
        on those tables, and send the result to a new CSV file.
        
        1. Set the main configuration. All configuration properties can be found in the `dingDONG documentation <https://dingdong.readthedocs.io/en/latest>`_
        
        ::
        
            import logging
            from dingDong import DingDong
            from dingDong import Config
        
            """ Main configuration """
        
            """ set log level: logging.INFO, logging.DEBUG, logging.ERROR """
            Config.LOGS_DEBUG = logging.DEBUG
        
            """
                set the connection URL for each connector
                key   -> general connection name or connection type (sql, oracle, file ...)
                value -> can be a string or a dictionary
                    String     --> connection string URL (the key defines the connection type: sql, oracle, mySql, ...)
                    Dictionary --> must contain 'conn' (connection type) and 'url' (connection string)
                available connections can be found at dingDong.misc.enumsJson.eConn
        
            """
            Config.CONNECTIONS = {
                'sampleSql': {'conn': 'sql',"url": "<Sql server connection string>;UID=USER;PWD=PWD;"},
                'file': "C:\\dingDong\\",
                'sqlite': "C:\\dingDong\\sqlLiteDB.db"}
        
        2. A work-flow can be defined in JSON format or as Python dictionaries.
           For this example we use a Python dictionary. The sample work-flow will:
        
        * map and load the CSV file DATAELEMENTDESCRIPTION into a SQLite table named dateElements_Desc
        * map and load the CSV file DEMOGRAPHICS into a SQLite table named demographics
        * map and load the CSV file MEASURESOFBIRTHANDDEATH into a SQLite table named birthDate
        * create a new query based on demographics and birthDate into a new table named Final
        * update sample fields in the Final table by using a direct PL/SQL query
        * extract the Final data into a CSV file.
          We use VARCHAR(200) as the default CSV column data type; this configuration can be found at DEFAULTS under dingDong.conn.baseBatch
        
        ::
        
            nodesToLoad = [
                {"source": ["file", "DATAELEMENTDESCRIPTION.csv"],
                 "target": ["sqlite", "dateElements_Desc"]},
        
                {"source": ["file", "DEMOGRAPHICS.csv"],
                 "target": ["sqlite", "demographics"]},
        
                {"source": ["file", "MEASURESOFBIRTHANDDEATH.csv"],
                 "target": ["sqlite", "birthDate"]},
        
                {"query": ["sqlite", """   Select d.[State_FIPS_Code] AS A, d.[County_FIPS_Code] AS B, d.[County_FIPS_Code] AS G,d.[County_FIPS_Code], d.[CHSI_County_Name], d.[CHSI_State_Name],[Population_Size],[Total_Births],[Total_Deaths]
                                                From demographics d INNER JOIN birthDate b ON d.[County_FIPS_Code] = b.[County_FIPS_Code] AND d.[State_FIPS_Code] = b.[State_FIPS_Code]"""],
                 "target": ["sqlite", "Finall", -1]},
        
                {"myexec": ["sqlite", "Update dateElements_Desc Set [Data_Type] = 'dingDong';"]},
        
                {"source": ["sqlite", "Finall"],
                 "target": ["file", "finall.csv"]}
            ]
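
        The same work-flow can also be kept as a JSON file and loaded through the ``dirData`` folder described below. A minimal sketch (the file name is illustrative)::

            import json

            # Persist the work-flow above as JSON so it can be picked up from dirData
            with open("C:\\dingDong\\nodesToLoad.json", "w") as f:
                json.dump(nodesToLoad, f, indent=4)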
        
        3. Init the DingDong class

        :dicObj:         load a dictionary as a work-flow
        :dirData:        load JSON work-flow files from this folder
        :includeFiles:   include files to load from the directory folder (dirData)
        :notIncludeFiles: exclude files to load from the directory folder (dirData)
        :connDict:       equal to Config.CONNECTIONS; sets the connection URL strings
        :processes:      max number of parallel threads used to load data (DONG module)
        
        ::
        
            m = DingDong(dicObj=nodesToLoad,
                         filePath=None,
                         dirData=None,
                         includeFiles=None,
                         notIncludeFiles=None,
                         connDict=None,
                         processes=1)
        
        4. DING

        * creates the dateElements_Desc, demographics, and birthDate tables based on the CSV files
        * extracts the query structure and creates the Final table

        Flag -1 (the default flag) indicates that when an object's structure changes, the old structure is stored with all its data and the object is updated to the new structure.
        
        ::
        
            m.ding()
        
        5. DONG - extract and load the data from the CSV files into the SQLite tables, using the default truncate -> insert method,
           and extract the query data into the Final table

        * if the object structure has changed (default mode):

          * a history table is created
          * the new object structure is created and populated with data from the history table (identical column names only)
        
        ::
        
            m.dong()
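
        A quick, framework-independent check that the load succeeded, using Python's standard ``sqlite3`` module (the table name and DB path match the sample above)::

            import sqlite3

            # Count the rows loaded into the Final report table
            conn = sqlite3.connect("C:\\dingDong\\sqlLiteDB.db")
            print(conn.execute("SELECT COUNT(*) FROM Final").fetchone()[0])
            conn.close()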
        
        Full sample code::
        
            import logging
            from dingDong import DingDong
            from dingDong import Config
        
            Config.CONNECTIONS = {
                'x1'    : {'conn':'sql',"url":"DRIVER={SQL Server};SERVER=CPX-VLQ5GA42TW2\\SQLEXPRESS;DATABASE=ContosoRetailDW;UID=bpmk;PWD=bpmk;"},
                'x2'    : {'conn':'sql',"url":"DRIVER={SQL Server};SERVER=CPX-VLQ5GA42TW2\\SQLEXPRESS;DATABASE=ContosoRetailDW;UID=bpmk;PWD=bpmk;"},
                'file'  : "C:\\dingDong\\",
                'sqlite': "C:\\dingDong\\sqlLiteDB.db"}
            Config.LOGS_DEBUG = logging.DEBUG
            Config.LOGS_DIR = "C:\\dingDong"
        
            nodesToLoad = [
                    {   "source":["file","DATAELEMENTDESCRIPTION.csv"],
                        "target":["sqlite","dateElements_Desc"]},
        
                    {   "source":["file","DEMOGRAPHICS.csv"],
                        "target":["sqlite","demographics"]},
        
                    {   "source":["file","MEASURESOFBIRTHANDDEATH.csv"],
                        "target":["sqlite","birthDate"]},
        
                    {   "query":["sqlite","""   Select d.[State_FIPS_Code] AS A, d.[County_FIPS_Code] AS B, d.[County_FIPS_Code] AS G,d.[County_FIPS_Code], d.[CHSI_County_Name], d.[CHSI_State_Name],[Population_Size],[Total_Births],[Total_Deaths]
                                                From demographics d INNER JOIN birthDate b ON d.[County_FIPS_Code] = b.[County_FIPS_Code] AND d.[State_FIPS_Code] = b.[State_FIPS_Code]"""],
                        "target":["sqlite","Final", 2]},
        
                    {   "myexec":["sqlite","Update dateElements_Desc Set [Data_Type] = 'dingDong';"]},
        
                    {   "source":["sqlite","Final"],
                        "target":["file","final.csv"]}
                  ]
        
            m = DingDong(dicObj=nodesToLoad,
                         filePath=None,
                         dirData=None,
                         includeFiles=None,
                         notIncludeFiles=None,
                         connDict=None,
                         processes=1)
            m.ding()
            m.dong()
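
        For larger loads, the ``processes`` argument can be raised to run with more worker threads (the multi-threaded DONG extraction described earlier). A minimal sketch, reusing the objects above::

            # Assumption: processes sizes the DONG thread pool, per the parameter
            # list in step 3; a value of 1 keeps execution sequential
            m = DingDong(dicObj=nodesToLoad, connDict=None, processes=4)
            m.ding()
            m.dong()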
        
        Road map
        ========
        
        We would like to create a platform that enables designing, implementing, and maintaining data integration projects, such as:
        
        *  any REST API connectivity, from any API to any API, using simple JSON mapping
        *  any relational database connectivity using JSON mapping
        *  any non-relational storage
        *  a main platform for any middleware business logic - from simple if-then-else up to statistical ML and DL algorithms
        *  real-time and scheduled integration
        *  a single point of truth - maintain all changes using git source control, enabling version comparison and rollback as needed
        
        We will extend our connectors and metadata manager accordingly.
        
        BATCH supported connectors
        ==========================
        
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | connector type    | python module    | checked version  | dev status  | notes                                    |
        +===================+==================+==================+=============+==========================================+
        | sql               | pyOdbc           | 4.0.23           | tested, prod| slow for extracting massive data volumes;|
        |                   |                  |                  |             | ceODBC is preferred                      |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | sql               | ceODBC           | 2.0.1            | tested, prod| SQL Server conn for massive data loading;|
        |                   |                  |                  |             | install manually from the 3rdPart folder |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | access            | pyOdbc           | 4.0.23           | tested, prod|                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | oracle            | cx-oracle        | 6.1              | tested, prod|                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | CSV / text files  | CSV / CSV23      | 0.1.5            | tested, prod|                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | mysql             | pyMySql          | 0.6.3rc1         | dev         |                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | vertica           | vertica-python   | 0.9.1            | dev         |                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | sqlite            | sqlite3          | 6.1              | tested, prod|                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | mongoDb           | pyMongo          | 3.7.2            | dev         |                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | salesforce        | simple_salesforce| 3.7.2            | dev         |                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
        | Hadoop / Hive     | .                | .                | dev         |                                          |
        +-------------------+------------------+------------------+-------------+------------------------------------------+
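
        Defining connections for the connectors above follows the ``Config.CONNECTIONS`` format shown earlier; a minimal sketch with illustrative placeholder URLs::

            from dingDong import Config

            # Keys name the connection; 'conn' picks the connector type from the
            # table above, 'url' is the connection string (placeholders here)
            Config.CONNECTIONS = {
                'myOracle': {'conn': 'oracle', 'url': '<oracle connection string>'},
                'myMysql' : {'conn': 'mysql',  'url': '<mysql connection string>'},
                'file'    : 'C:\\dingDong\\'
            }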
        
        Authors
        =======
        
        dingDONG was created by `Tal Shany <http://www.biskilled.com>`_
        (tal@biSkilled.com)
        
        We are looking for contributions!
        
        License
        =======
        
        GNU General Public License v3.0
        
        See `COPYING <https://github.com/biskilled/dingDong/blob/master/COPYING>`_ to see the full text.
        
        .. |PyPI version| image:: https://img.shields.io/pypi/v/dingDong.svg
           :target: https://github.com/biskilled/dingDong
        .. |Docs badge| image:: https://img.shields.io/badge/docs-latest-brightgreen.svg
           :target: https://readthedocs.org/projects/dingDong/
        .. |License| image:: https://img.shields.io/badge/license-GPL%20v3.0-brightgreen.svg
           :target: COPYING
           :alt: Repository License
        
        
Keywords: ETL,Data preparation,Data modeling,Python,Integration,Mapping,ML devops,CI,CD
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Utilities
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*
