Metadata-Version: 2.1
Name: ploomber
Version: 0.5.1
Summary: Spend your time discovering insights from data, not writing plumbing code. Declare your pipeline in a short YAML file and Ploomber will take care of the rest.
Home-page: https://github.com/ploomber/ploomber
Author: 
Author-email: 
License: UNKNOWN
Description: Ploomber
        ========
        
        .. image:: https://travis-ci.org/ploomber/ploomber.svg?branch=master
            :target: https://travis-ci.org/ploomber/ploomber
        
        .. image:: https://readthedocs.org/projects/ploomber/badge/?version=latest
            :target: https://ploomber.readthedocs.io/en/latest/?badge=latest
            :alt: Documentation Status
        
        .. image:: https://mybinder.org/badge_logo.svg
            :target: https://mybinder.org/v2/gh/ploomber/projects/master
        
        
        
        Point Ploomber to your Python and SQL scripts in a ``pipeline.yaml`` file and it will figure out execution order by extracting dependencies from them.
        
        
        It also keeps track of source code changes to speed up builds by skipping up-to-date tasks. This is a great way to interactively develop your projects, sync work with your team and quickly recover from crashes (just fix the bug and build again).
        
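        As a mental model (an illustration, not Ploomber's actual implementation), once dependencies have been extracted from your scripts, determining execution order reduces to a topological sort of the task graph. Task names here are hypothetical:

```python
# Illustrative sketch: topological ordering of a task graph using the
# standard library (Python 3.9+). Ploomber's real implementation differs,
# but the resulting execution order follows the same principle.
from graphlib import TopologicalSorter

# task -> set of upstream tasks it depends on
dependencies = {
    'aggregate': {'clean'},
    'dump': {'aggregate'},
    'plot': {'dump'},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['clean', 'aggregate', 'dump', 'plot']
```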
        
        `Try out the live demo (no installation required) <https://mybinder.org/v2/gh/ploomber/projects/master?filepath=spec%2FREADME.md>`_.
        
        `Click here for documentation <https://ploomber.readthedocs.io/>`_.
        
        `Our blog <https://ploomber.io/>`_.
        
        
        Works with Python 3.5 and higher.
        
        
        ``pipeline.yaml`` example
        -------------------------
        
        .. code-block:: yaml
        
            # pipeline.yaml
        
            # clean data from the raw table
            - source: clean.sql
              product: clean_data
              # function that returns a db client
              client: db.get_client
        
            # aggregate clean data
            - source: aggregate.sql
              product: agg_data
              client: db.get_client
        
            # dump data to a csv file
            - class: SQLDump
              source: dump_agg_data.sql
              product: output/data.csv
              client: db.get_client
        
            # visualize data from csv file
            - source: plot.py
              product:
                # where to save the executed notebook
                nb: output/executed-notebook-plot.ipynb
                # tasks can generate other outputs
                data: output/some_data.csv
        
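        The ``client: db.get_client`` entries above are a dotted path to a function you define yourself. A minimal sketch of such a module (``sqlite3`` stands in here for whichever client your database requires; see the documentation for the client objects Ploomber ships):

```python
# db.py (hypothetical module): "client: db.get_client" in pipeline.yaml
# points to a function like this one. sqlite3 is only a stand-in; a real
# project would typically return one of Ploomber's database clients.
import sqlite3


def get_client():
    """Return a database connection for SQL tasks to use."""
    return sqlite3.connect('pipeline.db')
```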
        
        
        Python script example
        ---------------------
        
        .. code-block:: python
        
            # annotated python file (it will be converted to a notebook during execution)
            import pandas as pd
        
            # + tags=["parameters"]
            # this script depends on the output generated by a task named "clean"
            upstream = {'clean': None}
            product = None
        
            # during execution, a new cell is added here
        
            # +
            df = pd.read_csv(upstream['clean'])
            # do data processing...
            df.to_csv(product['data'])
        
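        To make the "a new cell is added here" comment concrete: at execution time, the placeholder values declared in the ``parameters`` cell are replaced with locations resolved from ``pipeline.yaml``. A hypothetical injected cell (paths illustrative) could look like:

```python
# Hypothetical injected-parameters cell: during execution, the None
# placeholders from the "parameters" cell are overridden with the actual
# product locations resolved from pipeline.yaml.
upstream = {'clean': 'output/clean_data.csv'}
product = {
    'nb': 'output/executed-notebook-plot.ipynb',
    'data': 'output/some_data.csv',
}
```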
        
        SQL script example
        ------------------
        
        .. code-block:: sql
        
            DROP TABLE IF EXISTS {{product}};
        
            CREATE TABLE {{product}} AS
            -- this task depends on the output generated by a task named "clean"
            SELECT * FROM {{upstream['clean']}}
            WHERE x > 10;
        
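        Before the query is sent to the database, the ``{{product}}`` and ``{{upstream[...]}}`` placeholders are rendered to concrete table names. A rough illustration using plain string formatting (Ploomber renders these with jinja2 internally; table names taken from the ``pipeline.yaml`` example above):

```python
# Rough illustration of placeholder rendering: str.format stands in for
# the jinja2 rendering Ploomber performs. The table names come from the
# pipeline.yaml example above.
template = (
    'DROP TABLE IF EXISTS {product};\n'
    'CREATE TABLE {product} AS\n'
    'SELECT * FROM {upstream}\n'
    'WHERE x > 10;'
)
rendered = template.format(product='agg_data', upstream='clean_data')
print(rendered)
```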
        
        
        To run your pipeline:
        
        .. code-block:: bash
        
            ploomber entry pipeline.yaml
        
        
        If you build again, tasks whose source code (and that of all their
        upstream dependencies) has not changed are skipped.
        
        
        Start an interactive session (note the double dash):
        
        .. code-block:: bash
        
            ipython -i -m ploomber.entry pipeline.yaml -- --action status
        
        
        During an interactive session:
        
        
        .. code-block:: python
        
            # visualize dependencies
            dag.plot()
        
            # develop your Python script interactively
            dag['task'].develop()
        
            # line by line debugging
            dag['task'].debug()
        
        
        Install
        -------
        
        .. code-block:: shell
        
            pip install ploomber
        
        
        To install Ploomber along with all optional dependencies:
        
        .. code-block:: shell
        
            pip install "ploomber[all]"
        
        ``graphviz`` is required for plotting pipelines:
        
        .. code-block:: shell
        
            # if you use conda (recommended)
            conda install graphviz
            # if you use homebrew
            brew install graphviz
            # for more options, see: https://www.graphviz.org/download/
        
        
        Create a project with basic structure
        -------------------------------------
        
        .. code-block:: shell
        
            ploomber new
        
        
        Python API
        ----------
        
        There is also a Python API for advanced use cases. This API allows you to
        build flexible abstractions such as dynamic pipelines, where the exact
        number of tasks is determined by the pipeline's parameters.
        
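        For example, a dynamic pipeline might create one task per input file. A library-agnostic sketch of the idea (all names hypothetical; with Ploomber's Python API you would build Task objects in a similar loop):

```python
# Library-agnostic sketch of a "dynamic pipeline": the number of tasks is
# derived from a parameter (here, a list of input names). All names below
# are hypothetical.
def make_task_specs(filenames):
    return [
        {
            'source': 'process.py',
            'name': f'process-{name}',
            'product': f'output/{name}.parquet',
        }
        for name in filenames
    ]


specs = make_task_specs(['sales', 'customers', 'orders'])
print(len(specs))  # 3
```
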
        CHANGELOG
        =========
        
        0.5.2dev
        --------
        * Experimental PythonCallable.develop()
        
        0.5.1 (2020-06-30)
        ------------------
        * Reduces the number of required dependencies
        * A new option in DBAPIClient to split source with a custom separator
        
        
        0.5 (2020-06-27)
        ----------------
        * Adds CLI
        * New spec API to instantiate DAGs using YAML files
        * NotebookRunner.debug() for debugging and .develop() for interactive development
        * Bug fixes
        
        
        0.4.1 (2020-05-19)
        -------------------
        * PythonCallable.debug() now works in Jupyter notebooks
        
        0.4.0 (2020-05-18)
        -------------------
        * PythonCallable.debug() now uses IPython debugger by default
        * Improvements to Task.build() public API
        * Moves hook triggering logic to Task to simplify executors implementation
        * Adds DAGBuildEarlyStop exception to signal DAG execution stop
        * New option in Serial executor to turn warnings and exceptions capture off
        * Adds Product.prepare_metadata hook
        * Implements hot reload for notebooks and python callables
        * General clean ups for old `__str__` and `__repr__` in several modules
        * Refactored ploomber.sources module and ploomber.placeholders (previously ploomber.templates)
        * Adds NotebookRunner.debug() and NotebookRunner.develop()
        * NotebookRunner: now has an option to run static analysis on render
        * Adds documentation for DAG-level hooks
        * Bug fixes
        
        0.3.5 (2020-05-03)
        -------------------
        * Bug fixes #88, #89, #90, #84, #91
        * Modifies Env API: Env() is now Env.load(), Env.start() is now Env()
        * New advanced Env guide added to docs
        * Env can now be used with a context manager
        * Improved DAGConfigurator API
        * Deletes logger configuration in executors constructors, logging is available via DAGConfigurator
        
        
        0.3.4 (2020-04-25)
        -------------------
        * Dependencies cleanup
        * Removed (numpydoc) as dependency, now optional
        * A few bug fixes: #79, #71
        * All warnings are captured and shown at the end (Serial executor)
        * Moves differ parameter from DAG constructor to DAGConfigurator
        
        
        0.3.3 (2020-04-23)
        -------------------
        * Cleaned up some modules, deprecated some rarely used functionality
        * Improves documentation aimed to developers looking to extend ploomber
        * Introduces DAGConfigurator for advanced DAG configuration [Experimental API]
        * Adds task to upload files to S3 (ploomber.tasks.UploadToS3), requires boto3
        * Adds DAG-level on_finish and on_failure hooks
        * Support for enabling logging in entry points (via --logging)
        * Support for starting an interactive session using entry points (via python -i -m)
        * Improved support for database drivers that can only send one query at a time
        * Improved repr for SQLAlchemyClient, shows URI (but hides password)
        * PythonCallable now validates signature against params at render time
        * Bug fixes
        
        
        0.3.2 (2020-04-07)
        ------------------
        
        * Faster Product status checking, now performed at rendering time
        * New products: GenericProduct and GenericSQLRelation for Products that do not have a specific implementation (e.g. you can use Hive with the DBAPI client + GenericSQLRelation)
        * Improved DAG build reports, subselect columns, transform to pandas.DataFrame and dict
        * Parallel executor now returns build reports, just like the Serial executor
        
        
        
        0.3.1 (2020-04-01)
        ------------------
        
        * DAG parallel executor
        * Interact with pipelines from the command line (entry module)
        * Bug fixes
        * Refactored access to Product.metadata
        
        
        0.3 (2020-03-20)
        ----------------
        * New Quickstart and User Guide section in documentation
        * DAG rendering and build now continue until no more tasks can render/build (instead of failing at the first exception)
        * New @with_env and @load_env decorators for managing environments
        * Env expansion ({{user}} expands to the current username; {{git}} and {{version}} are also available)
        * Task.name is now optional when Task is initialized with a source that has __name__ attribute (Python functions) or a name attribute (like Placeholders returned from SourceLoader)
        * New Task.on_render hook
        * Bug fixes
        * A lot of new tests
        * Now compatible with Python 3.5 and higher
        
        0.2.1 (2020-02-20)
        ------------------
        
        * Adds integration with pdb via PythonCallable.debug
        * Env.start now accepts a filename to look for
        * Improvements to data_frame_validator
        
        0.2 (2020-02-13)
        ----------------
        
        * Simplifies installation
        * Deletes BashCommand, use ShellScript
        * More examples added
        * Refactored env module
        * Renames SQLStore to SourceLoader
        * Improvements to SQLStore
        * Improved documentation
        * Renamed PostgresCopy to PostgresCopyFrom
        * SQLUpload and PostgresCopy now have the same API
        * A few fixes to PostgresCopy (#1, #2)
        
        0.1
        ---
        
        * First release
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Provides-Extra: all
Provides-Extra: test
