Metadata-Version: 2.1
Name: data-steps
Version: 0.3.1
Summary: Simple tool for pandas data transformation
Home-page: https://github.com/KonradUdoHannes/data-steps
Author: Konrad Wölms
Author-email: konrad.woelms@gmail.com
License: MIT
Description: # data-steps
        
        This project provides a minimal framework to
        organize data transformations in pandas.
        
        It is intended to be used in both notebooks
        and code files.
        
        The main idea is to provide a simple decorator
        syntax that is easy to maintain when data
        transformation steps get changed or added
        throughout the project. A prime example
        is data cleaning, where some required cleaning
        steps only become apparent later in the project.
        
        ## Features
        
        After wrapping a pandas DataFrame in a `DataSteps`
        instance, the following features are available:
        
        - register data transformations with the instance's `.step`
            decorator
        - get an overview of the registered steps with `.steps`
        - inspect the original data, the fully transformed data,
            and any partially transformed data in between
        - change parameters of registered steps
        - interactively redefine or deactivate steps in jupyter notebooks
        - register steps that return secondary results, i.e. the main result is passed along
            the pipeline, whereas the secondary result is stored separately
        - convert data steps pipelines to strings that can more easily be integrated into a non-EDA code base
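        
        The registration mechanism behind these features can be pictured with a
        small, dependency-free sketch. It is not the data-steps implementation:
        the class `MiniSteps` is hypothetical, it works on a plain list of
        records instead of a DataFrame, and the meaning given to
        `partial_transform(n)` here (apply the first `n` steps) is an assumption
        made for illustration.
        
        ```python
        class MiniSteps:
            """Hypothetical, simplified stand-in; not the data-steps code."""
        
            def __init__(self, original):
                self.original = original  # untouched input data
                self._steps = []          # registered step functions, in order
                self._kwargs = {}         # step name -> keyword arguments
        
            def step(self, func):
                """Bare decorator: register func, replacing a same-named step."""
                names = [f.__name__ for f in self._steps]
                if func.__name__ in names:
                    # Redefining a step keeps its position in the pipeline.
                    self._steps[names.index(func.__name__)] = func
                else:
                    self._steps.append(func)
                    self._kwargs[func.__name__] = {}
                return func
        
            def update_step_kwargs(self, name, kwargs):
                """Set or update the keyword arguments passed to a step."""
                self._kwargs[name].update(kwargs)
        
            def partial_transform(self, n):
                """Apply only the first n registered steps (assumed semantics)."""
                data = self.original
                for func in self._steps[:n]:
                    data = func(data, **self._kwargs[func.__name__])
                return data
        
            @property
            def transformed(self):
                """Apply every registered step to the original data."""
                return self.partial_transform(len(self._steps))
        
        
        rows = [{"price": 10.0}, {"price": None}, {"price": 30.0}]
        data = MiniSteps(rows)
        
        @data.step
        def drop_missing(records):
            # Cleaning step: discard records without a price.
            return [r for r in records if r["price"] is not None]
        
        @data.step
        def scale(records, factor=1.0):
            # Parameterised step: multiply every price by factor.
            return [{**r, "price": r["price"] * factor} for r in records]
        
        data.update_step_kwargs("scale", {"factor": 2.0})
        result = data.transformed
        ```
        
        Because steps are looked up by function name, re-running a notebook cell
        that redefines `drop_missing` swaps the step in place rather than
        appending a duplicate, which is the property that keeps such a pipeline
        easy to maintain.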
        
        ## Usage Example
        
        Wrap your data in a `DataSteps` instance:
        
        ```python
        from data_steps import DataSteps
        
        data = DataSteps(my_pandas_df)
        
        #register transformation steps
        
        @data.step
        def data_transformation(df):
            #transformation steps
            ...
            return transformed_df
        
        @data.step
        def transform_with_parameters(df,param1,param2=4):
            #transformation steps
            ...
            return transformed_df
        
        #access original data
        data.original
        
        #set or update transformation parameters
        data.update_step_kwargs('transform_with_parameters',{'param1':10})
        
        #access data after all transformation steps
        data.transformed
        
        
        #get an overview of the registered steps
        data.steps
        
        #only execute some steps to help debugging transformations
        data.partial_transform(0)
        ```
        
        
        # History
        
        ## 0.0.1 (2021-01-31)
        
        - First release on PyPI.
        
        ## 0.1.0 (2021-02-11)
        
        - Changed step decorator to work in bare format,
          i.e. `<instance>.step` instead of `<instance>.step()`
        
        ## 0.2.0 (2021-05-02)
        
        - support for additional arguments in steps
        
        ## 0.3.0 (2021-05-30)
        
        - support for exporting a datasteps pipeline as a string
        - Enable steps to return side results alongside the transformed data. These could be summaries for diagnostics or plots of an intermediate result.
        
        ## Possible extensions
        
        No concrete plans at the moment, but feel free to open enhancement issues on GitHub.
        
Platform: UNKNOWN
Requires-Python: >=3.8
Description-Content-Type: text/markdown
