Metadata-Version: 2.1
Name: convtools
Version: 0.36.0
Summary: convtools lets you define and reuse conversions for processing collections and CSV tables, complex aggregations and joins.
Home-page: UNKNOWN
Author: Nikita Almakov
Author-email: nikita.almakov@gmail.com
License: mit
Project-URL: Documentation, https://convtools.readthedocs.io/en/latest/
Project-URL: Source, https://github.com/westandskif/convtools
Project-URL: Bug Reports, https://github.com/westandskif/convtools/issues
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.6
Description-Content-Type: text/x-rst; charset=UTF-8
Provides-Extra: testing
License-File: LICENSE.txt
License-File: AUTHORS.rst

=========
convtools
=========

**convtools** is a Python library to declaratively define data transforms:

* ``convtools.conversion`` - pipelines for processing collections, doing
  complex aggregations and joins.
* ``convtools.contrib.tables`` - stream processing of table-like data (e.g.
  CSV)

.. image:: https://img.shields.io/pypi/pyversions/convtools.svg
    :target: https://pypi.org/project/convtools/

.. image:: https://img.shields.io/github/license/westandskif/convtools.svg
   :target: https://github.com/westandskif/convtools/blob/master/LICENSE.txt

.. image:: https://codecov.io/gh/westandskif/convtools/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/westandskif/convtools

.. image:: https://github.com/westandskif/convtools/workflows/tests/badge.svg
   :target: https://github.com/westandskif/convtools/workflows/tests/badge.svg
   :alt: Tests Status

.. image:: https://readthedocs.org/projects/convtools/badge/?version=latest
   :target: https://convtools.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://img.shields.io/github/tag/westandskif/convtools.svg
   :target: https://GitHub.com/westandskif/convtools/tags/

.. image:: https://badge.fury.io/py/convtools.svg
   :target: https://badge.fury.io/py/convtools

.. image:: https://img.shields.io/twitter/url?label=convtools&style=social&url=https%3A%2F%2Ftwitter.com%2Fconvtools
   :target: https://twitter.com/convtools
   :alt: Twitter URL

Docs
====

* `convtools on Read the Docs <https://convtools.readthedocs.io/en/latest/>`_
* `Cheatsheet <https://convtools.readthedocs.io/en/latest/cheatsheet.html>`_
* `QuickStart <https://convtools.readthedocs.io/en/latest/quick_start.html>`_
* `Table - Stream processing <https://convtools.readthedocs.io/en/latest/tables.html>`_

Why would you need this?
========================

* you prefer a declarative approach
* you love functional programming
* you believe that Python is high-level enough not to make you write
  aggregations and joins by hand
* you need to serialize/validate objects
* you need to dynamically define transforms (including at runtime)
* you like the idea of having something write ad hoc code for you :)


Installation
============

.. code-block:: bash

   pip install convtools


Conversions - data transforms, aggregations, joins
==================================================

.. code-block:: python

   # pip install convtools

   from convtools import conversion as c

   input_data = [{"StoreID": " 123", "Quantity": "123"}]

   # define a conversion (sometimes you may want to do this dynamically):
   #  it takes an iterable and builds dicts, stops before the first one
   #  with quantity >= 1000, and splits the result into chunks by "id",
   #  each holding at most 1000 items
   conversion = (
       c.iter(
           {
               "id": c.item("StoreID").call_method("strip"),
               "quantity": c.item("Quantity").as_type(int),
           }
       )
       .take_while(c.item("quantity") < 1000)
       .pipe(
           c.chunk_by(c.item("id"), size=1000)
       )
       .as_type(list)
   )

   # compile the conversion into an ad hoc function and run it
   # (debug=True prints the generated code)
   converter = conversion.gen_converter(debug=True)
   converter(input_data)

   # OR in case of a one-shot use
   conversion.execute(input_data)

.. code-block:: python

    from convtools import conversion as c


    def test_doc__index_intro():

        # ======== #
        # GROUP BY #
        # ======== #
        input_data = [
            {"a": 5, "b": "foo"},
            {"a": 10, "b": "foo"},
            {"a": 10, "b": "bar"},
            {"a": 10, "b": "bar"},
            {"a": 20, "b": "bar"},
        ]

        conv = (
            c.group_by(c.item("b"))
            .aggregate(
                {
                    "b": c.item("b"),
                    "a_first": c.ReduceFuncs.First(c.item("a")),
                    "a_max": c.ReduceFuncs.Max(c.item("a")),
                }
            )
            .gen_converter(debug=True)
        )

        assert conv(input_data) == [
            {"b": "foo", "a_first": 5, "a_max": 10},
            {"b": "bar", "a_first": 10, "a_max": 20},
        ]

        # ========= #
        # AGGREGATE #
        # ========= #
        conv = c.aggregate(
            {
                # list of "a" values where "b" equals "bar"
                "a": c.ReduceFuncs.Array(c.item("a"), where=c.item("b") == "bar"),
                # "b" value of a row where "a" has Max value
                "b": c.ReduceFuncs.MaxRow(
                    c.item("a"),
                ).item("b", default=None),
            }
        ).gen_converter(debug=True)

        assert conv(input_data) == {"a": [10, 10, 20], "b": "bar"}

        # ==== #
        # JOIN #
        # ==== #
        collection_1 = [
            {"id": 1, "name": "Nick"},
            {"id": 2, "name": "Joash"},
            {"id": 3, "name": "Bob"},
        ]
        collection_2 = [
            {"ID": "3", "age": 17, "country": "GB"},
            {"ID": "2", "age": 21, "country": "US"},
            {"ID": "1", "age": 18, "country": "CA"},
        ]
        input_data = (collection_1, collection_2)

        conv = (
            c.join(
                c.item(0),
                c.item(1),
                c.and_(
                    c.LEFT.item("id") == c.RIGHT.item("ID").as_type(int),
                    c.RIGHT.item("age") >= 18,
                ),
                how="left",
            )
            .pipe(
                c.list_comp(
                    {
                        "id": c.item(0, "id"),
                        "name": c.item(0, "name"),
                        "age": c.item(1, "age", default=None),
                        "country": c.item(1, "country", default=None),
                    }
                )
            )
            .gen_converter(debug=True)
        )

        assert conv(input_data) == [
            {"id": 1, "name": "Nick", "age": 18, "country": "CA"},
            {"id": 2, "name": "Joash", "age": 21, "country": "US"},
            {"id": 3, "name": "Bob", "age": None, "country": None},
        ]
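
For comparison, here is a rough hand-written counterpart of the GROUP BY
example above, using only plain Python. It is a sketch of the kind of code
convtools spares you from writing, not the code convtools actually generates;
the ``group_by_b`` helper name is made up for illustration.

.. code-block:: python

   input_data = [
       {"a": 5, "b": "foo"},
       {"a": 10, "b": "foo"},
       {"a": 10, "b": "bar"},
       {"a": 10, "b": "bar"},
       {"a": 20, "b": "bar"},
   ]


   def group_by_b(rows):
       """Group rows by "b", keeping the first and the max "a" per group."""
       groups = {}  # dicts keep insertion order (guaranteed since Python 3.7)
       for row in rows:
           key = row["b"]
           if key not in groups:
               # first row of the group initializes both aggregates
               groups[key] = {"b": key, "a_first": row["a"], "a_max": row["a"]}
           elif row["a"] > groups[key]["a_max"]:
               groups[key]["a_max"] = row["a"]
       return list(groups.values())


   result = group_by_b(input_data)

Every additional aggregate means another branch in the loop, which is the
bookkeeping the declarative definition hides.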

What reducers are supported by aggregations?
++++++++++++++++++++++++++++++++++++++++++++

Built-in ones, exposed like ``c.ReduceFuncs.Sum``:
 * Sum
 * SumOrNone
 * Max
 * MaxRow
 * Min
 * MinRow
 * Count
 * CountDistinct
 * First
 * Last
 * Average
 * Median
 * Percentile - ``c.ReduceFuncs.Percentile(95.0, c.item("x"))``
 * Mode
 * TopK - ``c.ReduceFuncs.TopK(3, c.item("x"))``
 * Array
 * ArrayDistinct
 * ArraySorted - ``c.ReduceFuncs.ArraySorted(c.item("x"), key=lambda v: v, reverse=True)``
 * Dict - ``c.ReduceFuncs.Dict(c.item("key"), c.item("x"))``
 * DictArray
 * DictSum
 * DictSumOrNone
 * DictMax
 * DictMin
 * DictCount
 * DictCountDistinct
 * DictFirst
 * DictLast

and any reduce function of two arguments you pass in ``c.reduce``.
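
A reduce function of two arguments takes the value accumulated so far and the
next item, and returns the new accumulated value. The sketch below shows such
a function with the standard library's ``functools.reduce``; the ``longest``
helper and the sample names are illustrative only, and the exact way to pass
it to ``c.reduce`` is described in the convtools docs.

.. code-block:: python

   from functools import reduce


   def longest(current, candidate):
       """Two-argument reducer: keep the longer string (first wins ties)."""
       return candidate if len(candidate) > len(current) else current


   names = ["Nick", "Joash", "Bob"]
   longest_name = reduce(longest, names)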


Contrib / Table - stream processing of table-like data
======================================================

The ``Table`` helper lets you massage CSVs and table-like data:
 * join / zip / chain tables
 * take / drop / rename columns
 * filter rows
 * update / update_all values

.. code-block:: python

   from convtools.contrib.tables import Table
   from convtools import conversion as c

   # reads Iterable of rows
   (
       Table.from_rows([(0, -1), (1, 2)], header=["a", "b"]).join(
           Table
           # reads tab-separated CSV file
           .from_csv(
               "tests/csvs/ac.csv",
               header=True,
               dialect=Table.csv_dialect(delimiter="\t"),
           )
           # transform column values
           .update(
               a=c.col("a").as_type(float),
               c=c.col("c").as_type(int),
           )
           # filter rows by condition
           .filter(c.col("c") >= 0),
           # joins on column "a" values
           on=["a"],
           how="inner",
       )
       # rearrange columns
       .take(..., "a")
       # this is a generator to consume (tuple, list are supported too)
       .into_iter_rows(dict)
   )
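
To make the "stream processing" idea concrete, below is a plain-Python sketch
of the update and filter steps, built on the standard ``csv`` module instead
of ``Table``. The inline tab-separated data is hypothetical (the contents of
``tests/csvs/ac.csv`` are not shown here), and the ``iter_rows`` name is made
up for illustration.

.. code-block:: python

   import csv
   import io

   # hypothetical tab-separated content standing in for a CSV file
   raw = "a\tc\n1\t10\n2\t-5\n3\t7\n"


   def iter_rows(stream):
       """Lazily yield converted rows, skipping those with negative "c"."""
       reader = csv.DictReader(stream, delimiter="\t")
       for row in reader:
           converted = {"a": float(row["a"]), "c": int(row["c"])}
           if converted["c"] >= 0:
               yield converted


   # rows are produced one at a time; list() consumes the generator
   rows = list(iter_rows(io.StringIO(raw)))

As with ``into_iter_rows``, nothing is read until the generator is consumed.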


Is it any different from tools like Pandas / Polars?
====================================================

* convtools doesn't wrap data in any container; it just writes and runs the
  code that performs the conversion you defined
* convtools is a lightweight library with no dependencies (the optional
  ``black`` package is, however, highly recommended for pretty-printing
  generated code when debugging)
* convtools is about defining and reusing conversions, which is a declarative
  approach, whereas wrapping data in high-performance containers is more of
  an imperative one
* convtools supports nested aggregations


Is this thing debuggable?
=========================

Despite being compiled at runtime, it is (by both ``pdb`` and ``pydevd``).
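
As a general illustration of why runtime-compiled code can still be debugged,
the sketch below compiles a small function from source and registers that
source in ``linecache``, which tracebacks and ``pdb`` consult when showing
lines. This is a generic Python technique and an assumption about the
mechanism, not a description of convtools internals; the source text and the
file name are made up.

.. code-block:: python

   import linecache

   # source that a code generator might produce (hypothetical example)
   source = (
       "def converter(data):\n"
       "    return [item * 2 for item in data]\n"
   )
   filename = "<generated-converter-example>"

   # compile the source under a synthetic file name and grab the function
   namespace = {}
   exec(compile(source, filename, "exec"), namespace)
   converter = namespace["converter"]

   # register the source so debuggers and tracebacks can display its lines
   linecache.cache[filename] = (
       len(source),
       None,
       source.splitlines(True),
       filename,
   )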
