Metadata-Version: 1.2
Name: badger_batcher
Version: 0.4.0
Summary: Badger Batcher contains useful utilities for batching a sequence of records
Home-page: https://github.com/tkasu/badger_batcher
Author: Tomi Kasurinen
Author-email: tomi.kasurinen@gmail.com
License: MIT license
Description: ==============
        Badger Batcher
        ==============
        
        
        .. image:: https://img.shields.io/pypi/v/badger_batcher.svg
                :target: https://pypi.python.org/pypi/badger_batcher
        
        .. image:: https://travis-ci.com/tkasu/badger_batcher.svg?branch=master
                :target: https://travis-ci.com/tkasu/badger_batcher
        
        .. image:: https://readthedocs.org/projects/badger-batcher/badge/?version=latest
                :target: https://badger-batcher.readthedocs.io/en/latest/?version=latest
                :alt: Documentation Status
        
        
        
        
        Badger Batcher contains useful utilities for batching a sequence of records.
        
        
        * Free software: MIT license
        * Documentation: https://badger-batcher.readthedocs.io.
        
        
        Installation
        ------------
        
        .. code-block:: bash
        
            $ pip install badger_batcher
        
        
        Features
        --------
        
        Import Batcher:
        
        .. code-block:: python
        
            >>> from badger_batcher import Batcher
        
        
        Split records based on a maximum limit for batch length:
        
        .. code-block:: python
        
            >>> records = [f"record: {rec}" for rec in range(5)]
            >>> batcher = Batcher(records, max_batch_size=2)
            >>> batcher.batches()
            [['record: 0', 'record: 1'], ['record: 2', 'record: 3'], ['record: 4']]
        
        Split records with a maximum limit for batch length and a maximum limit for record size:
        
        .. code-block:: python
        
            >>> records = [b"aaaa", b"bb", b"ccccc", b"d"]
            >>> batcher = Batcher(
            ... records,
            ... max_batch_size=2,
            ... max_record_size=4,
            ... size_calc_fn=len,
            ... when_record_size_exceeded="skip",
            ... )
            >>> batcher.batches()
            [[b'aaaa', b'bb'], [b'd']]
        
        Split records with limits on both batch length and batch size:
        
        .. code-block:: python
        
            >>> records = [b"a", b"a", b"a", b"b", b"ccc", b"toolargeforbatch", b"dd", b"e"]
            >>> batcher = Batcher(
            ... records,
            ... max_batch_len=3,
            ... max_batch_size=5,
            ... size_calc_fn=len,
            ... when_record_size_exceeded="skip",
            ... )
            >>> batcher.batches()
            [[b'a', b'a', b'a'], [b'b', b'ccc'], [b'dd', b'e']]
        
        When processing big chunks of data, consider iterating instead:
        
        .. code-block:: python
        
            >>> import sys
        
            >>> records = (f"record: {rec}" for rec in range(sys.maxsize))
            >>> batcher = Batcher(records, max_batch_size=2)
            >>> for batch in batcher:
            ...     # do something for each batch
            ...     some_fancy_fn(batch)
        
        If you need to encode records, do so before passing them to the batcher.
        Batcher will not eagerly realize the whole iterable, so use a generator for bigger iterables.
        
        .. code-block:: python
        
            >>> records = ["a", "a", "a", "b", "ccc", "bbbb", "dd", "e"]
            >>> encoded_records_gen = (record.encode("utf-16-le") for record in records)
        
            >>> batcher = Batcher(
            ... encoded_records_gen,
            ... max_batch_len=3,
            ... max_record_size=6,
            ... max_batch_size=10,
            ... size_calc_fn=len,
            ... when_record_size_exceeded="skip",
            ... )
        
            >>> batcher.batches()
            [
                [b"a\x00", b"a\x00", b"a\x00"],
                [b"b\x00", b"c\x00c\x00c\x00"],
                [b"d\x00d\x00", b"e\x00"],
            ]
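
        Note that with ``size_calc_fn=len``, sizes are measured in bytes of the encoded record. A quick sanity check in plain Python (no assumptions about the library itself) shows why ``"bbbb"`` is missing from the output above:

        .. code-block:: python

            # UTF-16-LE encodes each of these characters as 2 bytes, so
            # len() on the encoded record counts bytes, not characters.
            records = ["a", "a", "a", "b", "ccc", "bbbb", "dd", "e"]
            sizes = [len(record.encode("utf-16-le")) for record in records]
            print(sizes)  # [2, 2, 2, 2, 6, 8, 4, 2]
            # "bbbb" is 8 bytes when encoded, which exceeds max_record_size=6,
            # so it is skipped.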
        
        A full example, e.g. for Kinesis Data Streams-style processing:
        
        .. code-block:: python
        
            import random
            from badger_batcher import Batcher
        
        
            def get_records():
                records = (
                    f"""{{'id': '{i}', 'body': {('x' * random.randint(100_000, 7_000_000))}}}"""
                    for i in range(10_000)
                )
                return records
        
        
            records = get_records()
            encoded_records = (record.encode("utf-8") for record in records)
        
            batcher = Batcher(
                encoded_records,
                max_batch_len=500,
                max_record_size=1000 * 1000,
                max_batch_size=5 * 1000 * 1000,
                size_calc_fn=len,
                when_record_size_exceeded="skip",
            )
        
            for i, batch in enumerate(batcher):
                # do something with each batch
                ...
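
        The loop body above would then hand each batch to the target service. A hypothetical sketch using boto3's ``put_records`` (the client setup, stream name, and partition-key strategy are illustrative assumptions, not part of badger_batcher):

        .. code-block:: python

            import boto3  # assumed extra dependency, not part of badger_batcher


            def to_kinesis_records(batch):
                # Shape each record the way put_records expects; hashing the
                # record for the partition key is only an illustration.
                return [
                    {"Data": record, "PartitionKey": str(hash(record))}
                    for record in batch
                ]


            client = boto3.client("kinesis")
            for batch in batcher:
                # put_records accepts at most 500 records and 5 MB per call,
                # which the limits configured above stay within.
                client.put_records(
                    StreamName="my-stream",  # hypothetical stream name
                    Records=to_kinesis_records(batch),
                )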
        
        
        Credits
        -------
        
        This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
        
        .. _Cookiecutter: https://github.com/audreyr/cookiecutter
        .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
        
        
        =======
        History
        =======
        
        0.1.0 (2021-04-09)
        ------------------
        
        * First release on PyPI.
        
Keywords: badger_batcher
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.8
