Metadata-Version: 2.1
Name: bizon
Version: 0.0.1
Summary: Extract and load your data reliably from API Clients with native fault-tolerant and checkpointing mechanism.
Author: Antoine Balliet
Author-email: antoine.balliet@gmail.com
Requires-Python: >=3.9,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: backoff (>=2.2.1,<3.0.0)
Requires-Dist: dpath (>=2.2.0,<3.0.0)
Requires-Dist: faker (>=26.0.0,<27.0.0)
Requires-Dist: google-cloud-bigquery (>=3.25.0,<4.0.0)
Requires-Dist: google-cloud-storage (>=2.17.0,<3.0.0)
Requires-Dist: kafka-python (>=2.0.2,<3.0.0)
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: pendulum (>=3.0.0,<4.0.0)
Requires-Dist: pyarrow (>=16.1.0,<17.0.0)
Requires-Dist: pydantic (>=2.8.2,<3.0.0)
Requires-Dist: pydantic-extra-types (>=2.9.0,<3.0.0)
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Requires-Dist: requests (>=2.28.2,<3.0.0)
Requires-Dist: sqlalchemy (>=2.0.32,<3.0.0)
Requires-Dist: sqlalchemy-bigquery (>=1.11.0,<2.0.0)
Description-Content-Type: text/markdown

# bizon ⚡️
Extract and load your largest data streams with a framework you can trust for billions of records.

## Features
- **Natively fault-tolerant**: Bizon uses a checkpointing mechanism to keep track of the progress and recover from the last checkpoint.
- **Queue system agnostic**: Bizon does not depend on a specific queuing system. You can use Python Queue, Kafka or Redpanda, and adapters for other systems can be written against the `bizon.queue.Queue` interface.
- **Pipeline metrics**: Bizon provides exhaustive pipeline metrics and implements OpenTelemetry for tracing. You can monitor:
    - ETAs for completion
    - Number of records processed
    - Completion percentage
    - Latency between source and destination
- **Lightweight & lean**: Bizon has a minimal codebase and relies on only a few core dependencies:
    - `requests` for HTTP requests
    - `pyyaml` for configuration
    - `sqlalchemy` for database / warehouse connections
    - `pyarrow` for Parquet file format
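
The checkpointing idea behind the fault-tolerance feature can be illustrated with a minimal, standard-library sketch. This is not Bizon's implementation: the `sync` function, the JSON checkpoint file and the `fail_at` parameter are all hypothetical names chosen for illustration. The cursor is persisted after each record, so a restarted run resumes from the last checkpoint instead of starting over.

```python
import json
import tempfile
from pathlib import Path


def sync(records, checkpoint_path, fail_at=None):
    """Process records in order, persisting the index of the next record to handle.

    Hypothetical helper for illustration only, not part of Bizon.
    """
    path = Path(checkpoint_path)
    # Resume from the saved cursor if a checkpoint exists, otherwise start at 0.
    cursor = json.loads(path.read_text())["cursor"] if path.exists() else 0
    processed = []
    for index in range(cursor, len(records)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError(f"simulated failure at record {index}")
        processed.append(records[index])
        # Persist progress only after the record has been handled.
        path.write_text(json.dumps({"cursor": index + 1}))
    return processed


records = ["a", "b", "c", "d"]
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"
    try:
        sync(records, checkpoint, fail_at=2)  # first run handles "a" and "b", then fails
    except RuntimeError:
        pass
    resumed = sync(records, checkpoint)  # second run picks up at "c"
```

In a real pipeline the checkpoint would live in a durable store (such as one of the backends described below) rather than a temporary file.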

## Installation
```bash
pip install bizon
```

## Usage
```python
from yaml import safe_load
from bizon.engine.runner import RunnerFactory

yaml_config = """
source:
  source_name: dummy
  stream_name: creatures
  authentication:
    type: api_key
    params:
      token: dummy_key

destination:
  name: logger
  config:
    dummy: dummy
"""

config = safe_load(yaml_config)
runner = RunnerFactory.create_from_config_dict(config=config)
runner.run()
```

## Backend configuration

Backend is the interface used by Bizon to store its state. It can be configured in the `backend` section of the configuration file. The following backends are supported:
- `sqlite`: In-memory SQLite database, useful for testing and development.
- `bigquery`: Google BigQuery backend, perfect for light setup & production.
- `postgres`: PostgreSQL backend, for production use and frequent cursor updates.

## Queue configuration

Queue is the interface used by Bizon to exchange data between `Source` and `Destination`. It can be configured in the `queue` section of the configuration file. The following queues are supported:
- `python_queue`: Python Queue, useful for testing and development.
- `kafka`: Apache Kafka, for production use and high throughput.
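
To make the role of the queue concrete, here is a small sketch of a producer and a consumer exchanging records through Python's standard `queue.Queue`. It only illustrates the pattern: the `source` and `destination` functions and the `SENTINEL` marker are hypothetical and do not reflect the `bizon.queue.Queue` interface.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def source(q, records):
    """Push every record onto the queue, then signal the end of the stream."""
    for record in records:
        q.put(record)  # blocks while the queue is full
    q.put(SENTINEL)


def destination(q, sink):
    """Pull records from the queue until the end-of-stream marker arrives."""
    while True:
        record = q.get()
        if record is SENTINEL:
            break
        sink.append(record)


# A bounded queue makes the source wait when the destination falls behind.
buffer = queue.Queue(maxsize=2)
loaded = []
producer = threading.Thread(target=source, args=(buffer, [1, 2, 3, 4, 5]))
consumer = threading.Thread(target=destination, args=(buffer, loaded))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

Swapping the in-process queue for Kafka keeps the same producer and consumer roles while moving the buffer to an external broker.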

## Start syncing your data 🚀

### Quick setup without any dependencies ✌️

Queue configuration can be set to `python_queue` and backend configuration to `sqlite`.
This will allow you to test the pipeline without any external dependencies.
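
As a rough sketch, the dependency-free setup could be expressed as a configuration dictionary like the one below. The `engine.queue.type` layout follows the Kafka examples in this README; placing `backend` under `engine` with the same `type` key is an assumption, and either section may need extra `config` keys that are not shown here. The `with_local_engine` helper is hypothetical.

```python
from copy import deepcopy

# Source and destination mirror the Usage example above.
base_config = {
    "source": {
        "source_name": "dummy",
        "stream_name": "creatures",
        "authentication": {"type": "api_key", "params": {"token": "dummy_key"}},
    },
    "destination": {"name": "logger", "config": {"dummy": "dummy"}},
}


def with_local_engine(config):
    """Return a copy of config with an in-process queue and a SQLite backend.

    Hypothetical helper; the "backend" key layout is an assumption.
    """
    local = deepcopy(config)
    local["engine"] = {
        "backend": {"type": "sqlite"},      # assumed layout, mirrors "queue"
        "queue": {"type": "python_queue"},  # same layout as the Kafka examples
    }
    return local


config = with_local_engine(base_config)
```

The resulting dictionary can then be passed to `RunnerFactory.create_from_config_dict(config=config)`, as in the Usage section.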


### Local Kafka setup

To test the pipeline with Kafka, you can use `docker compose` to set up Kafka or Redpanda locally.

**Kafka**
```bash
docker compose --file ./scripts/kafka-compose.yml up
```

In your YAML configuration, set the `queue` configuration to Kafka under `engine`:
```yaml
engine:
  queue:
    type: kafka
    config:
      bootstrap_servers: localhost:9092
```

**Redpanda**
```bash
docker compose --file ./scripts/redpanda-compose.yml up
```

Redpanda exposes a Kafka-compatible API, so in your YAML configuration the `queue` type stays `kafka` under `engine`; only the broker address changes:

```yaml
engine:
  queue:
    type: kafka
    config:
      bootstrap_servers: localhost:19092
```

