Metadata-Version: 2.1
Name: protarrow
Version: 0.0.1rc5
Summary: Convert from protobuf to arrow and back
Home-page: https://github.com/tradewelltech/protarrow
License: Apache-2.0
Author: 0x26res
Author-email: 0x26res@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: googleapis-common-protos (>=1.53.0,<2.0.0)
Requires-Dist: protobuf (>=3.20.1,<4.0.0)
Requires-Dist: pyarrow (>=8.0.0,<9.0.0)
Project-URL: Repository, https://github.com/tradewelltech/protarrow
Description-Content-Type: text/markdown


[![codecov](https://codecov.io/gh/0x26res/protarrow/branch/master/graph/badge.svg?token=XMFH27IL70)](https://codecov.io/gh/0x26res/protarrow)

# Protarrow

A library for converting protobuf messages to Apache Arrow record batches and tables, and back.

# Installation

```shell
pip install protarrow
```

# Usage

## Convert from proto to arrow

```protobuf
syntax = "proto3";

message MyProto {
  string name = 1;
  repeated int32 values = 2;
}
```

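```python
import protarrow

# MyProto is the class generated by protoc from the definition above;
# the module name my_proto_pb2 is illustrative.
from my_proto_pb2 import MyProto

my_protos = [
    MyProto(name="foo", values=[1, 2, 4]),
    MyProto(name="bar", values=[3, 4, 5]),
]

# Arrow schema derived from the message descriptor
schema = protarrow.message_type_to_schema(MyProto)
# Convert the messages to a pyarrow.RecordBatch or a pyarrow.Table
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto)
table = protarrow.messages_to_table(my_protos, MyProto)
```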

| name   | values   |
|:-------|:---------|
| foo    | [1 2 4]  |
| bar    | [3 4 5]  |
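
Arrow stores a `list_` column such as `values` as one flat child array holding every element, plus an offsets array that marks where each row starts and ends. The sketch below reproduces that layout in plain Python for the two example rows. The helper `to_columns` is hypothetical: it only illustrates Arrow's columnar format and is not how protarrow is implemented.

```python
from typing import Dict, List, Tuple

# Each row mirrors one MyProto message: (name, repeated int32 values).
Row = Tuple[str, List[int]]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Hypothetical helper: pivot rows into Arrow-style columns.

    The list column is split into a flat child array and an offsets
    array of length len(rows) + 1, as in Arrow's list layout.
    """
    names: List[str] = []
    flat_values: List[int] = []
    offsets: List[int] = [0]
    for name, values in rows:
        names.append(name)
        flat_values.extend(values)
        offsets.append(len(flat_values))  # end of this row = start of the next
    return {"name": names, "values.offsets": offsets, "values.items": flat_values}


columns = to_columns([("foo", [1, 2, 4]), ("bar", [3, 4, 5])])
print(columns)
# {'name': ['foo', 'bar'], 'values.offsets': [0, 3, 6], 'values.items': [1, 2, 4, 3, 4, 5]}
```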


## Convert from arrow to proto

```python
protos_from_record_batch = protarrow.table_to_messages(record_batch, MyProto)
protos_from_table = protarrow.table_to_messages(table, MyProto)
```
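
In the other direction, each message's repeated field can be recovered by slicing the flat child array between consecutive offsets. A minimal sketch of that idea, assuming plain Python lists stand in for Arrow buffers and using a hypothetical helper `to_rows`:

```python
from typing import List, Tuple


def to_rows(names: List[str], offsets: List[int], items: List[int]) -> List[Tuple[str, List[int]]]:
    """Hypothetical helper: rebuild (name, values) rows from Arrow-style columns."""
    if len(offsets) != len(names) + 1:
        raise ValueError("offsets must have one more entry than there are rows")
    return [
        (name, items[offsets[i] : offsets[i + 1]])  # row i spans [offsets[i], offsets[i + 1])
        for i, name in enumerate(names)
    ]


rows = to_rows(["foo", "bar"], [0, 3, 6], [1, 2, 4, 3, 4, 5])
print(rows)  # [('foo', [1, 2, 4]), ('bar', [3, 4, 5])]
```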

## Customize arrow type

The Arrow types used for enums, `google.protobuf.Timestamp` and `google.type.TimeOfDay` can be configured with `ProtarrowConfig`:

```python
import pyarrow as pa
import protarrow

config = protarrow.ProtarrowConfig(
    enum_type=pa.int32(),  # store enum values as their number
    timestamp_type=pa.timestamp("ms", "America/New_York"),
    time_of_day_type=pa.time32("ms"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)
```
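
A `google.protobuf.Timestamp` carries `seconds` and `nanos`, while a `pa.timestamp(unit, tz)` column stores one integer per row counting that unit since the UNIX epoch; the timezone is metadata and does not change the stored value. The sketch below shows that arithmetic with a hypothetical helper `timestamp_to_int`. Flooring to coarser units is an assumption made here; protarrow may truncate or round differently.

```python
# Number of nanoseconds in one tick of each Arrow timestamp unit.
NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def timestamp_to_int(seconds: int, nanos: int, unit: str) -> int:
    """Convert Timestamp(seconds, nanos) to an integer count of `unit` since the epoch.

    Assumption: sub-unit precision is floored away (e.g. 1.5 ms -> 1 ms).
    """
    total_nanos = seconds * 1_000_000_000 + nanos
    return total_nanos // NANOS_PER_UNIT[unit]


# 2022-01-01T00:00:00.123456789Z
ts_seconds, ts_nanos = 1_640_995_200, 123_456_789
print(timestamp_to_int(ts_seconds, ts_nanos, "ns"))  # 1640995200123456789
print(timestamp_to_int(ts_seconds, ts_nanos, "ms"))  # 1640995200123
```

With a `"ms"` column, the last six digits of the nanosecond value are dropped, which is why the unit is worth choosing deliberately.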

# Type Mapping

## Native Types

| Proto    | Pyarrow                 | Note         |
|----------|-------------------------|--------------|
| bool     | bool_                   |              |
| bytes    | binary                  |              |
| double   | float64                 |              |
| enum     | **int32**/string/binary | configurable |
| fixed32  | int32                   |              |
| fixed64  | int64                   |              |
| float    | float32                 |              |
| int32    | int32                   |              |
| int64    | int64                   |              |
| message  | struct                  |              |
| sfixed32 | int32                   |              |
| sfixed64 | int64                   |              |
| sint32   | int32                   |              |
| sint64   | int64                   |              |
| string   | string                  |              |
| uint32   | uint32                  |              |
| uint64   | uint64                  |              |
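
The enum row is the only configurable native mapping. As a rough illustration of what the options mean, the sketch below stores an enum value either as its number (the default `int32`) or as its name (`string`). The `Color` enum is hypothetical, and treating `binary` as the UTF-8 bytes of the name is an assumption, not something this README specifies.

```python
from typing import Dict, Union

# Hypothetical enum, standing in for a generated protobuf enum:
#   enum Color { COLOR_UNSPECIFIED = 0; RED = 1; GREEN = 2; }
COLOR_NAMES: Dict[int, str] = {0: "COLOR_UNSPECIFIED", 1: "RED", 2: "GREEN"}


def encode_enum(number: int, arrow_type: str) -> Union[int, str, bytes]:
    """Sketch of how an enum value could be stored for each configured arrow type."""
    if arrow_type == "int32":
        return number  # default: keep the enum number
    name = COLOR_NAMES[number]
    if arrow_type == "string":
        return name  # store the enum value's name
    if arrow_type == "binary":
        return name.encode("utf-8")  # assumption: the name as UTF-8 bytes
    raise ValueError(f"unsupported arrow type: {arrow_type}")


print([encode_enum(2, t) for t in ("int32", "string", "binary")])
# [2, 'GREEN', b'GREEN']
```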

## Other types

| Proto                       | Pyarrow                | Note                               |
|-----------------------------|------------------------|------------------------------------|
| repeated                    | list_                  |                                    |
| map                         | map_                   |                                    |
| google.protobuf.BoolValue   | bool_                  |                                    |
| google.protobuf.BytesValue  | binary                 |                                    |
| google.protobuf.DoubleValue | float64                |                                    |
| google.protobuf.FloatValue  | float32                |                                    |
| google.protobuf.Int32Value  | int32                  |                                    |
| google.protobuf.Int64Value  | int64                  |                                    |
| google.protobuf.StringValue | string                 |                                    |
| google.protobuf.Timestamp   | timestamp("ns", "UTC") | Unit and timezone are configurable |
| google.protobuf.UInt32Value | uint32                 |                                    |
| google.protobuf.UInt64Value | uint64                 |                                    |
| google.type.Date            | date32                 |                                    |
| google.type.TimeOfDay       | **time64**/time32      | Unit and type are configurable     |
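
Arrow's `date32` counts days since 1970-01-01, and `time64("ns")` counts nanoseconds since midnight. The following standard-library sketch shows how the fields of `google.type.Date` (`year`, `month`, `day`) and `google.type.TimeOfDay` (`hours`, `minutes`, `seconds`, `nanos`) reduce to those integers; the helper names are hypothetical and not part of protarrow.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)


def date_to_date32(year: int, month: int, day: int) -> int:
    """google.type.Date fields -> days since the UNIX epoch (Arrow date32)."""
    return (datetime.date(year, month, day) - EPOCH).days


def time_of_day_to_time64_ns(hours: int, minutes: int, seconds: int, nanos: int) -> int:
    """google.type.TimeOfDay fields -> nanoseconds since midnight (Arrow time64("ns"))."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000 + nanos


print(date_to_date32(2022, 1, 1))  # 18993
print(time_of_day_to_time64_ns(9, 30, 0, 500))  # 34200000000500
```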

## Nullability

* Top-level native fields, lists and maps are marked as non-nullable.
* Nested messages, and all of their fields, are nullable.
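
The two rules can be read as one predicate on a field's position in the schema. A minimal sketch, assuming a hypothetical `is_nullable` helper where `depth` is 0 for fields declared directly on the top-level message and a message-typed field counts as a nested message:

```python
def is_nullable(depth: int, kind: str) -> bool:
    """Sketch of the nullability rules above.

    kind: "native" (scalar), "list", "map" or "message".
    depth: 0 for fields of the top-level message, 1+ inside nested messages.
    """
    if depth > 0:
        return True  # fields of a nested message are always nullable
    return kind == "message"  # top level: only nested messages are nullable


print([is_nullable(0, k) for k in ("native", "list", "map", "message")])
# [False, False, False, True]
```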

# Development

## Set up

```shell
python3 -m venv --clear venv
source venv/bin/activate
poetry self add "poetry-dynamic-versioning[plugin]"
poetry install
python ./scripts/protoc.py
pre-commit install
```

## Testing

This library relies on property-based testing: the tests convert randomly generated data from protobuf to arrow and back, and check that the result equals the input.

```shell
coverage run --branch --include "*/protarrow/*" -m pytest tests
coverage report
```
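
The round-trip property can be illustrated without protobuf or pyarrow: generate random inputs, convert them, convert them back, and assert nothing changed. This sketch uses the standard `random` module in place of a property-based testing library and checks a `Timestamp` to nanoseconds conversion; the helper names are hypothetical and this is not the project's test suite.

```python
import random
from typing import Tuple


def to_nanos(seconds: int, nanos: int) -> int:
    """Timestamp(seconds, nanos) -> nanoseconds since the epoch."""
    return seconds * 1_000_000_000 + nanos


def from_nanos(total: int) -> Tuple[int, int]:
    """Inverse of to_nanos; divmod keeps nanos in [0, 1e9) like protobuf's Timestamp."""
    return divmod(total, 1_000_000_000)


def check_round_trip(samples: int = 1_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(samples):
        # Keep seconds within what an int64 nanosecond count can hold.
        seconds = rng.randint(-9_000_000_000, 9_000_000_000)
        nanos = rng.randint(0, 999_999_999)
        assert from_nanos(to_nanos(seconds, nanos)) == (seconds, nanos)


check_round_trip()
```

A fixed seed keeps failures reproducible; a dedicated property-based testing library would additionally shrink a failing input to a minimal example.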

