Metadata-Version: 2.1
Name: protarrow
Version: 0.0.1rc6
Summary: Convert from protobuf to arrow and back
Home-page: https://github.com/tradewelltech/protarrow
License: Apache-2.0
Author: 0x26res
Author-email: 0x26res@gmail.com
Requires-Python: >=3.8,<3.11
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: googleapis-common-protos (>=1.53.0,<2.0.0)
Requires-Dist: protobuf (>=3.20.1,<4.0.0)
Requires-Dist: pyarrow (>=8.0.0,<9.0.0)
Project-URL: Repository, https://github.com/tradewelltech/protarrow
Description-Content-Type: text/markdown



[![PyPI Version][pypi-image]][pypi-url]
[![][versions-image]][versions-url]
[![][stars-image]][stars-url]
[![codecov](https://codecov.io/gh/0x26res/protarrow/branch/master/graph/badge.svg?token=XMFH27IL70)](https://codecov.io/gh/0x26res/protarrow)
[![Build Status][build-image]][build-url]



# Protarrow

A library for converting protobuf messages to Arrow record batches and tables, and back.

# Installation

```shell
pip install protarrow
```

# Usage

## Convert from proto to arrow

```protobuf
message MyProto {
  string name = 1;
  int32 id = 2;
  repeated int32 values = 3;
}
```

```python
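import protarrow

# MyProto is the class generated by protoc from the message above.
# Import it from your own generated module before running this snippet.
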
my_protos = [
    MyProto(name="foo", id=1, values=[1, 2, 4]),
    MyProto(name="bar", id=2, values=[3, 4, 5]),
]

schema = protarrow.message_type_to_schema(MyProto)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto)
table = protarrow.messages_to_table(my_protos, MyProto)
```

The resulting table contains one row per message:

| name   |   id | values   |
|:-------|-----:|:---------|
| foo    |    1 | [1 2 4]  |
| bar    |    2 | [3 4 5]  |


## Convert from arrow to proto

```python
protos_from_record_batch = protarrow.record_batch_to_messages(record_batch, MyProto)
protos_from_table = protarrow.table_to_messages(table, MyProto)
```

## Customize arrow type

The arrow type used for enums, `google.protobuf.Timestamp` and `google.type.TimeOfDay` can be configured with a `ProtarrowConfig`:

```python
import pyarrow as pa

config = protarrow.ProtarrowConfig(
    enum_type=pa.int32(),
    timestamp_type=pa.timestamp("ms", "America/New_York"),
    time_of_day_type=pa.time32("ms"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)
```

# Type Mapping

## Native Types

| Proto    | Pyarrow                 | Note         |
|----------|-------------------------|--------------|
| bool     | bool_                   |              |
| bytes    | binary                  |              |
| double   | float64                 |              |
| enum     | **int32**/string/binary | configurable |
| fixed32  | int32                   |              |
| fixed64  | int64                   |              |
| float    | float32                 |              |
| int32    | int32                   |              |
| int64    | int64                   |              |
| message  | struct                  |              |
| sfixed32 | int32                   |              |
| sfixed64 | int64                   |              |
| sint32   | int32                   |              |
| sint64   | int64                   |              |
| string   | string                  |              |
| uint32   | uint32                  |              |
| uint64   | uint64                  |              |

## Other Types


| Proto                       | Pyarrow                | Note                               |
|-----------------------------|------------------------|------------------------------------|
| repeated                    | list_                  |                                    |
| map                         | map_                   |                                    |
| google.protobuf.BoolValue   | bool_                  |                                    |
| google.protobuf.BytesValue  | binary                 |                                    |
| google.protobuf.DoubleValue | float64                |                                    |
| google.protobuf.FloatValue  | float32                |                                    |
| google.protobuf.Int32Value  | int32                  |                                    |
| google.protobuf.Int64Value  | int64                  |                                    |
| google.protobuf.StringValue | string                 |                                    |
| google.protobuf.Timestamp   | timestamp("ns", "UTC") | Unit and timezone are configurable |
| google.protobuf.UInt32Value | uint32                 |                                    |
| google.protobuf.UInt64Value | uint64                 |                                    |
| google.type.Date            | date32()               |                                    |
| google.type.TimeOfDay       | **time64**/time32      | Unit and type are configurable     |

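To see why nanoseconds are a natural default unit for `google.protobuf.Timestamp`, recall that the message stores a `seconds` field and a `nanos` field. The following is a minimal sketch of the arithmetic in plain Python, not protarrow's actual implementation; the function names are hypothetical.

```python
from typing import Tuple

NANOS_PER_SECOND = 1_000_000_000


def timestamp_to_epoch_ns(seconds: int, nanos: int) -> int:
    """Combine Timestamp.seconds and Timestamp.nanos into nanoseconds since the epoch."""
    return seconds * NANOS_PER_SECOND + nanos


def epoch_ns_to_timestamp(epoch_ns: int) -> Tuple[int, int]:
    """Split nanoseconds since the epoch back into (seconds, nanos).

    divmod floors towards negative infinity, so nanos stays in
    [0, 999_999_999] even for instants before the epoch.
    """
    return divmod(epoch_ns, NANOS_PER_SECOND)


epoch_ns = timestamp_to_epoch_ns(1_650_000_000, 500)
print(epoch_ns)  # 1650000000000000500
print(epoch_ns_to_timestamp(epoch_ns))  # (1650000000, 500)
```

A coarser unit such as `"ms"` cannot represent the sub-millisecond part of `nanos`, which is worth keeping in mind when changing the timestamp unit.
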
## Nullability

* Top-level native fields, lists and maps are marked as non-nullable.
* Nested messages and their children are nullable.


<!-- Badges: -->

[pypi-image]: https://img.shields.io/pypi/v/protarrow
[pypi-url]: https://pypi.org/project/protarrow/
[build-image]: https://github.com/tradewelltech/protarrow/actions/workflows/build.yaml/badge.svg
[build-url]: https://github.com/tradewelltech/protarrow/actions/workflows/build.yaml
[stars-image]: https://img.shields.io/github/stars/tradewelltech/protarrow
[stars-url]: https://github.com/tradewelltech/protarrow
[versions-image]: https://img.shields.io/pypi/pyversions/protarrow
[versions-url]: https://pypi.org/project/protarrow/

