Metadata-Version: 2.1
Name: zcollection
Version: 0.1
Summary: Zarr Collection
Home-page: https://github.com/CNES/zcollection
Author: CNES/CLS
Author-email: fbriol@gmail.com
License: BSD License
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
Provides-Extra: test
License-File: LICENSE

ZCollection
===========

This project is a Python library for manipulating data partitioned into a
**collection** of `Zarr <https://zarr.readthedocs.io/en/stable/>`_ groups.

A collection divides a dataset into several partitions, which makes it easier
to ingest new products or to update existing data from them. Data can be
partitioned by **date** (hour, day, month, etc.) or by **sequence**.

A collection partitioned by date, with a monthly resolution, may look like this
on disk:

.. code-block:: text

    collection/
    ├── year=2022/
    │    ├── month=01/
    │    │    ├── time/
    │    │    │    ├── 0.0
    │    │    │    ├── .zarray
    │    │    │    └── .zattrs
    │    │    ├── var1/
    │    │    │    ├── 0.0
    │    │    │    ├── .zarray
    │    │    │    └── .zattrs
    │    │    ├── .zattrs
    │    │    ├── .zgroup
    │    │    └── .zmetadata
    │    └── month=02/
    │         ├── time/
    │         │    ├── 0.0
    │         │    ├── .zarray
    │         │    └── .zattrs
    │         ├── var1/
    │         │    ├── 0.0
    │         │    ├── .zarray
    │         │    └── .zattrs
    │         ├── .zattrs
    │         ├── .zgroup
    │         └── .zmetadata
    └── .zcollection
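
The ``key=value`` directory names in the tree above can be derived directly
from a timestamp. The following is a minimal sketch using only the standard
library. The helper names ``partition_path`` and ``parse_partition`` are
hypothetical and are not part of this library's API; they only illustrate the
naming scheme shown in the layout.

.. code-block:: python

    import datetime

    # Hypothetical helpers illustrating the directory naming scheme only.
    _RESOLUTIONS = ("year", "month", "day")


    def partition_path(when, resolution="month"):
        """Build a relative partition path such as 'year=2022/month=01'."""
        if resolution not in _RESOLUTIONS:
            raise ValueError(f"unknown resolution: {resolution!r}")
        # Keep every key from 'year' down to the requested resolution.
        keys = _RESOLUTIONS[:_RESOLUTIONS.index(resolution) + 1]
        parts = []
        for key in keys:
            width = 4 if key == "year" else 2  # zero-padded, as in the tree
            parts.append(f"{key}={getattr(when, key):0{width}d}")
        return "/".join(parts)


    def parse_partition(path):
        """Invert partition_path: map each key back to its integer value."""
        items = (item.split("=") for item in path.strip("/").split("/"))
        return {key: int(value) for key, value in items}


    path = partition_path(datetime.date(2022, 2, 3))
    print(path, parse_partition(path))

With a monthly resolution, every record dated within the same month maps to the
same directory, so one partition can be rewritten without touching the others.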

Partition updates can be set either to overwrite existing data with new data or
to merge the two using different **strategies**.
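
To make the difference between overwriting and updating concrete, here is a
small sketch in plain Python. It is an assumption about what such strategies
could look like, not the strategies this library ships: the function names
``overwrite`` and ``replace_time_span`` are hypothetical, and a partition is
modelled as a simple list of ``(time, value)`` records.

.. code-block:: python

    def overwrite(existing, new):
        """Drop the existing records and keep only the new ones."""
        return sorted(new, key=lambda record: record[0])


    def replace_time_span(existing, new):
        """Keep existing records outside the time span covered by the new
        records, and replace everything inside that span."""
        if not new:
            return sorted(existing, key=lambda record: record[0])
        start = min(time for time, _ in new)
        end = max(time for time, _ in new)
        kept = [(t, v) for t, v in existing if t < start or t > end]
        return sorted(kept + list(new), key=lambda record: record[0])


    existing = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    new = [(2, "B"), (3, "C")]
    print(overwrite(existing, new))
    print(replace_time_span(existing, new))

In this sketch the second strategy preserves the records at times 1 and 4,
whereas a plain overwrite discards them.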

The `Dask library <https://dask.org/>`_ handles the data so that processing can
be scaled out quickly.

Views can be created on a reference collection to add and modify variables,
while the reference collection itself is accessed read-only.
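
The idea of a writable layer over read-only reference data can be illustrated
with the standard library alone. This is a conceptual sketch only and does not
reflect how this library implements views: a ``MappingProxyType`` stands in for
the read-only reference collection, and a ``ChainMap`` plays the role of the
view. The variable names are made up for the example.

.. code-block:: python

    from collections import ChainMap
    from types import MappingProxyType

    # Read-only stand-in for the reference collection (illustrative data).
    reference = MappingProxyType({
        "time": (0, 1, 2),
        "var1": (10.0, 11.0, 12.0),
    })

    # Writes go to the first (empty) mapping; reads fall back to the reference.
    view = ChainMap({}, reference)

    # Add a variable that exists only in the view.
    view["var2"] = tuple(v * 2 for v in view["var1"])

    # "Modify" an existing variable: the view shadows the reference value.
    view["var1"] = tuple(v + 1 for v in reference["var1"])

    print(view["var1"], reference["var1"])

Writing through ``reference`` directly raises ``TypeError``, which mirrors the
read-only access to the reference collection described above.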

This library can store data on POSIX, S3, or any other file system supported by
the Python library `fsspec
<https://filesystem-spec.readthedocs.io/en/latest/>`_. Note, however, that only
POSIX and S3 file systems have been tested.


