Metadata-Version: 2.1
Name: zarr-eosdis-store
Version: 0.0.2
Summary: Zarr Store class for working with EOSDIS cloud data
Home-page: https://github.com/nasa/zarr-eosdis-store
Author: Patrick Quinn, Matthew Hanson
Author-email: patrick@patrickquinn.net
License: UNKNOWN
Description: zarr-eosdis-store
        =================
        
        The zarr-eosdis-store library allows NASA EOSDIS Collections to be accessed efficiently
        by the `Zarr Python library <https://zarr.readthedocs.io/en/stable/index.html>`_, provided sidecar
        DMR++ metadata files have been generated for them.
        
        Installation
        ============
        
        This module requires Python 3.8 or greater::
        
            $ python --version
            Python 3.8.2
        
        Install from PyPI::
        
            $ pip install zarr-eosdis-store
        
        To install the latest development version::
        
            $ pip install git+https://github.com/nasa/zarr-eosdis-store.git@main#egg=zarr-eosdis-store
        
        Earthdata Login
        ===============
        
        To access EOSDIS data, you need to sign in with a free NASA Earthdata Login account, which you can obtain at
        `<https://urs.earthdata.nasa.gov/>`_.
        
        Once you have an account, you will need to add your credentials to your ``~/.netrc`` file::
        
            machine urs.earthdata.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD
        
        If you are accessing test data, you will need to use an account from the Earthdata Login test system at
        `<https://uat.urs.earthdata.nasa.gov/>`_ instead, adding a corresponding line to your ``~/.netrc`` file::
        
            machine uat.urs.earthdata.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD
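        
        To check that Python can read the entry you added, the standard-library ``netrc`` module can look it up. This is a
        minimal sketch: ``earthdata_username`` is a hypothetical helper, not part of ``eosdis_store``, and it only confirms that a
        login is stored for the host, not that the credentials are valid.
        
        .. code-block:: python
        
           import netrc
           from typing import Optional
        
           def earthdata_username(netrc_path: Optional[str] = None,
                                  host: str = "urs.earthdata.nasa.gov") -> Optional[str]:
               """Return the login stored for ``host`` in a .netrc file, or None if there is no entry.
        
               Hypothetical helper: with ``netrc_path=None`` the default ``~/.netrc`` is read.
               """
               entry = netrc.netrc(netrc_path).authenticators(host)
               if entry is None:
                   return None
               login, _account, _password = entry
               return login
        
        Calling ``earthdata_username()`` with no arguments reads ``~/.netrc``; pass ``host="uat.urs.earthdata.nasa.gov"`` to check
        the test-system entry instead.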
        
        
        Usage
        =====
        
        To use the library, instantiate ``eosdis_store.EosdisStore`` with the URL of the data file you want to access, pass the
        store to the Zarr library as you would any other store, and then use the Zarr API as you would with any other read-only
        Zarr dataset. Note that the URL will typically end with an HDF5 or NetCDF4 extension, not ``.zarr``.
        
        .. code-block:: python
        
           from eosdis_store import EosdisStore
           import zarr
        
           # Assumes you have set up .netrc with your Earthdata Login information
           f = zarr.open(EosdisStore('https://example.com/your/data/file.nc4'))
        
           # Read metadata and data from f using the Zarr API, e.g. the first 10 elements along the first axis
           print(f['parameter_name'][0:10])
        
        If a variable defines the CF-convention attributes ``_FillValue`` (which flags missing data), ``scale_factor``, or
        ``add_offset``, they can be read from the variable's attributes.
        
        .. code-block:: python
        
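          import numpy as np
        
          arr = f['parameter_name']
        
          # CF attributes live in the array's .attrs mapping; fall back to no-op values if absent
          scale_factor = arr.attrs.get('scale_factor', 1)
          add_offset = arr.attrs.get('add_offset', 0)
          nodata = arr.attrs.get('_FillValue')
        
          raw = arr[:]
        
          # Locate nodata on the stored values, before scaling changes them
          nodata_locs = np.where(raw == nodata)
        
          data = raw * scale_factor + add_offset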
        
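        The manual steps above can be wrapped in a small helper that follows the CF unpacking rule: mask stored values equal to
        ``_FillValue`` first, then compute ``stored * scale_factor + add_offset``. This is a NumPy-only sketch; ``cf_decode`` is a
        hypothetical name, and the defaults (scale 1, offset 0, no fill value) are assumptions for variables that omit these attributes.
        
        .. code-block:: python
        
           import numpy as np
        
           def cf_decode(raw, scale_factor=1.0, add_offset=0.0, fill_value=None):
               """Unpack CF-packed values, returning floats with NaN where raw == fill_value."""
               raw = np.asarray(raw)
               # Compare against the fill value on the stored values, before scaling changes them
               if fill_value is not None:
                   mask = raw == fill_value
               else:
                   mask = np.zeros(raw.shape, dtype=bool)
               out = raw.astype(np.float64) * scale_factor + add_offset
               out[mask] = np.nan
               return out
        
           decoded = cf_decode(np.array([100, -9999, 250], dtype=np.int16),
                               scale_factor=0.01, add_offset=273.15, fill_value=-9999)
           # decoded is approximately [274.15, nan, 275.65]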
        
        A better way to handle these attributes is to use XArray. Rather than reading data as soon as a slice is requested, XArray
        defers the read until the values are actually needed. XArray's Zarr backend can also apply the scale and offset (and mask the
        fill value) at the moment the data is read, which is more efficient when the data will be passed on to further operations.
        
        The ``scale_factor`` and ``add_offset`` attributes will be used if they are present in the NetCDF/HDF5 file.
        
        .. code-block:: python
        
          import xarray
          from eosdis_store import EosdisStore
        
          store = EosdisStore('https://example.com/your/data/file.nc4')
        
          f = xarray.open_zarr(store)
        
          # The data is not read yet; e.g. select the first 10 elements along the first dimension
          xa = f['parameter_name'][0:10]
        
          # convert to numpy array, data is read
          arr = xa.values
        
        The resulting array will have the scale and offset applied, and any element equal to the ``_FillValue`` attribute will be
        set to NumPy ``nan``. To use XArray without applying the scale and offset or converting nodata to ``nan``, pass
        ``mask_and_scale=False`` to ``xarray.open_zarr``:
        
        .. code-block:: python
        
          import xarray
          from eosdis_store import EosdisStore
        
          store = EosdisStore('https://example.com/your/data/file.nc4')
        
          f = xarray.open_zarr(store, mask_and_scale=False)
        
        
        Technical Summary
        =================
        
        We use a technique for reading NetCDF4 and some HDF5 files with Zarr that was prototyped by The HDF Group and USGS, described
        `here <https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314>`_.
        
        To allow the technique to work with EOSDIS data, we have extended it and optimized access in the following key ways:
        
        * The ``EosdisStore`` reads the DMR++ file generated by OPeNDAP to obtain the data file's metadata and the byte offsets of
          its chunks, and presents these to the Zarr library. By reusing DMR++ files, we avoid generating new metadata sidecar
          files to support new data.
        
        * The store uses HTTPS and authenticates with a ``.netrc`` entry, rather than the S3 API, making it compatible with
          EOSDIS access patterns and requirements.
        
        * The store caches redirect URLs for as long as the ``Cache-Control`` header permits. This avoids the overhead of
          repeated redirects when accessing different parts of a file.
        
        * In addition to the backward-compatible store API, the store implements a proposed API that lets it make more efficient
          access decisions. The proposal is described in `<https://github.com/zarr-developers/zarr-python/issues/536>`_.
          The store works when Zarr does not use this API, but is significantly faster when it does, making the following optimizations:
        
          * When the Zarr library requests several byte ranges that lie close together in the file, the store combines
            these smaller requests into a single larger request.
        
          * After an initial request to cache any authentication and redirect information, the store runs subsequent requests in
            parallel.
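        
        Taking chunk byte offsets from DMR++ and merging nearby reads can be sketched in plain Python. This is an illustration,
        not the store's implementation: the DMR++ fragment is a hypothetical, heavily trimmed example that assumes chunks are
        listed as ``dmrpp:chunk`` elements with ``offset`` and ``nBytes`` attributes, and ``chunk_ranges``, ``coalesce`` and the
        1024-byte ``max_gap`` threshold are invented for illustration.
        
        .. code-block:: python
        
           import xml.etree.ElementTree as ET
           from typing import List, Tuple
        
           # Hypothetical, trimmed DMR++ fragment; real files carry much more metadata
           DMRPP = """<Dataset xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#">
             <Float32 name="sst">
               <dmrpp:chunks>
                 <dmrpp:chunk offset="4016" nBytes="1200" chunkPositionInArray="[0,0]"/>
                 <dmrpp:chunk offset="5216" nBytes="1180" chunkPositionInArray="[0,1]"/>
                 <dmrpp:chunk offset="9000" nBytes="1150" chunkPositionInArray="[1,0]"/>
               </dmrpp:chunks>
             </Float32>
           </Dataset>"""
        
           def chunk_ranges(dmrpp_xml: str) -> List[Tuple[int, int]]:
               """Return (offset, nbytes) for every chunk element, matching on the tag name without its namespace."""
               root = ET.fromstring(dmrpp_xml)
               return [(int(el.get("offset")), int(el.get("nBytes")))
                       for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "chunk"]
        
           def coalesce(ranges: List[Tuple[int, int]], max_gap: int = 1024) -> List[Tuple[int, int]]:
               """Merge ranges separated by at most max_gap bytes into (start, end_exclusive) spans."""
               merged: List[List[int]] = []
               for offset, nbytes in sorted(ranges):
                   end = offset + nbytes
                   if merged and offset - merged[-1][1] <= max_gap:
                       merged[-1][1] = max(merged[-1][1], end)
                   else:
                       merged.append([offset, end])
               return [(start, end) for start, end in merged]
        
           spans = coalesce(chunk_ranges(DMRPP))
           # HTTP Range headers use inclusive end offsets
           headers = [f"bytes={start}-{end - 1}" for start, end in spans]
        
        Here the first two chunks are adjacent and become one request, while the third is far enough away to stay separate.
        A larger ``max_gap`` trades extra bytes transferred for fewer round trips.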
        
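        The redirect caching and the warm-up-then-parallel request pattern can be sketched in the same spirit. ``RedirectCache``
        and ``fetch_all`` are hypothetical names, not the store's API; the sketch assumes the cache lifetime comes from a
        ``max-age`` directive and caches nothing otherwise, and ``fetch`` stands for any caller-supplied function that performs
        one request (for example, a byte-range GET).
        
        .. code-block:: python
        
           import re
           import time
           from concurrent.futures import ThreadPoolExecutor
        
           class RedirectCache:
               """Remember where a URL redirected to, for max-age seconds from its Cache-Control header."""
        
               def __init__(self, clock=time.monotonic):
                   self._clock = clock  # injectable so expiry can be tested without sleeping
                   self._entries = {}   # url -> (redirect_location, expiry_time)
        
               def put(self, url, location, cache_control):
                   match = re.search(r"\bmax-age=(\d+)", cache_control or "")
                   if match:  # without max-age, do not cache
                       self._entries[url] = (location, self._clock() + int(match.group(1)))
        
               def get(self, url):
                   entry = self._entries.get(url)
                   if entry is None or self._clock() >= entry[1]:
                       self._entries.pop(url, None)  # drop expired entries
                       return None
                   return entry[0]
        
           def fetch_all(fetch, requests, max_workers=8):
               """Run the first request alone (so it can populate caches), then the rest in parallel, preserving order."""
               if not requests:
                   return []
               first = fetch(requests[0])
               with ThreadPoolExecutor(max_workers=max_workers) as pool:
                   rest = list(pool.map(fetch, requests[1:]))
               return [first] + rest
        
        Running the first request on its own lets it populate the redirect and authentication caches, so the parallel requests
        that follow can skip the redirect round trip.
        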
        Development
        ===========
        
        Clone the repository, then ``pip install`` its dependencies::
        
            pip install -r requirements.txt
            pip install -r requirements-dev.txt
        
        To check code coverage and run tests::
        
            coverage run -m pytest
        
        To check coding style::
        
            flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        
        To build the documentation (output is written to ``docs/_build/html/index.html``)::
        
            cd docs && make html
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
Provides-Extra: dev
