Using echopype
==============


Installation
------------

Echopype can be installed from PyPI:

.. code-block:: console

   $ pip install echopype


or through conda:

.. code-block:: console

   $ conda install -c conda-forge echopype


When creating a conda environment to work with echopype,
use the supplied ``environment.yml`` or run

.. code-block:: console

   $ conda create -c conda-forge -n echopype python=3.8 --file requirements.txt


.. note::  Echopype currently uses Python 3.8 due to an
   `issue <https://github.com/OSOceanAcoustics/echopype/issues/83>`_
   with numcodecs wheels.



File conversion
---------------

Supported file types
~~~~~~~~~~~~~~~~~~~~

Echopype currently supports conversion from

- ``.raw`` files generated by Simrad's EK60 echosounder
- ``.01A`` files generated by ASL Environmental Sciences' AZFP echosounder

into netCDF (stable) or zarr (beta) files.

In the `ek80 <https://github.com/OSOceanAcoustics/echopype/tree/ek80>`_ development branch
we are actively developing file conversion and processing routines
such as pulse compression and calibration for the broadband Simrad EK80 ``.raw`` file.

We are considering implementing calibration routines for
*raw beam* data from commonly found Acoustic Doppler Current Profilers (ADCPs).


Conversion operation
~~~~~~~~~~~~~~~~~~~~

File conversion for different types of echosounders is achieved by
using a single interface through the ``Convert`` module.

For data files from the EK60 echosounder, you can do
the following in an interactive Python session:

.. code-block:: python

    from echopype.convert import Convert
    dc = Convert('FILENAME.raw')
    dc.raw2nc()

This will generate a ``FILENAME.nc`` file in the same directory as
the original ``FILENAME.raw`` file.

For data files from the AZFP echosounder, the conversion requires an
extra ``.XML`` file along with the ``.01A`` data file. The ``.XML`` file
contains a lot of metadata needed for unpacking the binary data files.
Typically one single ``.XML`` file is associated with all files from the
same deployment.

This can be done by:

.. code-block:: python

    from echopype.convert import Convert
    dc = Convert('FILENAME.01A', 'XMLFILENAME.xml')
    dc.raw2nc()

Before calling ``raw2nc()`` to create netCDF4 files,
you should first set ``platform_name``, ``platform_type``, and
``platform_code_ICES``, as these values are not recorded in the raw data
files but need to be specified according to the netCDF4 convention.
These parameters will be saved as empty strings unless you specify
them following the example below:

.. code-block:: python

    dc.platform_name = 'OOI'
    dc.platform_type = 'subsurface mooring'
    dc.platform_code_ICES = '3164'   # Platform code for Moorings

The ``platform_code_ICES`` attribute can be chosen by referencing
the platform code from the
`ICES SHIPC vocabulary <https://vocab.ices.dk/?ref=315>`_.

.. note::

   1. For conversion to zarr files, call method ``.raw2zarr()`` from
      the same ``Convert`` object as shown above.

   2. The ``Convert`` instance contains all the data unpacked from the
      raw file, so it is a good idea to clear it from memory once done with
      conversion.


More conversion options
~~~~~~~~~~~~~~~~~~~~~~~

There are optional arguments that you can pass into ``Convert.raw2nc()``
that may come in handy.

- Save converted files into another folder:

  By default the converted ``.nc`` files are saved into the same folder as
  the input files. This can be changed by setting ``save_path`` to the path
  of a directory.

  .. code-block:: python

     raw_file_path = ['./raw_data_files/file_01.raw',   # a list of raw data files
                      './raw_data_files/file_02.raw',
                      ...]
     dc = Convert(raw_file_path)                        # create a Convert object
     dc.raw2nc(save_path='./unpacked_files')            # set the output directory

  Each input file will be converted to its own ``.nc`` file and
  stored in the specified directory.

- Combine multiple raw data files into one ``.nc`` file when unpacking:

  .. code-block:: python

     raw_file_path = ['./raw_data_files/file_01.raw',   # a list of raw data files
                      './raw_data_files/file_02.raw',
                      ...]
     dc = Convert(raw_file_path)                        # create a Convert object
     dc.raw2nc(combine_opt=True,                        # combine all input files when unpacking
               save_path='./unpacked_files/combined_file.nc')

  ``save_path`` has to be given explicitly when combining multiple files.
  If ``save_path`` is only a filename instead of a full path,
  the combined output file will be saved in the same folder as the raw data files.
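
The ``save_path`` rule for combined output can be pictured with the short
sketch below. The helper ``resolve_combined_path`` is hypothetical and is not
part of echopype. It only illustrates the behavior described above, assuming
that a "filename only" value is one with no directory component and that the
folder of the first raw file is used.

.. code-block:: python

   import os

   def resolve_combined_path(raw_files, save_path):
       """Hypothetical helper (not echopype API) mimicking the documented rule."""
       if os.path.dirname(save_path):
           # save_path already includes a directory: use it unchanged
           return save_path
       # bare filename: place the output next to the first raw data file
       return os.path.join(os.path.dirname(raw_files[0]), save_path)

   raw_files = ['./raw_data_files/file_01.raw', './raw_data_files/file_02.raw']
   full = resolve_combined_path(raw_files, './unpacked_files/combined_file.nc')
   bare = resolve_combined_path(raw_files, 'combined_file.nc')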


Non-uniform data
~~~~~~~~~~~~~~~~

Due to flexibility in echosounder settings, some dimensional parameters can
change in the middle of the file. For example:

- The maximum depth range to which data are collected can change in the middle
  of a data file in EK60. This happens often when the bottom depth changes.
- The sampling interval, which translates to temporal resolution, and thus range
  resolution, can also change in the middle of the file.
- Data from different frequency channels can also be collected with
  different sampling intervals.

These changes produce different numbers of samples along range (the ``range_bin``
dimension in the converted ``.nc`` file), which is incompatible with the goal
of saving the data as a multi-dimensional array that can be easily indexed using xarray.

Echopype accommodates these cases in the following two ways:

1. When there are changes in the ``range_bin`` dimension in the middle of
   a data file, echopype creates separate files for each consecutive chunk of
   data with the same number of samples along range and appends ``_partXX`` to
   the converted filename to indicate the existence of such changes.
   For example, if ``datafile.raw`` contains changes in the number of
   samples along range, the converted output will be ``datafile_part01.nc``,
   ``datafile_part02.nc``, etc.

2. When the number of samples along the ``range_bin`` dimension differs
   across frequency channels, echopype pads the shorter channels with
   ``NaN`` to form a multi-dimensional array. We use the data compression option
   in ``xarray.to_netcdf()`` and ``xarray.to_zarr()`` to avoid dramatically
   increasing the output file size due to padding.
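
The two strategies above can be sketched in plain Python. This is an
illustration only, not echopype's implementation: the function names
``split_by_sample_count`` and ``pad_channels`` are hypothetical, and pings
and channels are represented as simple lists of made-up sample values.

.. code-block:: python

   from itertools import groupby

   def split_by_sample_count(pings):
       """Group consecutive pings that share the same number of samples."""
       return [list(chunk) for _, chunk in groupby(pings, key=len)]

   def pad_channels(channels):
       """Pad shorter channels with NaN so that all rows have equal length."""
       longest = max(len(ch) for ch in channels)
       return [ch + [float('nan')] * (longest - len(ch)) for ch in channels]

   # Strategy 1: the sample count changes mid-file, giving one chunk per run
   pings = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
   parts = split_by_sample_count(pings)
   names = ['datafile_part%02d.nc' % (i + 1) for i in range(len(parts))]

   # Strategy 2: channels differ in length, giving a NaN-padded rectangular array
   padded = pad_channels([[1.0, 2.0, 3.0], [4.0, 5.0]])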


..
   Command line tools
   ~~~~~~~~~~~~~~~~~~

   Echopype also supports batch conversion of binary data files to netCDF
   files (``.nc``) in the terminal. As with before, an ``.XML`` file is
   needed to convert the data files from AZFP echosounder.

   For converting ``.raw`` files from EK60:

   .. code-block:: console

      $ echopype_converter -s some_path/*.raw

   For converting ``.01A`` files from AZFP:

   .. code-block:: console

      $ echopype_converter -s azfp -x some_path/deployment.xml some_path/*.01A

   These will generate corresponding ``.nc`` files with the same leading
   filename as the original ``.raw`` files in the same directory.
   See :ref:`data-format` for details about the converted file format.

   .. note::  Currently the ``.nc`` files generated using the command line
      tool will have the fields
      ``platform_name``, ``platform_type``, and ``platform_code_ICES``
      in the `Platform` group all set to empty strings.


Data processing
---------------


Functionality
~~~~~~~~~~~~~

Echopype currently supports:

- Calibration and echo-integration to obtain volume backscattering strength (Sv)
  from the power data collected by EK60 and AZFP.

- Simple noise removal by suppressing data points below an adaptively estimated
  noise floor [1]_.

- Binning and averaging to obtain mean volume backscattering strength (MVBS)
  from the calibrated data.

The steps for performing these analyses for each echosounder are summarized below:

.. code-block:: python

   from echopype.model import EchoData
   nc_path = './converted_files/convertedfile.nc'  # path to a converted nc file
   ed = EchoData(nc_path)   # create an echo data processing object
   ed.calibrate()           # Sv
   ed.remove_noise()        # denoise
   ed.get_MVBS()            # calculate MVBS

By default, these methods do not save the calculation results to disk.
The computation results can be accessed from ``ed.Sv``, ``ed.Sv_clean`` and
``ed.MVBS`` as xarray Datasets with proper dimension labels.

To save results to disk:

.. code-block:: python

   ed.calibrate(save=True)     # output: convertedfile_Sv.nc
   ed.remove_noise(save=True)  # output: convertedfile_Sv_clean.nc
   ed.get_MVBS(save=True)      # output: convertedfile_MVBS.nc


There are various options to save the results:

.. code-block:: python

   # Change the output postfix from _Sv to _Cal: convertedfile_Cal.nc
   ed.calibrate(save=True, save_postfix='_Cal')

   # Save output to another directory: ./cal_results/convertedfile_Sv.nc
   ed.calibrate(save=True, save_path='./cal_results')

   # Save output to another directory with an arbitrary name
   ed.calibrate(save=True, save_path='./cal_results/somethingnew.nc')

By default, for noise removal and MVBS calculation, echopype tries to use Sv
already stored in memory (``ed.Sv``), or tries to calibrate the raw data to
obtain Sv. If ``ed.Sv`` is empty (i.e., when the calibration operation has not
been performed on the object), echopype will try to load Sv from a ``*_Sv.nc``
file in the directory containing the converted ``.nc`` file or in the
user-specified path. For example:

1. Try to do MVBS calculation without having previously calibrated data

   .. code-block:: python

      from echopype.model import EchoData
      nc_path = './converted_files/convertedfile.nc'  # path to a converted nc file
      ed = EchoData(nc_path)   # create an echo data processing object
      ed.get_MVBS()  # echopype will call .calibrate() automatically

2. Try to do MVBS calculation with a ``_Sv_clean.nc`` file previously created in
   the folder ``another_directory``

   .. code-block:: python

      from echopype.model import EchoData
      nc_path = './converted_files/convertedfile.nc'  # path to a converted nc file
      ed = EchoData(nc_path)   # create an echo data processing object
      ed.get_MVBS(source_path='another_directory', source_postfix='_Sv_clean')


.. note:: Echopype's data processing functionality is being developed actively.
   Be sure to check back here often!


Environmental parameters
~~~~~~~~~~~~~~~~~~~~~~~~

Environmental parameters, including temperature, salinity and pressure, are
critical in biological interpretation of ocean sonar data. They influence

- Transducer calibration, through seawater absorption. This influence is
  frequency-dependent, and the higher the frequency the more sensitive the
  calibration is to the environmental parameters.

- Sound speed, which impacts the conversion from temporal resolution
  (of each data sample) to spatial resolution, i.e., the sonar observation
  range changes with sound speed.
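
The sound speed effect can be made concrete with a small calculation. For a
two-way travel path, each sample spans a range of sound speed times sample
interval divided by two. The helper name and all numbers below are made up
for illustration and are not taken from echopype.

.. code-block:: python

   def sample_thickness(sound_speed, sample_interval):
       """Range spanned by one sample (m), given sound speed (m/s) and
       sample interval (s); the factor of 2 accounts for two-way travel."""
       return sound_speed * sample_interval / 2

   interval = 256e-6                            # assumed sample interval in seconds
   thin = sample_thickness(1450.0, interval)    # slower sound speed
   thick = sample_thickness(1500.0, interval)   # faster sound speed
   range_shift = (thick - thin) * 1000          # range difference at sample 1000

With these assumed values, a 50 m/s difference in sound speed shifts the range
assigned to the 1000th sample by several meters.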

By default, echopype uses the following for calibration:

- EK60: Environmental parameters saved with the data files

- AZFP: salinity = 29.6 PSU, pressure = 60 dbar,
  and temperature recorded at the instrument

These parameters should be overwritten when they differ from the actual
environmental conditions during data collection.
To update these parameters, simply do the following *before*
calling ``ed.calibrate()``:

.. code-block:: python

   ed.temperature = 8   # temperature in degree Celsius
   ed.salinity = 30     # salinity in PSU
   ed.pressure = 50     # pressure in dbar
   ed.recalculate_environment()  # recalculate related parameters

This will trigger recalculation of all related parameters,
including sound speed, seawater absorption, thickness of each sonar
sample, and range. The updated values can be retrieved with:

.. code-block:: python

   ed.seawater_absorption  # absorption in [dB/m]
   ed.sound_speed          # sound speed in [m/s]
   ed.sample_thickness     # sample spatial resolution in [m]
   ed.range                # range for each sonar sample in [m]

For EK60 data, echopype updates the sound speed and seawater absorption
using the formulae from Mackenzie (1981) [2]_ and
Ainslie and McColm (1998) [3]_, respectively.
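
For reference, the Mackenzie (1981) nine-term equation can be written out as
below. This is a sketch of the published formula rather than echopype's own
code, and the function name is hypothetical. Note that the equation takes
depth in meters; how echopype maps its ``pressure`` attribute (in dbar) to
depth is not shown here.

.. code-block:: python

   def mackenzie_sound_speed(temperature, salinity, depth):
       """Sound speed (m/s) from the Mackenzie (1981) nine-term equation.

       temperature in degrees Celsius, salinity in parts per thousand,
       depth in meters.
       """
       T, S, D = temperature, salinity, depth
       return (1448.96 + 4.591 * T - 5.304e-2 * T ** 2 + 2.374e-4 * T ** 3
               + 1.340 * (S - 35) + 1.630e-2 * D + 1.675e-7 * D ** 2
               - 1.025e-2 * T * (S - 35) - 7.139e-13 * T * D ** 3)

   # Example values; depth of 50 m is an assumption standing in for 50 dbar
   c = mackenzie_sound_speed(temperature=8, salinity=30, depth=50)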

For AZFP data, echopype updates the sound speed and seawater absorption
using the formulae provided by the manufacturer, ASL Environmental Sciences.





Calibration parameters
~~~~~~~~~~~~~~~~~~~~~~

*Calibration* here refers to the calibration of transducers on an
echosounder, which finds the mapping between the voltage signal
recorded by the echosounder and the actual (physical) acoustic pressure
received at the transducer. This mapping is critical in deriving biological
quantities from acoustic measurements, such as estimating biomass.
More detail about the calibration procedure can be found in [4]_.

Echopype by default uses calibration parameters stored in the converted
files along with the backscatter measurements and other metadata parsed
from the raw data file.
However, since careful calibration is often done separately from the
data collection phase of the field work, accurate calibration parameters
are often supplied in the post-processing stage.
Currently echopype allows users to overwrite calibration parameters for
EK60 data, including ``sa_correction``, ``equivalent_beam_angle``,
and ``gain_correction``.

As an example, to reset the equivalent beam angle for 18 kHz only,
one can do:

.. code-block:: python

   ed.equivalent_beam_angle.loc[dict(frequency=18000)] = -18.02  # set value for 18 kHz only

To set the equivalent beam angle for all channels at once, do:

.. code-block:: python

   ed.equivalent_beam_angle = [-17.47, -20.77, -21.13, -20.4 , -30]  # set all channels at once

Make sure you use ``ed.equivalent_beam_angle.frequency`` to check
the sequence of the frequency channels first, and always double
check after setting these parameters!
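
The single-channel update above is plain xarray label-based assignment, so
the pattern can be tried on a stand-in array without any echosounder data.
In the sketch below the ``DataArray`` is built by hand with made-up
frequencies and values (an assumption for illustration); only the ``.loc``
assignment mirrors the example above.

.. code-block:: python

   import xarray as xr

   # Stand-in for ed.equivalent_beam_angle: one value per frequency channel
   eba = xr.DataArray(
       [-17.47, -20.77, -21.13],
       coords={'frequency': [18000, 38000, 120000]},
       dims='frequency',
   )

   print(eba.frequency.values)                # check the channel order first
   eba.loc[dict(frequency=18000)] = -18.02    # update the 18 kHz channel only
   print(eba.sel(frequency=18000).item())     # double check the new value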




---------------

.. [1] De Robertis A, Higginbottom I. (2007) A post-processing technique to
   estimate the signal-to-noise ratio and remove echosounder background noise.
   `ICES J. Mar. Sci. 64(6): 1282–1291. <https://academic.oup.com/icesjms/article/64/6/1282/616894>`_

.. [2] Mackenzie K. (1981) Nine‐term equation for sound speed in the oceans.
   `J. Acoust. Soc. Am. 70(3): 806-812 <https://asa.scitation.org/doi/10.1121/1.386920>`_

.. [3] Ainslie MA, McColm JG. (1998) A simplified formula for viscous and
   chemical absorption in sea water.
   `J. Acoust. Soc. Am. 103(3): 1671-1672 <https://asa.scitation.org/doi/10.1121/1.421258>`_

.. [4] Demer DA, Berger L, Bernasconi M, Bethke E, Boswell K, Chu D, Domokos R,
   et al. (2015) Calibration of acoustic instruments. `ICES Cooperative Research Report No.
   326. 133 pp. <https://doi.org/10.17895/ices.pub.5494>`_

.. TODO: Need to specify the changes we made from AZFP Matlab code to here:
   In the Matlab code, users set temperature/salinity parameters in
   AZFP_parameters.m and run that script first before doing unpacking.
   Here we require users to unpack raw data first into netCDF, and then
   set temperature/salinity in the model module if they want to perform
   calibration. This is cleaner and less error prone, because the param
   setting step is separated from the raw data unpacking, so user-defined
   params are not in the unpacked files.
