Metadata-Version: 2.4
Name: adtl
Version: 0.12.0
Summary: Another data transformation language
Author: Abhishek Dasgupta, Pip Liggins
License: MIT License
        
        Copyright (c) 2022 Global.health
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: homepage, https://adtl.readthedocs.io
Project-URL: github, https://github.com/globaldothealth/adtl
Project-URL: releasenotes, https://github.com/globaldothealth/adtl/releases
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tomli>=2.0.0
Requires-Dist: pint>=0.24.4
Requires-Dist: requests>=2.0.0
Requires-Dist: fastjsonschema==2.16.*
Requires-Dist: tqdm
Requires-Dist: python-dateutil
Requires-Dist: more_itertools
Requires-Dist: pandas[parquet]==2.*
Requires-Dist: joblib
Provides-Extra: autoparser
Requires-Dist: numpy==2.*; extra == "autoparser"
Requires-Dist: openai>=1.52.2; extra == "autoparser"
Requires-Dist: openpyxl>=3.1.5; extra == "autoparser"
Requires-Dist: pydantic>=2.9.2; extra == "autoparser"
Requires-Dist: eval_type_backport; python_version < "3.10" and extra == "autoparser"
Requires-Dist: google-generativeai>=0.8.3; extra == "autoparser"
Requires-Dist: pandera[pandas]; extra == "autoparser"
Requires-Dist: fastparquet>=2024.11.0; extra == "autoparser"
Requires-Dist: tiktoken>=0.9.0; extra == "autoparser"
Provides-Extra: test
Requires-Dist: pytest>=8.3.3; extra == "test"
Requires-Dist: pytest-cov>=6.0.0; extra == "test"
Requires-Dist: syrupy==4.*; extra == "test"
Requires-Dist: responses; extra == "test"
Requires-Dist: pytest-unordered; extra == "test"
Requires-Dist: adtl[autoparser]; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx==8.*; python_version >= "3.10" and extra == "docs"
Requires-Dist: sphinx-book-theme; extra == "docs"
Requires-Dist: sphinxcontrib-mermaid; extra == "docs"
Requires-Dist: myst-nb==1.*; extra == "docs"
Requires-Dist: adtl[autoparser]; extra == "docs"
Provides-Extra: all
Requires-Dist: adtl[docs,test]; extra == "all"
Dynamic: license-file

# adtl – another data transformation language

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

[![tests](https://github.com/globaldothealth/adtl/actions/workflows/tests.yml/badge.svg)](https://github.com/globaldothealth/adtl/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/globaldothealth/adtl/branch/main/graph/badge.svg?token=QTD7HRR3TO)](https://codecov.io/gh/globaldothealth/adtl)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


adtl is a data transformation language (DTL) used by some applications in
[Global.health](https://global.health), notably for the ISARIC clinical data pipeline at
[globaldothealth/isaric](https://github.com/globaldothealth/isaric) and the InsightBoard
project dashboard at [globaldothealth/InsightBoard](https://github.com/globaldothealth/InsightBoard).

Documentation: [ReadTheDocs](https://adtl.readthedocs.io/en/latest/index.html)

## Installation

You can install this package using either [`pipx`](https://pypa.github.io/pipx/)
or `pip`. Installing via `pipx` is preferable if you only want to use the
`adtl` tool standalone from the command line, as it isolates the Python
package dependencies in a virtual environment. On the other hand, `pip` installs
packages into the current environment, which is the global environment unless a
virtual environment is active. Installing globally is generally not recommended
as it can interfere with other packages on your system.

* Installation via `pipx`:

  ```shell
  pipx install adtl
  ```

* Installation via `pip`:

  ```shell
  python3 -m pip install adtl
  ```

If you are writing code which depends on adtl (instead of using the
command-line program), then it is best to add a dependency on `adtl` to your
Python build tool of choice.

To use the development version, replace `adtl` with the full GitHub URL:

```shell
pip install git+https://github.com/globaldothealth/adtl
```

## Rationale

Most existing data transformation languages are XML dialects, though
there are recent variations in other file formats. In addition, many DTLs use a
custom domain-specific language. The primary utility of this DTL is to provide an
easy-to-use Python library for basic data transformations, which are
specified in a JSON or TOML file. It is not meant to be comprehensive, and adtl can
be used as a step within a larger data processing pipeline.

## Usage

adtl can be used from the command line or as a Python library.

As a CLI:
```bash
adtl parse specification-file input-file
```

Here *specification-file* is the parser specification (as TOML or JSON)
and *input-file* is the data file (not the data dictionary) that adtl
will transform using the instructions in the specification.
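
Because the specification is plain TOML or JSON, it can be loaded into a dictionary for inspection before the file is handed to adtl. The sketch below is one way to do that, not how adtl itself reads the file: `load_specification` is a hypothetical helper, and it assumes the file extension indicates the format. TOML parsing uses the standard-library `tomllib` on Python 3.11+ and falls back to `tomli`, which adtl already depends on.

```python
import json
from pathlib import Path


def load_specification(path):
    """Load a TOML or JSON specification file into a dictionary.

    Hypothetical helper for inspection only; assumes the file
    extension (.toml or .json) indicates the format.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as fp:
            return json.load(fp)
    if suffix == ".toml":
        try:
            import tomllib  # standard library on Python 3.11+
        except ModuleNotFoundError:
            import tomli as tomllib  # installed as an adtl dependency
        with path.open("rb") as fp:  # TOML parsers require binary mode
            return tomllib.load(fp)
    raise ValueError(f"unsupported specification format: {path.suffix!r}")
```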

If `adtl` is not in your PATH, running the `adtl` command may give an error. Either
add the location where the adtl script is installed to your PATH, or try running
adtl as a module:

```shell
python3 -m adtl parse specification-file input-file
```

Running adtl will create output files in the current working directory, named
after the parser and suffixed with the table name.

Before transforming your data, you can check that your specification file matches
the format adtl expects, and look for fields that may have been misspelled or missed out
during the mapping, by using:
```bash
adtl check specification-file input-file
```
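
The `parse` and `check` subcommands can also be driven from a Python script by running adtl as a module in a subprocess. The following is a minimal sketch: `adtl_command` and `run_adtl` are hypothetical helper names, and using `sys.executable` assumes adtl is installed for the same interpreter that runs the script.

```python
import subprocess
import sys


def adtl_command(action, specification_file, input_file):
    """Build the argument list for `python -m adtl <action> ...`.

    Hypothetical helper; only the `parse` and `check` subcommands
    described above are accepted.
    """
    if action not in ("parse", "check"):
        raise ValueError(f"unsupported adtl subcommand: {action!r}")
    return [
        sys.executable,  # the interpreter running this script
        "-m",
        "adtl",
        action,
        str(specification_file),
        str(input_file),
    ]


def run_adtl(action, specification_file, input_file):
    """Run adtl as a module; raises CalledProcessError on a non-zero exit status."""
    command = adtl_command(action, specification_file, input_file)
    return subprocess.run(command, check=True)
```

For example, `run_adtl("check", "specification-file", "input-file")` runs the check step shown above without relying on the `adtl` script being in your PATH.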

As a Python library:
```python
import adtl

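# `specification` is a placeholder for your parser specification
parser = adtl.Parser(specification)
print(parser.tables)  # list of tables created

# `table` is a placeholder for one of the names in `parser.tables`
for row in parser.parse().read_table(table):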
    print(row)
```
Alternatively, to get an output file as a CSV, similarly to the CLI:
```python
import adtl

data = adtl.parse("specification-file", "input-file")
```
Here `data` is a dictionary of pandas DataFrames, one for each table.
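
Since the result is an ordinary dictionary, it can be inspected with standard Python. As a small sketch, the hypothetical helper `table_row_counts` below reports how many rows each table contains. It relies only on `len()`, which for a pandas DataFrame returns the number of rows.

```python
def table_row_counts(data):
    """Map each table name to its number of rows (hypothetical helper)."""
    return {name: len(frame) for name, frame in data.items()}


def print_summary(data):
    """Print one line per table, sorted by table name."""
    for name, rows in sorted(table_row_counts(data).items()):
        print(f"{name}: {rows} rows")
```

For example, `print_summary(adtl.parse("specification-file", "input-file"))` prints one line per table.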

## Development

Install [pre-commit](https://pre-commit.com) and set up the pre-commit hooks
(`pre-commit install`), which run linting checks before each commit.
