Metadata-Version: 2.1
Name: ckanext-transmute
Version: 1.2.5
Summary: Converts a dataset based on a specific schema
Home-page: https://github.com/mutantsan/ckanext-transmute
Author: Alexandr Cherniavskyi
Author-email: mutantsan@gmail.com
License: AGPL
Keywords: CKAN,scheming,schema
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 2.7
Description-Content-Type: text/markdown
License-File: LICENSE

# ckanext-transmute
The extension helps to validate and converts a dataset based on a specific schema.

## Working with transmute

`ckanext-transmute` provides an action `tsm_transmute` It helps us to transmute data with the provided convertion scheme. The action doesn't change the original data, but creates a new data dict. There are two mandatory arguments - `data` and `schema`. `data` is a data dict you have and `schema` helps you to validate/change data in it.

Example:
We have a data dict:
```
{
            "title": "Test-dataset",
            "email": "test@test.ua",
            "metadata_created": "",
            "metadata_modified": "",
            "metadata_reviewed": "",
            "resources": [
                {
                    "title": "test-res",
                    "extension": "xml",
                    "web": "https://stackoverflow.com/",
                    "sub-resources": [
                        {
                            "title": "sub-res",
                            "extension": "csv",
                            "extra": "should-be-removed",
                        }
                    ],
                },
                {
                    "title": "test-res2",
                    "extension": "csv",
                    "web": "https://stackoverflow.com/",
                },
            ],
        }
```
And we want to achieve this:
```
{
            "name": "test-dataset",
            "email": "test@test.ua",
            "metadata_created": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
            "metadata_modified": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
            "metadata_reviewed": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
            "attachments": [
                {
                    "name": "test-res",
                    "format": "XML",
                    "url": "https://stackoverflow.com/",
                    "sub-resources": [{"name": "SUB-RES", "format": "CSV"}],
                },
                {
                    "name": "test-res2",
                    "format": "CSV",
                    "url": "https://stackoverflow.com/",
                },
            ],
        }
```
Then, our schema must be something like that:
```
{
        "root": "Dataset",
        "types": {
            "Dataset": {
                "fields": {
                    "title": {
                        "validators": [
                            "tsm_string_only",
                            "tsm_to_lowercase",
                            "tsm_name_validator",
                        ],
                        "map": "name",
                    },
                    "resources": {
                        "type": "Resource",
                        "multiple": True,
                        "map": "attachments",
                    },
                    "metadata_created": {
                        "validators": ["tsm_isodate"],
                        "default": "2022-02-03T15:54:26.359453",
                    },
                    "metadata_modified": {
                        "validators": ["tsm_isodate"],
                        "default_from": "metadata_created",
                    },
                    "metadata_reviewed": {
                        "validators": ["tsm_isodate"],
                        "replace_from": "metadata_modified",
                    },
                }
            },
            "Resource": {
                "fields": {
                    "title": {
                        "validators": ["tsm_string_only"],
                        "map": "name",
                    },
                    "extension": {
                        "validators": ["tsm_string_only", "tsm_to_uppercase"],
                        "map": "format",
                    },
                    "web": {
                        "validators": ["tsm_string_only"],
                        "map": "url",
                    },
                    "sub-resources": {
                        "type": "Sub-Resource",
                        "multiple": True,
                    },
                },
            },
            "Sub-Resource": {
                "fields": {
                    "title": {
                        "validators": ["tsm_string_only", "tsm_to_uppercase"],
                        "map": "name",
                    },
                    "extension": {
                        "validators": ["tsm_string_only", "tsm_to_uppercase"],
                        "map": "format",
                    },
                    "extra": {
                        "remove": True,
                    },
                }
            },
        },
    }
```

There is an example of schema with nested types. The `root` field is mandatory, it's must contain a main type name, from which the scheme starts. As you can see, `Dataset` type contains `Resource` type which contans `Sub-Resource`. 

### Transmutators

There are few default transmutators you can use in your schema. Of course, you can define a custom transmutator with the `ITransmute` interface.
- `tsm_name_validator` - Wrapper over CKAN default `name_validator` validator
- `tsm_to_lowercase` - Casts string value to a lowercase
- `tsm_to_uppercase` - Casts string value to a uppercase
- `tsm_string_only` - Validates if `field.value` is string
- `tsm_isodate` - Wrapper over CKAN default `isodate` validator. Mutates an iso-like string to datetime object
- `tsm_to_string` - Casts a `field.value` to `str`
- `tsm_get_nested` - Allows you to pick up a value from a nested structure. Example:
```         
data = "title_translated": [
    {"nested_field": {"en": "en title", "ar": "العنوان ar"}},
]

schema = ...
    "title": {
        "replace_from": "title_translated",
        "validators": [
            ["tsm_get_nested", 0, "nested_field", "en"],
            "tsm_to_uppercase",
        ],
    },
    ...
```
This will take a value for a `title` field from `title_translated` field. Because `title_translated` is an array with nested objects, we are using the `tsm_get_nested` transmutator to achieve the value from it.

The default transmutator must receive at least one mandatory argument - `field` object. Field contains few properties: `field_name`, `value` and `type`.

There is a possibility to provide more arguments to a validator like in `tsm_get_nested`. For this use a nested array with first item transmutator and other - arguments to it.

### Keywords
1. `map_to` (`str`) - changes the `field.name` in result dict. 
2. `validators` (`list[str]`) - a list of transmutators that will be applied to a `field.value`. A transmutator could be a `string` or a `list` where the first item must be transmutator name and others are arbitrary values. Example:
    ```
    ...
    "validators": [
        ["tsm_get_nested", "nested_field", "en"],
        "tsm_to_uppercase",
    ,
    ...
    ```
    There are two transmutators: `tsm_get_nested` and `tsm_to_uppercase`.
3. `multiple` (`bool`, default: `False`) - if the field could have multiple items, e.g `resources` field in dataset, mark it as `multiple` to transmute all the items successively.
    ```
    ...
    "resources": {
        "type": "Resource",
        "multiple": True
    },
    ...
    ```
4. `remove` (`bool`, default: `False`) - removes a field from a result dict if `True`.
5. `default` (`Any`) - the default value that will be used if the original field.value evaluates to `False`.
6. `default_from` (`str`) - acts similar to `default` but accepts a `field.name` of a sibling field from which we want to take its value. Sibling field is a field that located in the same `type`. The current implementation doesn't allow to point on fields from other `types`. 
    ```
    ...
    "metadata_modified": {
        "validators": ["tsm_isodate"],
        "default_from": "metadata_created",
    },
    ...
    ```
7. `replace_from` (`str`) - acts similar to `default_from` but replaces the origin value whenever it's empty or not.
8. `value` (`Any`) - a value that will be used for a field. This keyword has the highest priority. Could be used to create a new field with an arbitrary value.
9. `update` (`bool`, default: `False) - if the original value is mutable (`array, object`) - you can update it. You can only update field values of the same types.

## Installation

To install ckanext-transmute:

1. Activate your CKAN virtual environment, for example:

     . /usr/lib/ckan/default/bin/activate

2. Clone the source and install it on the virtualenv

    git clone https://github.com/mutantsan/ckanext-transmute.git
    cd ckanext-transmute
    pip install -e .
	pip install -r requirements.txt

3. Add `transmute` to the `ckan.plugins` setting in your CKAN
   config file (by default the config file is located at
   `/etc/ckan/default/ckan.ini`).

4. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:

     sudo service apache2 reload


## Developer installation

To install ckanext-transmute for development, activate your CKAN virtualenv and
do:

    git clone https://github.com/mutantsan/ckanext-transmute.git
    cd ckanext-transmute
    python setup.py develop
    pip install -r dev-requirements.txt


## Tests

I've used TDD to write this extension, so if you changing something be sure that all the tests are valid. To run the tests, do:

    pytest --ckan-ini=test.ini

## License

[AGPL](https://www.gnu.org/licenses/agpl-3.0.en.html)


