Metadata-Version: 2.1
Name: datapatch
Version: 0.2.0
Summary: UNKNOWN
Home-page: https://github.com/pudo/datapatch
Author: Friedrich Lindenberg
Author-email: friedrich@pudo.org
License: MIT
Description: # datapatch
        
        A Python library for defining rule-based overrides on messy data. Imagine, for example,
        trying to import a dataset in each row is associated with a country - which have been 
        entered by humans. You might find country names like `Northkorea`, or `Greet Britain`
        that you want to normalise. `datapatch` creates a mechanism to build a flexible lookup
        table (usually stored as a YAML file) to catch and repair these data issues.
        
        ## Installation
        
        You can install `datapatch` from the Python package index:
        
        ```bash
        pip install datapatch
        ```
        
        ## Example
        
        Given a YAML file like this:
        
        ```yaml
        countries:
          normalize: true
          lowercase: true
          options:
            - match: Frankreich
              value: France
            - match:
                - Northkorea
                - Nordkorea
                - Northern Korea
                - NKorea
                - DPRK
              value: North Korea
            - contains: Britain
              value: Great Britain
        ```
        
        The file can be used to apply the data patches against raw input:
        
        ```python
        from datapatch import read_lookups, LookupException
        
        lookups = read_lookups("countries.yml")
        countries = lookups.get("countries")
        
        # This will apply the patch or default to the original string if none exists:
        for row in iter_data():
            raw = row.get("Country")
            row["Country"] = countries.get_value(raw, default=raw)
        ```
        
        There's a host of options available to configure the application of the data
        patches:
        
        ```yaml
        countries:
          # If you mark a lookup as required, a value that matches no options will
          # throw a `datapatch.exc:LookupException`.
          required: true
          # Normalisation will remove many special characters, remove multiple spaces
          # and perform some basic matching across alphabets (Путин -> Putin).
          normalize: false
          options:
            - match: Francois
              value: France
          # This is a shorthand for defining options that have just one `match` and
          # one `value` defined:
          map:
            Luxemborg: Luxembourg
            Lux: Luxembourg
        ```
        
        You can also have more details associated with a result and access them:
        
        ```yaml
        countries:
          options:
            - match: Frankreich
              # These can be arbitrary attributes:
              label: France
              code: FR
        ```
        
        This can be accessed as a result object with attributes:
        
        ```python
        from datapatch import read_lookups, LookupException
        
        lookups = read_lookups("countries.yml")
        countries = lookups.get("countries")
        
        result = countries.match("Frankreich")
        print(result.label, result.code)
        assert result.capital is None, result.capital
        ```
        
        ### License
        
        `datapatch` is licensed under the terms of the MIT license, which is included as
        `LICENSE`.
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
Provides-Extra: dev
