Metadata-Version: 2.1
Name: metapensiero.sqlalchemy.dbloady
Version: 3.0.dev0
Summary: YAML based data loader
Home-page: https://gitlab.com/metapensiero/metapensiero.sqlalchemy.dbloady.git
Author: Lele Gaifax
Author-email: lele@metapensiero.it
License: GPLv3+
Description: .. -*- coding: utf-8 -*-
        .. :Project:   metapensiero.sqlalchemy.dbloady -- YAML based data loader
        .. :Created:   ven  1 gen 2016, 16.19.54, CET
        .. :Author:    Lele Gaifax <lele@metapensiero.it>
        .. :License:   GNU General Public License version 3 or later
        .. :Copyright: © 2016, 2017, 2019 Lele Gaifax
        ..
        
        =================================
         metapensiero.sqlalchemy.dbloady
        =================================
        
        ----------------------
        YAML based data loader
        ----------------------
        
         :author: Lele Gaifax
         :contact: lele@metapensiero.it
         :license: GNU General Public License version 3 or later
        
        .. contents::
        
        Data loader
        ===========
        
        Load new instances in the database, or update/delete existing ones, given a data structure
        represented by a YAML stream, as the following::
        
            - entity: gam.model.Fascicolo
              key: descrizione
              # no data, just "declare" the entity
        
            - entity: gam.model.TipologiaFornitore
              key: tipologiafornitore
              rows:
                - &tf_onesto
                  tipologiafornitore: Test fornitori onesti
        
            - entity: gam.model.ClienteFornitore
              key: descrizione
              rows:
                - descrizione: Test altro fornitore onesto
                  tipologiafornitore: *tf_onesto
                  partitaiva: 01234567890
                - &cf_lele
                  codicefiscale: GFSMNL68C18H612V
                  descrizione: Dipendente A
        
            - entity: gam.model.Dipendente
              key: codicefiscale
              rows:
                - &lele
                  codicefiscale: GFSMNL68C18H612V
                  nome: Emanuele
                  cognome: Gaifas
                  clientefornitore: *cf_lele
                  foto: !File {path: ../img/lele.jpg}
        
            - entity: gam.model.Attrezzature
              key: descrizione
              rows:
                - &macchina
                  descrizione: Fiat 500
                  foto: !File
                    compressor: lzma
                    content: !!binary |
                      /Td6WFoAAATm1rRGAgAhA...
        
            - entity: gam.model.Prestiti
              key:
                - dipendente
                - attrezzatura
              rows:
                - dipendente: *lele
                  attrezzatura: *macchina
        
        As you can see, the YAML document is a sequence of entries, each one defining the content of a
        set of *instances* of a particular *entity*.
        
        The ``entity`` must be the fully qualified dotted name of the SQLAlchemy mapped class.
        
        The ``key`` entry may be either a single attribute name or a list of them, not necessarily
        corresponding to the primary key of the entity, provided that it uniquely identifies a single
        instance.  To handle the simplest case of structured values (for example, when a field is
        backed by a PostgreSQL HSTORE), the key attribute name may be in the form ``name->slot``::
        
            - entity: model.Product
              key: description->en
              rows:
                - &cage
                  description:
                    en: "Roadrunner cage"
                    it: "Gabbia per struzzi"
        
        The ``rows`` (or ``data``) may be either a single item or a list of them, each containing
        the data of a single instance, usually a dictionary.
        
        .. _fields:
        
        When all (or most of) the instances share the same fields, a more compact representation may be
        used::
        
            - entity: model.Values
              key:
                - product
                - attribute
              fields: [ product, attribute, value ]
              rows:
                - [ *cage, *size, 110cm x 110cm x 120cm ]
                - [ *cage, *weight, 230kg ]
        
        where ``fields`` contains a list of field names and ``rows`` is a sequence of lists, each
        containing the values of a single instance.  The two sintaxes may be mixed though, so you can
        say::
        
            - entity: model.Person
              key: [ lastname, firstname ]
              fields: [ lastname, firstname, password ]
              rows:
                - [ gaifax, lele, "123456" ]
                - [ foobar, john, "abcdef" ]
                - lastname: rossi
                  firstname: paolo
                  birthdate: 1950-02-03
        
        If you have a `tab-separated-values`__ file, you may say::
        
            - entity: model.Cities
              key:
                - name
                - country
              fields: [ name, country ]
              rows: !TSV {path: ../data/cities.txt, encoding: utf-8}
        
        and if the field names are included in the the first row of the file, simply omit the
        ``fields`` slot::
        
            - entity: model.Countries
              key:
                - code
              rows: !TSV {path: ../data/countries.txt, encoding: utf-8}
        
        The ``dbloady`` tool iterates over all the entities, and for each instance it determines if it
        already exists querying the database with the given *key*: if it's there, it updates it
        otherwise it creates a new one and initializes it with its data.
        
        __ https://en.wikipedia.org/wiki/Tab-separated_values
        
        
        Test fixture facility
        ---------------------
        
        With the option ``--save-new-instances`` newly created instances will be written (actually
        added) to the given file in YAML format, so that at some point they can be deleted using the
        option ``--delete`` on that file.  Ideally
        
        ::
        
          dbloady -u postgresql://localhost/test -s new.yaml fixture.yaml
          dbloady -u postgresql://localhost/test -D new.yaml
        
        should remove fixture's traces from the database, if it contains only new data.
        
        
        Pre and post load scripts
        -------------------------
        
        The option ``--preload`` may be used to execute an arbitrary Python script before any load
        happens.  This is useful either to tweak the YAML context or to alter the set of file names
        specified on the command line (received as the `fnames` global variable).
        
        The following script registers a custom costructor that recognizes the tag ``!time`` or a value
        like ``T12:34`` as a ``datetime.time`` value::
        
          import datetime, re
          from ruamel import yaml
        
          def time_constructor(loader, node):
              value = loader.construct_scalar(node)
              if value.startswith('T'):
                  value = value[1:]
              parts = map(int, value.split(':'))
              return datetime.time(*parts)
        
          yaml.add_constructor('!time', time_constructor)
          yaml.add_implicit_resolver('!time', re.compile(r'^T?\d{2}:\d{2}(:\d{2})?$'), ['T'])
        
        As another example, the following script handles input files with a ``.gpg`` suffix decrypting
        them on the fly to a temporary file that will be deleted when the program exits::
        
          import atexit, os, subprocess, tempfile
        
          def decipher(fname):
              print("Input file %s is encrypted, please enter passphrase" % fname)
              with tempfile.NamedTemporaryFile(suffix='.yaml') as f:
                  tmpfname = f.name
              subprocess.run(['gpg', '-q', '-o', tmpfname, fname], check=True)
              atexit.register(lambda n=tmpfname: os.unlink(n))
              return tmpfname
        
          fnames = [decipher(fname) if fname.endswith('.gpg') else fname for fname in fnames]
        
        Then you have::
        
          dbloady -u postgresql://localhost/test -p preload.py data.yaml.gpg
          Input file data.yaml.gpg is encrypted, please enter passphrase
          /tmp/tmpfhjrdqgf.yaml: ......
          Committing changes
        
        The option ``--postload`` may be used to perform additional steps *after* all YAML files have
        been loaded but *before* the DB transaction is committed.
        
        The pre/post load scripts are executed with a context containing the following variables:
        
        `session`
          the SQLAlchemy session
        
        `dry_run`
          the value of the ``--dry-run`` option
        
        `fnames`
          the list of file names specified on the command line
        
        
        Generic foreign keys
        --------------------
        
        Version 1.6 introduced rudimentary and experimental support for the `generic foreign keys`__
        trick. It assumes that they are implemented with a `hybrid property`__ that exposes a `custom
        comparator`__. See ``tests/generic_fk/model.py`` for an example.
        
        __ http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/generic_fk.html
        __ http://docs.sqlalchemy.org/en/rel_1_1/orm/extensions/hybrid.html
        __ http://docs.sqlalchemy.org/en/rel_1_1/orm/extensions/hybrid.html#building-custom-comparators
        
        With a proper configuration, the following works::
        
          - entity: model.Customer
            key: name
            data:
              - &customer
                name: Best customer
        
          - entity: model.Supplier
            key: company_name
            data:
              - &supplier
                company_name: ACME
        
          - entity: model.Address
            key:
              - related_object
              - street
            data:
              - related_object: *customer
                street: 123 anywhere street
              - related_object: *supplier
                street: 321 long winding road
        
        
        Direct assignment of primary keys
        ---------------------------------
        
        When the attribute does not correspond to a relationship property, assignment of an instance
        reference will set the attribute to the instance's primary key::
        
          - entity: model.Person
            key:
              - lastname
              - firstname
            fields:
              - lastname
              - firstname
            data:
              - &johndoe [ Doe, John ]
        
          - entity: model.CannedFilter
            key: description
            data:
              - &onlyjohndoe
                description: "Only John Doe"
        
          - entity: model.Condition
            key:
              - filter
              - fieldname
            data:
              - filter: *onlyjohndoe
                fieldname: "persons.id"
                fieldvalue: *johndoe
        
        Raw SQL values
        --------------
        
        Sometime a value requires executing an arbitrary query on the database, maybe because it is
        computed by a trigger or more generally because it cannot be determined by the YAML content::
        
          - entity: model.Number
            key:
              id
            data:
              - id: 1
                absolute: !SQL {query: "SELECT abs(:num)", params: {num: -1}}
              - id: !SQL {query: "SELECT abs(:num)", params: {num: -2}}
                absolute: !SQL {query: "SELECT abs(:num)", params: {num: -2}}
              - id: 3
                absolute: !SQL {query: "SELECT count(*) FROM numbers"}
        
        The specified query must return a single value, as it is executed with `session.scalar()`__.
        
        __ http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.scalar
        
        
        Data dumper
        ===========
        
        With the complementary tool, ``dbdumpy``, you can obtain a YAML representation out
        of a database in the same format used by ``dbloady``. It's rather simple and in particular it
        does not handle reference cycles.
        
        The tool is driven by a `specs file`, a YAML document composed by two parts: the first defines
        the `pivots` instances (that is, the entry points), the second describes how each entity must
        be serialized and in which order.
        
        Consider the following document::
        
          - entity: model.Result
          ---
          - entity: model.Person
            key:
              - lastname
              - firstname
        
          - entity: model.Exam
            fields: description
            key: description
        
          - entity: model.Result
            key:
              - person
              - exam
            other:
              - vote
        
        It tells ``dbdumpy`` to consider *all* instances of ``model.Result`` as the pivots, then
        defines how each entity must be serialized, simply by listing the ``key`` attribute(s) and any
        further ``other`` field. Alternatively, you can specify a list of ``fields`` names, to obtain
        the more compact form described `above`__.
        
        __ fields_
        
        Executing
        
        ::
        
          dbdumpy -u sqlite:////foo/bar.sqlite spec.yaml
        
        will emit the following on stdout::
        
          - entity: model.Person
            key:
            - lastname
            - firstname
            rows:
            - &id002
              firstname: John
              lastname: Doe
            - &id003
              firstname: Bar
              lastname: Foo
          - entity: model.Exam
            fields: description
            key: description
            rows:
            - &id001
              - Drive license
          - entity: model.Result
            key:
            - person
            - exam
            rows:
            - exam: *id001
              person: *id002
              vote: 10
            - exam: *id001
              person: *id003
              vote: 5
        
        
        .. -*- coding: utf-8 -*-
        
        Changes
        -------
        
        3.0.dev0 (2021-10-17)
        ~~~~~~~~~~~~~~~~~~~~~
        
        - Target Python 3.9+
        
        
        2.10 (2020-01-18)
        ~~~~~~~~~~~~~~~~~
        
        - Fix an issue with loading many-to-many relationships
        
        
        2.9 (2019-06-24)
        ~~~~~~~~~~~~~~~~
        
        - Mimic how PostgreSQL decodes ``\N`` as ``None`` in the TSV tag
        
        
        2.8 (2019-06-24)
        ~~~~~~~~~~~~~~~~
        
        - Ability to load data from an external tab-separated-values file
        
        
        2.7 (2019-05-10)
        ~~~~~~~~~~~~~~~~
        
        - Emit a critical log on attribute assignment failure, to aid debugging bad input
        
        
        2.6 (2018-04-17)
        ~~~~~~~~~~~~~~~~
        
        - Remove the fixup to progressbar2 `issue #162`__, solved in its 3.7.1 release
        
        __  https://github.com/WoLpH/python-progressbar/issues/162
        
        
        2.5 (2018-04-09)
        ~~~~~~~~~~~~~~~~
        
        - Try to fix different behaviour in progressbar2 3.7.0 w.r.t. multiple progress bars
        
        
        2.4 (2018-04-08)
        ~~~~~~~~~~~~~~~~
        
        - Now File elements can read text files
        
        - Support dumping hstore values (not tested enough, though)
        
        
        2.3 (2017-06-07)
        ~~~~~~~~~~~~~~~~
        
        - Fix handling of property based attributes
        
        
        2.2 (2017-05-18)
        ~~~~~~~~~~~~~~~~
        
        - The File elements may now contain their content, without accessing external files
        
        
        2.1 (2017-05-02)
        ~~~~~~~~~~~~~~~~
        
        - New ``--quiet`` option to omit the progress bar
        
        
        2.0 (2017-04-06)
        ~~~~~~~~~~~~~~~~
        
        - Require `ruamel.yaml`__ instead of PyYAML__
        
        __ https://pypi.python.org/pypi/ruamel.yaml
        __ https://pypi.python.org/pypi/PyYAML
        
        
        1.11 (2017-03-22)
        ~~~~~~~~~~~~~~~~~
        
        - Spring cleanup, no externally visible changes
        
        
        1.10 (2016-11-16)
        ~~~~~~~~~~~~~~~~~
        
        - Reduce load noise by using progressbar2__
        
        __ https://pypi.python.org/pypi/progressbar2
        
        
        1.9 (2016-11-15)
        ~~~~~~~~~~~~~~~~
        
        - Ability to execute raw SQL statements to fetch a value from the database
        
        
        1.8 (2016-11-15)
        ~~~~~~~~~~~~~~~~
        
        - Better tests
        
        - Handle assignments to non-relationship properties
        
        
        1.7 (2016-11-05)
        ~~~~~~~~~~~~~~~~
        
        - Make Python 3 happy by explicitly use binary mode to read external files
        
        
        1.6 (2016-10-29)
        ~~~~~~~~~~~~~~~~
        
        - Quick&approximated solution to load `generic associations`__
        
        __ http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/generic_fk.html
        
        
        1.5 (2016-03-12)
        ~~~~~~~~~~~~~~~~
        
        - New complementary dump functionality, exposed by a new cli tool, dbdumpy
        
        - Cosmetic, backward compatible, changes to the YAML format, for nicer sorting
        
        
        1.4 (2016-02-10)
        ~~~~~~~~~~~~~~~~
        
        - Data files and preload/postload scripts may be specified also as package relative resources
        
        
        1.3 (2016-01-14)
        ~~~~~~~~~~~~~~~~
        
        - New --preload and --postload options to execute arbitrary Python scripts before or after the
          load
        
        
        1.2 (2016-01-09)
        ~~~~~~~~~~~~~~~~
        
        - Fix source distribution
        
        
        1.1 (2016-01-09)
        ~~~~~~~~~~~~~~~~
        
        - Fix data refs when loading from compact representation
        
        
        1.0 (2016-01-07)
        ~~~~~~~~~~~~~~~~
        
        - Allow more compact representation when all instances share the same fields
        
        - Extract dbloady from metapensiero.sphinx.patchdb 1.4.2 into a standalone package
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Utilities
Description-Content-Type: text/x-rst
Provides-Extra: dev
