Metadata-Version: 2.1
Name: babbage
Version: 0.4.0
Summary: A light-weight analytical engine for OLAP processing
Home-page: http://github.com/openspending/babbage
Author: Friedrich Lindenberg
Author-email: friedrich@pudo.org
License: MIT
Description: # Babbage Analytical Engine
        
        [![Gitter](https://img.shields.io/gitter/room/openspending/chat.svg)](https://gitter.im/openspending/chat)
        [![Build Status](https://travis-ci.org/openspending/babbage.svg?branch=master)](https://travis-ci.org/openspending/babbage)
        [![Coverage Status](https://coveralls.io/repos/openspending/babbage/badge.svg?branch=master&service=github)](https://coveralls.io/github/openspending/babbage?branch=master)
        
        ``babbage`` is a lightweight implementation of an OLAP-style database
        query tool for PostgreSQL. Given a database schema and a logical model
        of the data, it can be used to perform analytical queries against that
        data - programmatically or via a web API.
        
        It is heavily inspired by [Cubes](http://cubes.databrewery.org/) but
        has less ambitious goals, i.e. no pre-computation of aggregates, or
        multiple storage backends.
        
        ``babbage`` is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).
        
        ## Installation and test
        
        ``babbage`` will normally included as a PyPI dependency, or installed via
        ``pip``:
        
        ```bash
        $ pip install babbage
        ```
        
        People interested in contributing to the package should instead check out the
        source repository and then use the provided ``Makefile`` to install the
        library (this requires ``virtualenv`` to be installed):
        
        ```bash
        $ git clone https://github.com/openspending/babbage.git
        $ cd babbage
        $ make install
        $ pip install tox
        $ export BABBAGE_TEST_DB=postgresql://postgres@localhost:5432/postgres
        $ make test
        ```
        
        ## Usage
        
        ``babbage`` is used to query a set of existing database tables, using an 
        abstract, logical model to query them. A sample of a logical model can be
        found in ``tests/fixtures/models/cra.json``, and a JSON schema specifying
        the model is available in ``babbage/schema/model.json``.
        
        The central unit of ``babbage`` is a ``Cube``, i.e. a [OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) that uses the provided model metadata to construct queries 
        against a database table. Additionally, the application supports managing
        multiple cubes at the same time via a ``CubeManager``, which can be
        subclassed to enable application-specific ways of defining cubes and where
        their metadata is stored.
        
        Futher, ``babbage`` includes a Flask Blueprint that can be used to expose
        a standard API via HTTP. This API is consumed by the JavaScript ``babbage.ui``
        package and it is very closely modelled on the Cubes and OpenSpending HTTP
        APIs.
        
        ### Programmatic usage
        
        Let's assume you have an existing database table of procurement data and
        want to query it using ``babbage`` in a Python shell. A session might look
        like this:
        
        ```python
        import json
        from sqlalchemy import create_engine
        from babbage.cube import Cube
        from babbage.model import Measure
        
        engine = create_engine('postgresql://localhost/procurement')
        model = json.load(open('procurement_model.json', 'r'))
        
        cube = Cube(engine, 'procurement', model)
        facts = cube.facts(page_size=5)
        
        # There are 17201 rows in the table:
        assert facts['total_fact_count'] == 17201
        
        # There's a field called 'total_value':
        assert 'total_value' in facts['fields']
        
        # We can get metadata about it:
        concept = cube.model['total_value']
        assert isinstance(concept, Measure)
        assert concept.label == 'Total Value'
        
        # And there's some actual data:
        assert len(facts['data']) == 5
        fact_0 = facts['data'][0]
        assert 'total_value' in fact_0
        
        # For dimensions, we can get all the distinct values:
        members = cube.members('supplier', cut='year:2015', page_size=500)
        assert len(members['data']) <= 500
        assert members['total_member_count']
        
        # And, finally, we can aggregate by specific dimensions:
        aggregate = cube.aggregate(aggregates='total_value.sum',
                                   drilldowns='supplier|authority'
                                   cut='year:2015|authority.country:GB',
                                   page_size=500)
        # This translates to: 
        #   Aggregate the procurement data by summing up the 'total_value'
        #   for each unique pair of values in the 'supplier' and 'authority'
        #   dimensions, and filter for only those entries where the 'year'
        #   dimensions key attribute is '2015' and the 'authority' dimensions
        #   'country' attribute is 'GB'. Return the first 500 results.
        assert aggregate['total_cell_count']
        assert len(aggregate['cells']) <= 500
        aggregate_0 = aggregate['cells'][0]
        assert 'total_value.sum' in aggregate_0
        
        # Note that these attribute names are made up for this example, they
        # should be reflected from the model:
        assert 'supplier.code' in aggregate_0
        assert 'supplier.label' in aggregate_0
        assert 'authority.code' in aggregate_0
        assert 'authority.label' in aggregate_0
        ```
        
        ### Using the HTTP API
        
        The HTTP API for ``babbage`` is a simple Flask [Blueprint](http://flask.pocoo.org/docs/latest/blueprints/) used to expose a small set of calls that correspond to
        the cube functions listed above. To include it into an existing Flask
        application, you would need to create a ``CubeManager`` and then
        configure the API like this: 
        
        ```python
        from flask import Flask
        from sqlalchemy import create_engine
        from babbage.manager import JSONCubeManager
        from babbage.api import configure_api
        
        app = Flask('demo')
        engine = 
        models_directory = 'models/'
        manager = JSONCubeManager(engine, models_directory)
        blueprint = configure_api(app, manager)
        app.register_blueprint(blueprint, url_prefix='/api/babbage')
        
        app.run()
        ```
        
        Of course, you can define your own ``CubeManager``, for example if
        you wish to retrieve model metadata from a database.
        
        When enabled, the API will expose a number of JSON(P) endpoints
        relative to the given ``url_prefix``:
        
        * ``/``, returns the system status and version.
        * ``/cubes``, returns a list of the available cubes (name only).
        * ``/cubes/<name>/model``, returns full metadata for a given 
          cube (i.e. measures, dimensions, aggregates etc.)
        * ``/cubes/<name>/facts`` is used to return individual entries from
          the cube in a non-aggregated form. Supports filters (``cut``), a
          set of ``fields`` to return and a ``sort`` (``field_name:direction``),
          as well as ``page`` and ``page_size``.
        * ``/cubes/<name>/members`` is used to return the distinct set of 
          values for a given dimension, e.g. all the suppliers mentioned in
          a procurement dataset. Supports filters (``cut``), a and a ``sort``
          (``field_name:direction``), as well as ``page`` and ``page_size``.
        * ``/cubes/<name>/aggregate`` is the main endpoint for generating 
          aggregate views of the data. Supports specifying the ``aggregates``
          to include, the ``drilldowns`` to aggregate by, a set of filters
          (``cut``), a and a ``sort`` (``field_name:direction``), as well
          as ``page`` and ``page_size``.
         
        
Keywords: sql sqlalchemy olap cubes analytics
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Description-Content-Type: text/markdown
