# -*- coding: utf-8 -*-
from setuptools import setup

package_dir = \
{'': 'src'}

packages = \
['smolqwery', 'smolqwery.management.commands']

package_data = \
{'': ['*']}

install_requires = \
['black>=22.1.0,<23.0.0',
 'google-cloud-bigquery>=2.34.1,<3.0.0',
 'python-dateutil>=2.8.2,<3.0.0',
 'rich>=11.2.0,<12.0.0']

setup_kwargs = {
    'name': 'smolqwery',
    'version': '0.1.1',
    'description': 'A Django-oriented tool to make analytics exports to BigQuery',
    'long_description': '# Smolqwery\n\nA Django-oriented micro-framework to help structure BigQuery exports with Data\nStudio and analytics in mind.\n\nIt lets you write minimal extractors that will generate the statistics you need\nand it will manage everything around them (creating the tables on BigQuery,\nrunning the exports, etc).\n\n## Integration guide\n\nWe\'ll review here how you can integrate Smolqwery in your project.\n\n### Installation\n\nThe first step is (obviously) to add it to the dependencies. You just need to\nspecify that you need the `smolqwery` package (use `pip`, `poetry` or whatever).\n\n### Django configuration\n\nYou can thank Google for that, the configuration is a bit complex. The first\nthing to do is to add it to your installed apps:\n\n```python\nINSTALLED_APPS = [\n    # your stuff\n    "smolqwery",\n    # more of your stuff\n]\n```\n\nThen there are _a few_ variables to configure:\n\n-   Things specific to your project\n    -   `SMOLQWERY_EXTRACTORS` &mdash; List of extractors, more on this below\n    -   `SMOLQWERY_DJANGO_APP` &mdash; App in which migrations must be created\n    -   `SMOLQWERY_FIRST_DATE` &mdash; First date of data to include in exports\n        (all days between this date and now will be exported)\n-   Things that you want to specify about BigQuery\n    -   `SMOLQWERY_DATASET` &mdash; Name of the dataset (database if you compare\n        to PostgreSQL for example). It will be created automatically by the\n        migrations, you don\'t need to create it yourself.\n    -   `SMOLQWERY_DATASET_LOCATION` &mdash; Location of the data. Unless you\n        want to be specific you can just say "EU" or "US"\n    -   `SMOLQWERY_ACL_GROUPS` &mdash; List of IAM group emails that will\n        receive the read/write permissions on the created dataset\n-   Google credentials (more on this later)\n    -   `SMOLQWERY_GOOGLE_TYPE`\n    -   `SMOLQWERY_GOOGLE_PROJECT_ID`\n    -   `SMOLQWERY_GOOGLE_PRIVATE_KEY_ID`\n    -   `SMOLQWERY_GOOGLE_PRIVATE_KEY`\n    -   `SMOLQWERY_GOOGLE_CLIENT_EMAIL`\n    -   `SMOLQWERY_GOOGLE_CLIENT_ID`\n    -   `SMOLQWERY_GOOGLE_AUTH_URI`\n    -   `SMOLQWERY_GOOGLE_TOKEN_URI`\n    -   `SMOLQWERY_GOOGLE_AUTH_PROVIDER_X509_CERT_URL`\n    -   `SMOLQWERY_GOOGLE_CLIENT_X509_CERT_URL`\n\n### Getting Google credentials\n\nSo. There are a bunch of credentials to get. Roughly, the steps to get them are:\n\n-   Log into the Google Cloud Console\n-   Create a project (you can re-use this same project between several instances\n    of your code as long as you let each instance use a different dataset name)\n-   Create a service account within this project\n-   Download the credentials of this service account (as a JSON file) and copy\n    the contents of this file into the configuration. For example, the\n    `project_id` entry in the JSON file matches the\n    `SMOLQWERY_GOOGLE_PROJECT_ID` setting\n\n### Creating extractors\n\nYou can now start creating extractors. It\'s the way your code will convert your\ndata into the statistics you want to store.\n\n#### Extractor type\n\nThe first step is to decide what kind of extractor you\'re going to do:\n\n-   Date-aggregated &mdash; Each row is a single date. For example "today there\n    were 345 new users and 23 new contracts"\n-   Individual rows &mdash; Each row is one entry. For example, each row can\n    represent one email that has been sent.\n\nHow to choose? If you\'re dealing with personal data, the only way you can use it\nfor statistics without consent is to aggregate it.\n\n> _Note_ &mdash; As long as an ID represents a single or a few users (hash,\n> fingerprint, etc) then it\'s considered to be personal data. There is no such\n> thing as anonymization. The only way out is aggregation.\n\nThe way to handle dates is going to be different depending on the type:\n\n-   Date-aggregated extractors don\'t have to include a date in their data model,\n    since it\'s the system that decides which date range corresponds to which\n    date "row". For that matter, you just need to reply with the data you deduce\n    from the range you\'re asked to extract and that\'s it.\n-   Individual rows must contain a timestamp field which indicates to which date\n    this row is related. It can be a timestamp (date/time + timezone) or a\n    simple date. It will not be added automatically but it is expected to be\n    named `timestamp` (see below how to define fields). You can override this\n    name in the extractor if you need to.\n\n#### Dataclass\n\nYou declare what you\'re going to return using a dataclass. It\'s a bit like\nDjango models, it\'s used to define what the fields are.\n\nHere is the type mapping (in relation to BigQuery\'s types):\n\n-   `int` &rarr; `INT64`\n-   `float` &rarr; `FLOAT64`\n-   `bool` &rarr; `BOOL`\n-   `str` &rarr; `STRING`\n-   `bytes` &rarr; `BYTES`\n-   `datetime.date` &rarr; `DATE`\n-   `datetime.datetime` &rarr; `TIMESTAMP` (make sure to use time-zone-aware\n    datetime instances, there is no check of this)\n\nIf you need your field to be nullable you can use the `Optional` type\nannotation.\n\nAnother thing is the "differentiated" fields. For example let\'s say that you\nhave a strictly growing metric ("number of users that ever registered") and you\nwant to easily know the difference from one date to the other ("number of users\nthat registered during the time range") in Data Studio. You can mark a field\nas "differentiated", which will trigger the creation of a view where the metric\nis differentiated day-by-day.\n\nYou would have the `user` table with this data:\n\n| Date       | Registered |\n| ---------- | ---------- |\n| 2022-01-01 | 10         |\n| 2022-01-02 | 20         |\n| 2022-01-03 | 35         |\n\nAnd then the `user_delta` view with:\n\n| Date       | Registered |\n| ---------- | ---------- |\n| 2022-01-01 | 10         |\n| 2022-01-02 | 10         |\n| 2022-01-03 | 15         |\n\nThis can be done using the `sq_field(differentiate=True)` field (which is a\nshortcut to `dataclasses`\'s `field` created for our purpose).\n\nHere is an example of both date-aggregated and individual-rows dataclasses:\n\n```python\nfrom dataclasses import dataclass\nfrom smolqwery import sq_field\nimport datetime\n\n@dataclass\nclass User:\n    users: int = sq_field(differentiate=True)\n    prospects: int = sq_field(differentiate=True)\n    clients: int = sq_field(differentiate=True)\n\n\n@dataclass\nclass Email:\n    timestamp: datetime.datetime\n    type: str\n```\n\n#### Extractor\n\nThe extractors themselves are a simple interface that you need to implement to\nmeet your needs. For example:\n\n```python\nclass EmailExtractor(BaseExtractor[Email]):\n    def get_dataclass(self) -> Type[Email]:\n        return Email\n\n    def get_extractor_type(self) -> ExtractorType:\n        return ExtractorType.individual_rows\n\n    def extract(\n        self, date_start: datetime.datetime, date_end: datetime.datetime\n    ) -> Iterator[Email]:\n        for email in EmailMessage.objects.filter(\n            date_sent__gte=date_start, date_sent__lt=date_end\n        ):\n            yield Email(\n                timestamp=email.date_sent,\n                type=email.type,\n            )\n```\n\nNote that `date_start` is inclusive and `date_end` is exclusive.\n\nYou need to declare your extractors in the settings, for example:\n\n```python\nSMOLQWERY_EXTRACTORS = [\n    "core.smolqwery.UserExtractor",\n    "core.smolqwery.EmailExtractor",\n]\n```\n\n#### Migrations\n\nLike Django, and using Django\'s system in part, Smolqwery will manage its tables\nwithin BigQuery using migrations. Those schemas are automatically created from\nthe dataclasses that you\'ve defined.\n\nHowever unlike Django you cannot change a data model once it has been created,\nbecause it would complicate things a lot for something that you don\'t really\nneed.\n\nSee it this way: you want to add a bunch of statistics to your BigQuery `user`\ntable. If your statistics can be computed from observing the data in your\ndatabase, you could just create a `user2` table and re-compute statistics from\nthe beginning into this table. This guarantees that all fields\npresent indeed have a value (as opposed to having to keep all new columns at\nNULL).\n\nIt\'s quite flexible this way: you can either create a new table and put just the\nnew fields there, or forget about the old table and re-compute everything\nfrom the start in a new table, etc.\n\nSo, how do you migrate?\n\n```\n./manage.py smolqwery_make_migrations\n./manage.py migrate\n```\n\nNote that if you (or the next person) don\'t have the Google credentials\nconfigured, then you\'ll run into an issue.\n\n### Exporting\n\nNow you can run the export.\n\nEither with a test run\n\n```\n./manage.py smolqwery_print_extract -f 2022-01-01 -l 2022-01-05\n```\n\nOr with a real run, which will actually insert the data\n\n```\n./manage.py smolqwery_extract\n```\n\nThe extract looks up the date of the last extract, table by table. If the\nextract was never done then it will fall back to the `SMOLQWERY_FIRST_DATE`\nsetting.\n\nAll the days between the last extract and the last fully elapsed day (in Django\'s\ntime zone) will be extracted.\n\nYou can safely run this as a cron job, several times a day if you want to.\n\n### Exploiting the data\n\nNow all your data is in BigQuery and you can start using it from Data Studio or\nother sources!\n',
    'long_description_content_type': 'text/markdown',
    'author': 'Rémy Sanchez',
    'author_email': 'remy.sanchez@hyperthese.net',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/Xowap/smolqwery',
    'package_dir': package_dir,
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.8,<3.11',
}


setup(**setup_kwargs)
