# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['bigquery_frame', 'bigquery_frame.transformations_impl']

package_data = \
{'': ['*']}

install_requires = \
['google-cloud-bigquery>=2.31.0', 'tabulate>=0.8.9', 'tqdm>=4.0.0']

setup_kwargs = {
    'name': 'bigquery-frame',
    'version': '0.2.11',
    'description': 'A DataFrame API for Google BigQuery',
    'long_description': '# Bigquery-frame\n\n## What is it ?\n\nThis project is a POC that aims to showcase the wonders\nthat could be done if BigQuery provided a DataFrame API in \nPython similar to the one already available with PySpark\nor Snowpark (for which the Python API will come out soon).\n\nI tried to reproduce the most commonly used methods of the Spark DataFrame object. \nI aimed at making something as close as possible as PySpark, and tried to keep exactly\nthe same naming and docstrings as PySpark\'s DataFrames.\n \n\nFor instance, this is a working example of PySpark code :\n```python\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql import functions as f\n\nspark = SparkSession.builder.master("local[1]").getOrCreate()\n\ndf = spark.sql("""\n    SELECT 1 as id, "Bulbasaur" as name, ARRAY("Grass", "Poison") as types, NULL as other_col\n    UNION ALL\n    SELECT 2 as id, "Ivysaur" as name, ARRAY("Grass", "Poison") as types, NULL as other_col\n""")\n\ndf.select("id", "name", "types").createOrReplaceTempView("pokedex")\n\ndf2 = spark.sql("""SELECT * FROM pokedex""")\\\n    .withColumn("nb_types", f.expr("SIZE(types)"))\\\n    .withColumn("name", f.expr("LOWER(name)"))\n\ndf2.show()\n# +---+---------+---------------+--------+\n# | id|     name|          types|nb_types|\n# +---+---------+---------------+--------+\n# |  1|bulbasaur|[Grass, Poison]|       2|\n# |  2|  ivysaur|[Grass, Poison]|       2|\n# +---+---------+---------------+--------+\n```\n\nAnd this is an equivalent working example using bigquery_frame, that runs on Google Big Query! \n```python\nfrom bigquery_frame import BigQueryBuilder\nfrom bigquery_frame.auth import get_bq_client\nfrom bigquery_frame import functions as f\n\nbigquery = BigQueryBuilder(get_bq_client())\n\ndf = bigquery.sql("""\n    SELECT 1 as id, "Bulbasaur" as name, ["Grass", "Poison"] as types, NULL as other_col\n    UNION ALL\n    SELECT 2 as id, "Ivysaur" as name, ["Grass", "Poison"] as types, NULL as other_col\n""")\n\ndf.select("id", "name", "types").createOrReplaceTempView("pokedex")\n\ndf2 = bigquery.sql("""SELECT * FROM pokedex""")\\\n    .withColumn("nb_types", f.expr("ARRAY_LENGTH(types)"))\\\n    .withColumn("name", f.expr("LOWER(name)"), replace=True)\n\ndf2.show()\n# +----+-----------+---------------------+----------+\n# | id |      name |               types | nb_types |\n# +----+-----------+---------------------+----------+\n# |  1 | bulbasaur | [\'Grass\', \'Poison\'] |        2 |\n# |  2 |   ivysaur | [\'Grass\', \'Poison\'] |        2 |\n# +----+-----------+---------------------+----------+\n```\n\n## What\'s so cool about DataFrames ?\n\nI believe that DataFrames are super cool to organise SQL code as it allows us to \nseveral things that are much harder, or even impossible, in pure-SQL:\n\n- on-the-fly introspection\n- chaining operations\n- generic transformations\n- higher level abstraction\n\nBut that deserves a blog article (coming soon).\n\n## I want to try this POC, how do I use it ?\n\nJust clone this repository, open PyCharm, and follow the\ninstructions in the [AUTH.md](/AUTH.md) documentation\nto set up your connection to BigQuery. Then, go fiddle\nwith the [demo](/examples/demo.py), or have a look at the [examples](/examples).\n\n\n## How does it work ?\n\nVery simply, by generating SQL queries that are sent to BigQuery.\nYou can get the query by calling the method `DataFrame.compile()`.\n\nFor instance, if we reuse the example from the beginning:\n```\nprint(df2.compile())\n```\n\nThis will print the following SQL query:\n```SQL\nWITH pokedex AS (\n  WITH _default_alias_1 AS (\n    \n        SELECT 1 as id, "Bulbasaur" as name, ["Grass", "Poison"] as types, NULL as other_col\n        UNION ALL\n        SELECT 2 as id, "Ivysaur" as name, ["Grass", "Poison"] as types, NULL as other_col\n    \n  )\n  SELECT \n    id,\n    name,\n    types\n  FROM _default_alias_1\n)\n, _default_alias_3 AS (\n  SELECT * FROM pokedex\n)\n, _default_alias_4 AS (\n  SELECT \n    *,\n    ARRAY_LENGTH(types) AS nb_types\n  FROM _default_alias_3\n)\nSELECT \n  * REPLACE (\n    LOWER(name) AS name\n  )\nFROM _default_alias_4\n```\n\n## Facturation\n\nThe examples in this code only use generated data and don\'t ready any "real" table.\nThis means that you won\'t be charged a cent running them.\n\nAlso, even when reading "real" tables, any one-the-fly introspection (such as\ngetting a DataFrame\'s schema), will trigger a query on BigQuery but will read\n0 rows, and will thus be billed 0 cent.\n\n## Known limitations\n\nSince this is a POC, I took some shortcuts and did not try to optimize the query length.\nIn particular, this uses _**a lot**_ of CTEs, and any serious project trying to use it\nmight reach the maximum query length very quickly.\n\nHere is a list of other known limitations, please also see the \n[Further developments](#further-developments) section for a list of missing features.\n\n- `DataFrame.withColumn`: \n  - unlike in Spark, replacing an existing column is  \n    not done automatically, an extra argument `replace=True` must be passed.\n- `DataFrame.createOrReplaceTempView`: \n  - I kept the same name as Spark for consistency, but it does not create an actual view on BigQuery, it just emulates \n    Spark\'s behaviour by using a CTE. Because of this, if you replace a temp view that already exists, the new view\n    can not derive from the old view (while in Spark it is possible). \n\n## Further developments\n\nFunctions not supported yet :\n\n- `DataFrame.join`\n- `DataFrame.groupBy`\n- `DataFrame.printSchema`\n\nAlso, it would be cool to expand this to other SQL engines than BigQuery \n(contributors are welcome ;-) ).\n\n\n## Why did I make this ?\n\nI hope that it will motivate the teams working on BigQuery (or Redshift, \nor Azure Synapse) to propose a real python DataFrame API on top of their \nmassively parallel SQL engines. But not something ugly like this POC,\nthat generates SQL strings, more something like Spark Catalyst, which directly\ngenerates logical plans out of the DataFrame API without passing through the \n"SQL string" step.\n\nAfter starting this POC, I realized Snowflake already understood this and \ndeveloped Snowpark, a Java/Scala (and soon Python) API to run complex workflows\non Snowflake, and [Snowpark\'s DataFrame API](https://docs.snowflake.com/en/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/DataFrame.html)\nwhich was clearly borrowed from [Spark\'s DataFrame (= DataSet[Row]) API](https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html)\n(we recognize several key method names: cache, createOrReplaceTempView, \nwhere/filter, collect, toLocalIterator). \n\nI believe such project could open the gate to hundreds of very cool applications.\nFor instance, did you know that, in its early versions at least, Dataiku Shaker \nwas just a GUI that chained transformations on Pandas DataFrame, and later \nSpark DataFrame ? \n\nAnother possible evolution would be to make a DataFrame API capable of speaking\nmultiple SQL dialects. By using it, projects that generate SQL for multiple \nplatforms, like [Malloy](https://github.com/looker-open-source/malloy), could\nall use the same DataFrame abstraction. Adding support for a new SQL platform\nwould immediately allow all the project based on it to support this new platform.\n\n**I would be very interested if someone could make a similar POC with, \nRedShift, Postgres, Azure Synapse, or any other SQL engines \n(aside from Spark-SQL and Snowpark, of course :-p).**\n',
    'author': 'FurcyPin',
    'author_email': 'None',
    'maintainer': 'None',
    'maintainer_email': 'None',
    'url': 'https://github.com/FurcyPin/bigquery-frame',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.6.1,<3.11',
}


setup(**setup_kwargs)
