# -*- coding: utf-8 -*-
from setuptools import setup

package_dir = {'': 'src'}

packages = ['pandandic']

package_data = {'': ['*']}

install_requires = ['pandas>=1.2,<2.0']

extras_require = {
    ':python_version < "3.11"': ['typing_extensions>=4,<5'],
    'all': ['pyarrow>=9.0.0,<10.0.0', 'pandavro>=1.7.1,<2.0.0'],
    'avro': ['pandavro>=1.7.1,<2.0.0'],
    'extras': ['pyarrow>=9.0.0,<10.0.0', 'pandavro>=1.7.1,<2.0.0'],
    'parquet': ['pyarrow>=9.0.0,<10.0.0'],
}

setup_kwargs = {
    'name': 'pandandic',
    'version': '0.2.0',
    'description': 'A typed dataframe helper',
    'long_description': '# pandandic\n\npandandic is a library for documenting dataset schemas in code, by inheriting from a base class and assigning attributes for columns and column sets.\n\n## Installation\n\n`pip install pandandic` or `pip install pandandic[extras]`\n\n`poetry add pandandic` or `poetry add "pandandic[extras]"`\n\n### Extras\n\n* `parquet`\n* `avro`\n* `extras` provides `parquet` and `avro`\n* `all` provides `parquet` and `avro`\n\n## What Problem Does It Solve?\n\nConsider a project that reads data from several datasets, performs some preprocessing, runs a model and returns a result.\nThe preprocessing must act on certain columns, so the team rightly adds constants in order to perform slicing on the input dataframes.\nTwo of these datasets share a column name.\nOne of the datasets consists of time series data, and each time the dataset is refreshed the number of columns changes.\nThis scenario presents several challenges in structuring the processing logic in a clear and adaptable manner whilst maintaining clear ownership.\nHere is how `pandandic` helps:\n\n1. **Schema ownership**: with `pandandic`, each schema has a corresponding class.\n2. **Shared variables**: with `pandandic`, there are no shared constants. Each `BaseFrame` subclass is responsible for its own schema.\n3. **Dynamic groups**: with `pandandic` it is possible to define a set of columns with regular expressions. This schema will match dynamically on the data it is applied to, yet can still be accessed like an attribute.\n4. **Group processing**: with `pandandic` it is possible to define custom groups such as "all numeric" or "all time-series", in order to easily apply processing tasks to groups of data in a self-documenting fashion.\n\n## Other Things It Does\n\n* Wraps `parquet` reading: `pip install pandandic[parquet]`, `poetry add "pandandic[parquet]"`\n* Wraps `avro` reading: `pip install pandandic[avro]`, `poetry add "pandandic[avro]"`  \nFor both: `pip install pandandic[all]`, `poetry add "pandandic[all]"`\n* Wraps Excel reading, although there are no extras configured for this due to the various output formats of Excel and the different packages providing them.\n\n## What Doesn\'t It Do?\n\n* **Validation**, save for what is built in to pandas. For validation of defined types, please see other libraries such as pandera, dataenforce, strictly-typed-pandas (apologies for any I have missed).\n* **Appending columns**: columns that are appended to the object after calling `read_x` or `from_df` will not be captured by a `ColumnSet`, even if they match it. This can be solved by transforming to a dataframe and back again with `to_df` and `from_df` respectively.\n* **Dask**: although support may be added in future.\n\n## Worked Examples\n\n### Basic\n\n`examples/basic.csv`\n\n|foo       |bar|baz|\n|----------|---|---|\n|a         |1  |one|\n|b         |2  |two|\n|c         |3  |three|\n\n`examples/basic_usage.py`\n\n```python\nfrom pandandic import BaseFrame, Column\n\n\nclass FooFrame(BaseFrame):\n    """\n    Each column below will be read with the given type. Columns can be accessed like attributes to return Series\n    slices in the usual way.\n    """\n    foo = Column(type=str)\n    bar = Column(type=int)\n\n\ndata = FooFrame().read_csv("basic.csv")\nprint(data.foo)\nprint(data.bar)\n```\n\n### Intermediate\n\n`examples/intermediate.csv`\n\n|date      |temperature-0|temperature-1|temperature-2|temperature-3|temperature-4|temperature-5|\n|----------|-------------|-------------|-------------|-------------|-------------|-------------|\n|01/01/2001|23           |22           |21           |20           |19           |18           |\n|02/01/2001|24           |23           |22           |21           |20           |19           |\n|03/01/2001|25           |24           |23           |22           |21           |20           |\n|04/01/2001|26           |25           |24           |23           |22           |21           |\n\n`examples/intermediate_usage.py`\n\n```python\nimport datetime\nfrom pandandic import BaseFrame, Column, ColumnSet\n\n\nclass TemperatureFrame(BaseFrame):\n    """\n    A ColumnSet can use a list of column names or a regex to specify multiple columns at once.\n\n    An exception is raised if members overlap, unless greedy_column_sets is set to True.\n    In that case, the first column set to match consumes the column.\n\n    A column set can be accessed like an attribute to provide a DataFrame view.\n    """\n    date = Column(type=datetime.date)\n    temperature = ColumnSet(type=float, members=[r"temperature-\\d+"], regex=True)\n\n\ndf = TemperatureFrame().read_csv("intermediate.csv")\ndf.set_index(TemperatureFrame.date.column_name, inplace=True)  # name attribute also works here, but column_name is recommended\nprint(df.temperature)\n\n```\n\nAs the intermediate example shows, the `Column` `date` defined on `TemperatureFrame` can be accessed from the class (**not** an instantiated object), and its `.column_name` (or `.name`) refers to the constant, which in this case is "date", the name of the attribute.\n\nThe same works for a non-regex `ColumnSet`, by accessing its `.members` attribute.\n\n### Advanced\n\n`examples/advanced.csv`\n\n|date      |temperature-0|temperature-1|temperature-2|temperature-3|door-open-0|door-open-1|door-open-2|ref  |comment|\n|----------|-------------|-------------|-------------|-------------|-----------|-----------|-----------|-----|-------|\n|01/01/2001|23           |22           |21           |20           |False      |False      |False      |75   |first observation|\n|02/01/2001|24           |23           |22           |21           |False      |True       |False      |76   |       |\n|03/01/2001|25           |24           |23           |22           |True       |False      |False      |77   |left the door open|\n|04/01/2001|26           |25           |24           |23           |False      |False      |True       |78   |final observation|\n\n```python\nimport datetime\nfrom pandandic import BaseFrame, Column, ColumnSet, ColumnGroup\n\n\nclass AdvancedFrame(BaseFrame):\n    """\n    A ColumnGroup can be used to group together multiple column sets and columns.\n    It can be accessed like an attribute to provide a dataframe view.\n    """\n    date = Column(type=datetime.date)\n    temperature = ColumnSet(type=float, members=[r"temperature-\\d+"], regex=True)\n    door_open = ColumnSet(type=bool, members=["door-open-0", "door-open-1", "door-open-2"], regex=False)\n    ref = Column(type=int)\n    comment = Column(type=str)\n\n    numerical = ColumnGroup(members=[temperature, ref])\n    time_series = ColumnGroup(members=[temperature, door_open])\n\n\ndf = AdvancedFrame().read_csv("advanced.csv")\ndf.set_index(AdvancedFrame.date.column_name, inplace=True)  # name attribute also works here, but column_name is recommended\nprint(df.time_series)\n```\n\n`ColumnGroup` and `ColumnSet` attributes can be accessed on the instantiated object, and will return a `DataFrame` view of their members.\n\n### Expert\n\n```python\n# examples/expert_usage.py\nimport datetime\n\nfrom pandandic import BaseFrame, Column, ColumnSet, ColumnGroup, DefinedLater\n\n\nclass ExpertFrame(BaseFrame):\n    """\n    Aliasing can be used to dynamically set columns or column set members at runtime.\n    """\n    date = Column(type=datetime.date, alias=DefinedLater)\n    metadata = ColumnSet(members=DefinedLater)\n\n    temperature = ColumnSet(type=float, members=[r"temperature-\\d+"], regex=True)\n    door_open = ColumnSet(type=bool, members=["door-open-0", "door-open-1", "door-open-2"], regex=False)\n\n    time_series = ColumnGroup(members=[temperature, door_open])\n\n\n# anything DefinedLater MUST be set before ExpertFrame reads or accesses a Column or ColumnSet via attribute\nExpertFrame.date.alias = "date"\nExpertFrame.metadata.members = ["comment", "ref"]\n\ndf = ExpertFrame().read_csv("advanced.csv")\ndf.set_index(ExpertFrame.date.column_name, inplace=True)  # now sets index with the defined alias\nprint(df.metadata)\n\n```\n\nA `Column` alias can be set as `DefinedLater` to clearly document that it is set dynamically at runtime.\nThe same is possible for `ColumnSet` members. This has the benefit of adding a runtime check that the alias or members are set before being used.\n\n**Warning**: If a `Column` alias is set, it will be used **regardless** of whether it exists in the data or not.\n\n## Class Diagram\n\n```mermaid\nclassDiagram\n    \n    DataFrame <|-- BaseFrame\n    class BaseFrame {\n        +bool enforce_types\n        +bool enforce_columns\n        +bool allow_extra_columns\n        +bool greedy_column_sets\n        +with_enforced_types()\n        +with_enforced_columns()\n        +with_allowed_extra_columns()\n        +with_greedy_column_sets()\n        +read_csv()\n        +read_excel()\n        +read_parquet()\n        +read_avro()\n        +from_df()\n        +to_df()\n        +read_csv_columns()\n        +read_excel_columns()\n        +read_parquet_columns()\n        +read_avro_columns()\n    }\n    BaseFrame o-- Column\n    class Column {\n        +type\n    }\n    BaseFrame o-- ColumnSet\n    class ColumnSet {\n        +type\n        +members\n    }\n    BaseFrame o-- ColumnGroup\n    class ColumnGroup {\n        +type\n        +members\n    }\n    ColumnGroup *-- ColumnSet\n    ColumnGroup *-- Column\n```\n\n## Defined Behaviours\n\n### enforce_types\n\nIf set to True (default), the types set in `Column` and `ColumnSet` attributes are enforced at read time (csv, excel) or cast after reading (parquet, avro, df).\nNo validation is done, so errors **will** be raised by pandas if the data cannot be coerced to the schema.\n\n### enforce_columns\n\nIf set to True (default), defined `Column` and `ColumnSet` attributes define the mandatory columns of the frame.\nErrors **will** be raised by pandas if expected columns do not exist in the data.\n\nA regex `ColumnSet` will match only existing columns, and will not error if a match doesn\'t exist.\n\n### allow_extra_columns\n\nIf set to False (default), any extra columns will be removed.\n\nIf set to True (not default), they will remain.\n\n### greedy_column_sets\n\nIf set to False (default), there must be no overlap in `Column` and `ColumnSet` members.\nIf there is an overlap, a `ColumnSetException` will be raised.\n\nIf set to True (not default), a `ColumnSet` will "consume" columns: they will belong to that `ColumnSet` and inherit its defined type, and the system will not raise a `ColumnSetException`.\n',
    'author': 'Will Martin',
    'author_email': 'will.st4@gmail.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/w-martin/pandandic',
    'package_dir': package_dir,
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'extras_require': extras_require,
    'python_requires': '>=3.8,<4.0',
}


setup(**setup_kwargs)
