# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['mealprep']

package_data = \
{'': ['*']}

install_requires = \
['codecov>=2.0.16,<3.0.0',
 'flake8>=3.7.9,<4.0.0',
 'pandas>=1.0.1,<2.0.0',
 'python-semantic-release>=4.10.0,<5.0.0',
 'scikit-learn',  # the 'sklearn' name on PyPI is a deprecated placeholder
 'sphinxcontrib-napoleon>=0.7,<0.8',
 'vega_datasets>=0.8.0,<0.9.0']

setup_kwargs = {
    'name': 'mealprep',
    'version': '1.0.13',
    'description': 'Python package that eases the pain of data preprocessing: finding outliers, identifying numeric and categorical columns, and more.',
    'long_description': 'mealprep\n================\n\n![build mealprep\npackage](https://github.com/UBC-MDS/mealprep/workflows/build%20mealprep%20package/badge.svg)\n![Release](https://github.com/UBC-MDS/mealprep/workflows/Release/badge.svg)\n[![codecov](https://codecov.io/gh/UBC-MDS/mealprep/branch/master/graph/badge.svg)](https://codecov.io/gh/UBC-MDS/mealprep)\n[![Documentation\nStatus](https://readthedocs.org/projects/mealprep/badge/?version=latest)](https://mealprep.readthedocs.io/en/latest/?badge=latest)\n\nMealprep offers a toolkit, made with care, to help users save time in\nthe data preprocessing kitchen.\n\n## Overview\n\nRecognizing that the preparation step of a data science project often\nrequires the most time and effort, `mealprep` aims to help data science\nchefs of all specialties master their recipes of analysis. This package\ntackles pesky tasks such as classifying columns as categorical or\nnumeric ingredients, straining NA values and outliers, and automating a\npreprocessing recipe pipeline.\n\n## Functions\n\n`find_fruits_veg()`: This function will drop rows with NAs and find the\nindices of columns with all numeric values or categorical values based\non the specification.\n\n`find_missing_ingredients()`: For each column with missing values, this\nfunction will create a reference list of row indices, sum the number,\nand calculate the proportion of missing values.\n\n`find_bad_apples()`: This function uses a univariate approach to outlier\ndetection. 
For each column with outliers (values that are 2 or more\nstandard deviations from the mean), this function will report a\nreference list of row indices with outliers and the total number of\noutliers in that column.\n\n`make_recipe()`: This function is used to quickly apply the following\ncommon data preprocessing techniques with one line of code: split the\ndataset into a training set and testing set, apply standard scaling to\nnumeric features, apply one-hot encoding to categorical features, fit\nand transform training data, and transform testing data.\n\n## Mealprep and Python’s Ecosystem\n\n**mealprep** complements many of the existing packages in the Python\necosystem around the theme of data preprocessing. When preparing a\ndataframe for a machine learning preprocessing pipeline, it is\ntime-consuming to manually note which columns are categorical and numerical,\nparticularly for large datasets. The\n[pandas](https://pypi.org/project/pandas/) function `df.select_dtypes()`\ncomes close by allowing users to select columns with data corresponding\nto specific data types; however, the output of this function is a\npandas dataframe. `find_fruits_veg()` aims to fill this void by producing a\nlist of columns corresponding to the categorical and numerical groups.\n\nFor missing values, the [pandas](https://pypi.org/project/pandas/)\npackage’s `isna()` function converts all elements of a `pandas.DataFrame`\nor `pandas.Series` to boolean values indicating whether they are missing\nvalues. The package\n[autoimpute](https://autoimpute.readthedocs.io/en/latest/) provides a\nsuite of tools to fill missing values in a dataset through multiple\nunivariate, multivariate and time series methods. 
The gap between these\npackages is that neither provides you a summary of the missing values\nincluding the list of indices where they occur.\n`find_missing_ingredients()` augments these tools by providing a summary\ndataframe detailing which columns have missing values, as well as their\ncount and proportion.\n\nThe [pandas](https://pypi.org/project/pandas/) package’s `describe()`\nfunction is a staple in the data wrangling process because it returns\nseveral summary statistics for each numeric column in a dataframe, such\nas the mean, standard deviation, minimum, and maximum. Viewing these\nstatistics together is helpful for detecting outliers. However, the\noutput of this function does not tell you which rows of data these\noutliers are found in, or how many outliers are present in the\ndataframe. Packages like the\n[PyOD](https://pyod.readthedocs.io/en/latest/) toolkit and other\nfunctions that use clustering methods consider all variables at once to\ndetect outliers for multivariate data.\n[PyOD](https://pyod.readthedocs.io/en/latest/) provides over 20\nalgorithms to select from in detecting these outliers, which is handy\nfor large multivariate datasets where you know you want to consider all\nfeatures in detecting outliers, but can be a bit extreme for initial\ndata exploration. 
The **mealprep** `find_bad_apples()` function lives\nhappily in the space between [pandas](https://pypi.org/project/pandas/)\nand [PyOD](https://pyod.readthedocs.io/en/latest/)-type solutions for\noutlier detection, where it provides more information than the\n[pandas](https://pypi.org/project/pandas/) `describe()` function to\npoint out datapoints which need further investigation, but does not\nconsider all variables at once like the\n[PyOD](https://pyod.readthedocs.io/en/latest/)-type functions do.\n\nLastly, there are many great tools in the data science ecosystem for\npreprocessing data, such as [scikit-learn\npreprocessing](https://scikit-learn.org/stable/modules/preprocessing.html)\nin Python. However, you may find yourself frequently writing the same\nlengthy code for common preprocessing tasks (e.g., scaling numeric features\nand one-hot encoding categorical features). `make_recipe()` provides\na *shortcut function* to apply your favourite recipes quickly to\npreprocess data in one line of code.\n\n## Installation\n\n    pip install -i https://test.pypi.org/simple/ mealprep\n\n## Examples\n\n### `find_fruits_veg()`\n\nFind the column indices for either numerical or categorical variables in\nyour dataframe with the `find_fruits_veg()` function. 
The example below\nshows how to use `find_fruits_veg()` to find the index of the\ncategorical column in a toy dataframe.\n\nFirst, load the required packages.\n\n``` python\nfrom mealprep.mealprep import find_fruits_veg\nimport pandas as pd\n```\n\nIf you don’t already have a dataframe to work with, run this code to set\nup a toy dataframe (`df`) for testing.\n\n``` python\ndf = pd.DataFrame({\'col1\': [1, 2], \'col2\': [\'a\', \'b\']})\ndf\n```\n\n    ##    col1 col2\n    ## 0     1    a\n    ## 1     2    b\n\nThen, apply the `find_fruits_veg()` function to the dataframe.\n\n``` python\nfind_fruits_veg(df, type_of_out=\'categ\')\n```\n\n    ## [1]\n\n### `find_missing_ingredients()`\n\nBefore launching into a new data analysis, running the function\n`find_missing_ingredients()` on a dataframe of interest will produce a\nreport on each column with missing values.\n\nFirst, load the required packages.\n\n``` python\nfrom mealprep.mealprep import find_missing_ingredients\nimport pandas as pd\nimport numpy as np\n```\n\nIf you don’t already have a dataframe to work with, run this code to set\nup a toy dataframe (`df`) for testing.\n\n``` python\ntest1 = {\'column1\': [\'a\', \'b\', \'c\', \'d\'],\n         \'column2\': [1, 2, np.nan, 3],\n         \'column3\': [np.nan] * 4}\n\ndf = pd.DataFrame(test1)\ndf\n```\n\n    ##   column1  column2  column3\n    ## 0       a      1.0      NaN\n    ## 1       b      2.0      NaN\n    ## 2       c      NaN      NaN\n    ## 3       d      3.0      NaN\n\nThen, apply the `find_missing_ingredients()` function to the dataframe.\n\n``` python\nfind_missing_ingredients(df)\n```\n\n    ##   Column name  NaN count NaN proportion   NaN indices\n    ## 0     column2          1          25.0%           [2]\n    ## 1     column3          4         100.0%  [0, 1, 2, 3]\n\n### `find_bad_apples()`\n\nFind the outliers in your data by applying the `find_bad_apples()`\nfunction to your dataframe.\n\nFirst, load the required packages.\n\n``` 
python\nfrom mealprep.mealprep import find_bad_apples\nimport pandas as pd\n```\n\nIf you don’t already have a dataframe to work with, run this code to set\nup a toy dataframe (`df`) for testing.\n\n``` python\ndf = pd.DataFrame({\'A\' : [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,\n                             1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],\n                    \'B\' : [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,-100,\n                             1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,100],\n                    \'C\' : [1,1,1,1,1,19,1,1,1,1,1,1,1,1,19,1,1,1,1,\n                             1,1,1,1,1,1,1,19,1,1,1,1,1,1,1,1]})\ndf\n```\n\n    ##     A    B   C\n    ## 0   1    1   1\n    ## 1   1    1   1\n    ## 2   1    1   1\n    ## 3   1    1   1\n    ## 4   1    1   1\n    ## 5   1    1  19\n    ## 6   1    1   1\n    ## 7   1    1   1\n    ## 8   1    1   1\n    ## 9   1    1   1\n    ## 10  1    1   1\n    ## 11  1    1   1\n    ## 12  1    1   1\n    ## 13  1    1   1\n    ## 14  1    1  19\n    ## 15  1    1   1\n    ## 16  1    1   1\n    ## 17  1 -100   1\n    ## 18  1    1   1\n    ## 19  1    1   1\n    ## 20  1    1   1\n    ## 21  1    1   1\n    ## 22  1    1   1\n    ## 23  1    1   1\n    ## 24  1    1   1\n    ## 25  1    1   1\n    ## 26  1    1  19\n    ## 27  1    1   1\n    ## 28  1    1   1\n    ## 29  1    1   1\n    ## 30  1    1   1\n    ## 31  1    1   1\n    ## 32  1    1   1\n    ## 33  1    1   1\n    ## 34  1  100   1\n\nThen, apply the `find_bad_apples()` function to the dataframe.\n\n``` python\nfind_bad_apples(df)\n```\n\n    ##   Variable      Indices Total Outliers\n    ## 0        B     [17, 34]              2\n    ## 1        C  [5, 14, 26]              3\n\n### `make_recipe()`\n\nDo you find yourself constantly applying the same data preprocessing\ntechniques time and time again? 
`make_recipe` can help by applying your\nfavourite preprocessing recipes in only a few lines of code.\n\nBelow `make_recipe` applies the following common recipe in only one line\nof code:\n\n1.  Split data into training, validation, and testing\n2.  Standardise and scale numeric features\n3.  One-hot encode categorical features\n\nFirst, load the required packages.\n\n``` python\nfrom mealprep.mealprep import make_recipe\nimport pandas as pd\nimport numpy as np\nfrom vega_datasets import data\n```\n\nIf you don’t already have a dataframe to work with, run this code to\nload the `cars` dataset from `vega_datasets` for testing.\n\n``` python\ndf = pd.read_json(data.cars.url).drop(columns=["Year"])\nX = df.drop(columns=["Name"])\ny = df[["Name"]]\n\ndf.info()\n```\n\n    ## <class \'pandas.core.frame.DataFrame\'>\n    ## RangeIndex: 406 entries, 0 to 405\n    ## Data columns (total 8 columns):\n    ##  #   Column            Non-Null Count  Dtype  \n    ## ---  ------            --------------  -----  \n    ##  0   Name              406 non-null    object \n    ##  1   Miles_per_Gallon  398 non-null    float64\n    ##  2   Cylinders         406 non-null    int64  \n    ##  3   Displacement      406 non-null    float64\n    ##  4   Horsepower        400 non-null    float64\n    ##  5   Weight_in_lbs     406 non-null    int64  \n    ##  6   Acceleration      406 non-null    float64\n    ##  7   Origin            406 non-null    object \n    ## dtypes: float64(4), int64(2), object(2)\n    ## memory usage: 25.5+ KB\n\nThen, use `make_recipe` to quickly split your data and apply your\nfavourite preprocessing techniques!\n\n``` python\nX_train, X_valid, X_test, y_train, y_valid, y_test = make_recipe(\n    X=X, y=y, recipe="ohe_and_standard_scaler",\n    splits_to_return="train_test")\n\nX_train.head()\n```\n\n    ##    Miles_per_Gallon  Cylinders  Displacement  ...  x0_Europe  x0_Japan  x0_USA\n    ## 0          0.564509  -0.846151     -0.910090  ...        
0.0       0.0     1.0\n    ## 1          0.883582  -0.846151     -0.910090  ...        0.0       0.0     1.0\n    ## 2          1.126078  -0.846151     -0.815709  ...        0.0       1.0     0.0\n    ## 3         -1.094674   0.308177      0.524498  ...        0.0       0.0     1.0\n    ## 4          0.794242  -0.846151     -0.995032  ...        1.0       0.0     0.0\n    ## \n    ## [5 rows x 9 columns]\n\n### Documentation\n\nThe official documentation is hosted on Read the Docs:\n<https://mealprep.readthedocs.io/en/latest/>\n\n### Credits\n\nThis package was created with Cookiecutter and the\nUBC-MDS/cookiecutter-ubc-mds project template, modified from the\n[pyOpenSci/cookiecutter-pyopensci](https://github.com/pyOpenSci/cookiecutter-pyopensci)\nproject template and the\n[audreyr/cookiecutter-pypackage](https://github.com/audreyr/cookiecutter-pypackage)\nproject template.\n',
    'author': 'luhuayue',
    'author_email': 'luhuayueapp@163.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/UBC-MDS/mealprep',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.6.1,<4.0.0',
}


setup(**setup_kwargs)
