Metadata-Version: 2.1
Name: dataprep
Version: 0.2.3
Summary: Dataprep: Data Preparation in Python
Home-page: https://github.com/sfu-db/dataprep
License: MIT
Keywords: dataprep,eda,data connector,data science,exploratory data analysis,data exploration
Author: SFU Database System Lab
Author-email: dsl.cs.sfu@gmail.com
Maintainer: Weiyuan Wu
Maintainer-email: youngw@sfu.com
Requires-Python: >=3.6.1,<4.0.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Dist: bokeh (>=2.0,<2.1)
Requires-Dist: dask[complete] (>=2.13,<2.14)
Requires-Dist: holoviews (>=1.13,<1.14)
Requires-Dist: jinja2 (>=2.11,<2.12)
Requires-Dist: jsonpath2 (>=0.4,<0.5)
Requires-Dist: jsonschema (>=3.2,<3.3)
Requires-Dist: lxml (>=4.5,<4.6)
Requires-Dist: numpy (>=1.18,<1.19)
Requires-Dist: pandas (>=1.0,<1.1)
Requires-Dist: requests (>=2.23,<2.24)
Requires-Dist: scipy (>=1.4,<1.5)
Project-URL: Repository, https://github.com/sfu-db/dataprep
Description-Content-Type: text/markdown

# Dataprep ![Build Status]
[Documentation] | [Mail List & Forum] 

Dataprep lets you prepare your data using a single library with a few lines of code.

Currently, you can use `dataprep` to:
* Collect data from common data sources (through `dataprep.data_connector`)
* Do your exploratory data analysis (through `dataprep.eda`)
* ...more modules are coming

## Installation

```bash
pip install dataprep
```

## Examples & Usage

The following examples can give you an impression of what dataprep can do:

* [Documentation: Data Connector](https://sfu-db.github.io/dataprep/data_connector.html)
* [Documentation: EDA](https://sfu-db.github.io/dataprep/eda/introduction.html)
* [EDA Case Study: Titanic](https://sfu-db.github.io/dataprep/case_study/titanic.html)
* [EDA Case Study: House Price](https://sfu-db.github.io/dataprep/case_study/house_price.html)

### EDA

Exploratory data analysis involves a set of common tasks,
such as taking a quick look at the distribution of each column or understanding
the correlations between columns.

The EDA module groups these tasks into functions, so you can finish each one
with a single function call.

* Want to understand the distributions for each DataFrame column? Use `plot`.

<center><a href="https://sfu-db.github.io/dataprep/eda/introduction.html#analyzing-basic-characteristics-via-plot"><img src="https://github.com/sfu-db/dataprep/raw/master/assets/plot(df).png"/></a></center>

* Want to understand the correlation between columns? Use `plot_correlation`.

<center><a href="https://sfu-db.github.io/dataprep/eda/introduction.html#analyzing-correlation-via-plot-correlation"><img src="https://github.com/sfu-db/dataprep/raw/master/assets/plot_correlation(df).png"/></a></center>

* Or, if you want to understand the impact of the missing values for each column, use `plot_missing`.

<center><a href="https://sfu-db.github.io/dataprep/eda/plot_missing.html#plotting-the-position-of-missing-values-via-plot-missing-df"><img src="https://github.com/sfu-db/dataprep/raw/master/assets/plot_missing(df).png"/></a></center>

* You can drill down to get more information by giving `plot`, `plot_correlation` and `plot_missing` a column name. E.g. for `plot_missing`:

<center><a href="https://sfu-db.github.io/dataprep/eda/plot_missing.html#the-impact-on-basic-characteristics-of-missing-values-in-column-x-via-plot-missing-df-x"><img src="https://github.com/sfu-db/dataprep/raw/master/assets/plot_missing(df,x).png"/></a></center>
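
The calls in the list above can be sketched as one small helper. This is only an illustration: `quick_eda` is a hypothetical name that is not part of dataprep, and the sketch assumes the three functions are importable from `dataprep.eda` and accept a DataFrame, optionally followed by a column name, as the screenshots suggest.

```python
def quick_eda(df, column=None):
    """Run the three EDA calls described above on a DataFrame.

    `quick_eda` is a hypothetical helper for illustration, not a dataprep API.
    """
    # Imported inside the function so that defining the helper does not
    # require dataprep to be importable yet.
    from dataprep.eda import plot, plot_correlation, plot_missing

    # With only a DataFrame, each call gives the dataset-wide overview;
    # with a column name added, each call drills down into that column.
    args = (df,) if column is None else (df, column)

    # Return whatever each call produces so the caller (for example a
    # notebook cell) can decide how to display it.
    return [plot(*args), plot_correlation(*args), plot_missing(*args)]
```

For example, `quick_eda(df)` would produce the overviews and `quick_eda(df, "price")` the drill-downs, where `"price"` stands in for a column of your own DataFrame.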

Don't forget to check out the [examples] folder for detailed demonstrations!

### Data Connector

You can download Yelp business search results into a pandas DataFrame
using two lines of code, without digging deep into the Yelp documentation!

```python
from dataprep.data_connector import Connector

dc = Connector("yelp", auth_params={"access_token":"<Your yelp access token>"})
df = dc.query("businesses", term="korean", location="seattle")
```
<center><a href="https://sfu-db.github.io/dataprep/data_connector.html#getting-web-data-with-connector-query"><img src="https://github.com/sfu-db/dataprep/raw/master/assets/data_connector.png"/></a></center>
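
If you would rather not paste the access token into your source, the same two calls can be wrapped as in the sketch below. The helper name `search_yelp` and the environment variable `YELP_ACCESS_TOKEN` are assumptions made for this illustration; only `Connector`, `auth_params` and `query` come from the example above.

```python
import os


def search_yelp(term, location, token=None):
    """Query Yelp businesses with the same calls as the example above.

    `search_yelp` is a hypothetical helper for illustration, not a dataprep API.
    """
    # Imported inside the function so that defining the helper does not
    # require dataprep to be importable yet.
    from dataprep.data_connector import Connector

    # YELP_ACCESS_TOKEN is an assumed variable name; use whatever name
    # your own environment defines. A missing variable raises KeyError.
    if token is None:
        token = os.environ["YELP_ACCESS_TOKEN"]

    dc = Connector("yelp", auth_params={"access_token": token})
    return dc.query("businesses", term=term, location=location)
```

Calling `search_yelp("korean", "seattle")` would then issue the same query as the example above, with the token taken from the environment.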


## Contribution

Dataprep is in its early stage. Contributions of any kind, including:
* Filing an issue
* Providing use cases
* Writing down your user experience
* Submitting a PR
* ...

are greatly appreciated!

Please take a look at our [wiki] for development documentation!


[Build Status]: https://img.shields.io/circleci/build/github/sfu-db/dataprep/master?style=flat-square&token=f68e38757f5c98771f46d1c7e700f285a0b9784d
[Documentation]: https://sfu-db.github.io/dataprep/
[Mail list & Forum]: https://groups.google.com/forum/#!forum/dataprep
[wiki]: https://github.com/sfu-db/dataprep/wiki
[examples]: https://github.com/sfu-db/dataprep/tree/master/examples

