Metadata-Version: 2.1
Name: analyticsdf
Version: 0.0.7
Summary: Analytic generation of datasets with specified statistical characteristics.
Home-page: https://github.com/Faye-yufan/analytics-dataset
Author: Fei, Eli
Author-email: yufanfei@usc.edu
License: MIT
Platform: unix
Platform: linux
Platform: osx
Platform: cygwin
Platform: win32
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Provides-Extra: testing
License-File: LICENSE

Analytic generation of datasets with specified statistical characteristics.

# Introduction
analytics-dataset provides functionality for specifying and generating a wide range of datasets with chosen statistical characteristics. A specification covers both the predictor matrix and the response vector. Check the [analyticsdf documentation](https://faye-yufan.github.io/analytics-dataset/) for more details.
Examples of such characteristics include:
* High correlation and multi-collinearity among predictor variables
* Interaction effects between variables
* Skewed distributions of predictor and response variables
* Nonlinear relationships between predictor and response variables
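
To make these characteristics concrete, here is a minimal, dependency-free sketch that builds a small dataset exhibiting each of them. It does not use the analyticsdf API; the names `make_dataset` and `pearson`, the coefficients, and the target correlation `rho` are all hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def make_dataset(n=1000, rho=0.9, seed=0):
    """Hypothetical helper (not part of analyticsdf): returns rows (x1, x2, x3, y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = z1
        # High correlation: x2 shares the factor z1, so corr(x1, x2) is about rho.
        x2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2
        # Skewed predictor: log-normal values are positive and right-skewed.
        x3 = math.exp(rng.gauss(0, 1))
        # Response: linear terms, an x1*x3 interaction, a nonlinear sin(x2)
        # term, and a small amount of Gaussian noise. Coefficients are arbitrary.
        y = (1.0 + 2.0 * x1 - 1.0 * x2
             + 0.5 * x1 * x3
             + math.sin(x2)
             + rng.gauss(0, 0.1))
        rows.append((x1, x2, x3, y))
    return rows


rows = make_dataset()
x1, x2, x3, y = (list(col) for col in zip(*rows))
print("corr(x1, x2):", round(pearson(x1, x2), 3))
print("mean(x3) vs median(x3):", sum(x3) / len(x3), sorted(x3)[len(x3) // 2])
```

The printed correlation should land near the requested `rho`, and the mean of `x3` exceeding its median is a quick sign of right skew.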

## Research on existing automated dataset generation functionality
* scikit-learn [Make Datasets](https://scikit-learn.org/stable/datasets/sample_generators.html) functionality
* MIT Synthetic Data Vault project
  * [MIT Data to AI Lab](https://dai.lids.mit.edu/)
  * [datacebo](https://datacebo.com/)
  * 2016 IEEE conference paper, *The Synthetic Data Vault*

## Public Package
Beta packages from this repo are published on both [PyPI](https://pypi.org/project/analyticsdf/) and [Anaconda](https://anaconda.org/faye-yufan/analyticsdf).
