Metadata-Version: 2.1
Name: foreshadow
Version: 1.0.0
Summary: Peer into the future of a data science project
Home-page: https://foreshadow.readthedocs.io
License: Apache-2.0
Keywords: feature,machine,learning,automl,foreshadow
Author: Adithya Balaji
Author-email: adithyabsk@gmail.com
Requires-Python: >=3.6,<4.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: doc
Requires-Dist: TPOT (>=0.11.0,<0.12.0)
Requires-Dist: category-encoders (>=1.2.8,<2.0.0)
Requires-Dist: docutils (<0.15.1); extra == "doc"
Requires-Dist: fancyimpute (>=0.3.2,<0.4.0)
Requires-Dist: hyperopt (>=0.1.2,<0.2.0)
Requires-Dist: jsonpickle (>=1.2,<2.0)
Requires-Dist: marshmallow (>=2.19.5,<3.0.0)
Requires-Dist: numpy (>=1.16.4,<2.0.0)
Requires-Dist: pandas (>=0.25.0,<0.26.0)
Requires-Dist: patchy (>=1.5,<2.0)
Requires-Dist: pyyaml (>=5.1,<6.0)
Requires-Dist: scikit-learn (>=0.22.1,<0.23.0)
Requires-Dist: scipy (>=1.1.0,<2.0.0)
Requires-Dist: scs (<=2.1.0)
Requires-Dist: sphinx (>=1.7.6,<2.0.0); extra == "doc"
Requires-Dist: sphinx_rtd_theme (>=0.4.1,<0.5.0); extra == "doc"
Requires-Dist: sphinxcontrib-plantuml (>=0.16.1,<0.17.0); extra == "doc"
Requires-Dist: toml (>=0.10.0,<0.11.0)
Project-URL: Documentation, https://foreshadow.readthedocs.io
Project-URL: Repository, https://github.com/georgianpartners/foreshadow
Description-Content-Type: text/x-rst

Foreshadow: Simple Machine Learning Scaffolding
===============================================

|BuildStatus| |DocStatus| |Coverage| |CodeStyle| |License|

Foreshadow is an automatic pipeline generation tool that makes creating, iterating on,
and evaluating machine learning pipelines fast and intuitive, so data scientists can
spend more time on data science and less time on code.

.. |BuildStatus| image:: https://dev.azure.com/georgianpartners/foreshadow/_apis/build/status/georgianpartners.foreshadow?branchName=master
   :target: https://dev.azure.com/georgianpartners/foreshadow/_build/latest?definitionId=1&branchName=master

.. |DocStatus| image:: https://readthedocs.org/projects/foreshadow/badge/?version=latest
  :target: https://foreshadow.readthedocs.io/en/latest/?badge=latest
  :alt: Documentation Status

.. |Coverage| image:: https://img.shields.io/azure-devops/coverage/georgianpartners/foreshadow/1.svg
  :target: https://dev.azure.com/georgianpartners/foreshadow/_build/latest?definitionId=1&branchName=master
  :alt: Coverage

.. |CodeStyle| image:: https://img.shields.io/badge/code%20style-black-000000.svg
  :target: https://github.com/ambv/black
  :alt: Code Style

.. |License| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
  :target: https://github.com/georgianpartners/foreshadow/blob/master/LICENSE
  :alt: License

Key Features
------------
- Scikit-Learn compatible
- Automatic column intent inference
    - Numerical
    - Categorical
    - Text
    - Droppable (all values in the column are either identical or all distinct)
- Allow user override on column intent and transformation functions
- Automatic feature preprocessing depending on the column intent type
    - Numerical: imputation followed by scaling
    - Categorical: a variety of categorical encodings
    - Text: TF-IDF followed by SVD
- Automatic model selection
- Rapid pipeline development / iteration
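
The per-intent preprocessing listed above can be approximated with plain scikit-learn
components. This is a minimal sketch, not Foreshadow's internal implementation: the
mean imputation strategy, the two SVD components, and the sample data are all
assumptions chosen for illustration.

.. code-block:: python

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Numerical intent: fill missing values, then standardize.
    # (The "mean" strategy is an assumption, not Foreshadow's documented default.)
    numeric_pipeline = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

    # Text intent: TF-IDF term weights, then SVD down to a few dense components.
    # (n_components=2 is an arbitrary choice for this toy example.)
    text_pipeline = make_pipeline(
        TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0)
    )

    # Hypothetical sample columns.
    ages = np.array([[22.0], [np.nan], [35.0], [41.0]])
    reviews = [
        "great product fast shipping",
        "poor quality slow shipping",
        "great quality",
        "slow delivery poor packaging",
    ]

    numeric_features = numeric_pipeline.fit_transform(ages)
    text_features = text_pipeline.fit_transform(reviews)

    print(numeric_features.shape)  # (4, 1)
    print(text_features.shape)  # (4, 2)

Foreshadow is described as choosing and wiring up such steps automatically from the
inferred column intent, which is the part this hand-written sketch leaves out.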

Features in the road map
------------------------
- Automatic feature engineering
- Automatic parameter optimization

Foreshadow supports Python 3.6+.

Installing Foreshadow
---------------------

.. code-block:: console

    $ pip install foreshadow

Read the documentation to `set up the project from source`_.

.. _set up the project from source: https://foreshadow.readthedocs.io/en/development/developers.html#setting-up-the-project-from-source

Getting Started
---------------

To get started with Foreshadow, install the package with ``pip install foreshadow``,
which also installs its dependencies. Then create a simple Python script that uses
Foreshadow with all of its defaults.

First, import Foreshadow:

.. code-block:: python

    from foreshadow.foreshadow import Foreshadow
    from foreshadow.estimators import AutoEstimator
    from foreshadow.utils import ProblemType

Also import pandas and the scikit-learn helpers used in the demo:

.. code-block:: python

    import pandas as pd

    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split

Now load the Boston housing dataset from sklearn into pandas DataFrames. This
is a common dataset for testing machine learning models and comes built into
scikit-learn.

.. code-block:: python

    boston = load_boston()
    bostonX_df = pd.DataFrame(boston.data, columns=boston.feature_names)
    bostony_df = pd.DataFrame(boston.target, columns=['target'])

Next, exactly as if working with an sklearn estimator, perform a train/test
split on the data and pass the training data to the ``fit`` method of a new
Foreshadow object:

.. code-block:: python

    X_train, X_test, y_train, y_test = train_test_split(
        bostonX_df, bostony_df, test_size=0.2
    )

    problem_type = ProblemType.REGRESSION

    estimator = AutoEstimator(
        problem_type=problem_type,
        auto="tpot",
        estimator_kwargs={"max_time_mins": 1},
    )
    shadow = Foreshadow(estimator=estimator, problem_type=problem_type)
    shadow.fit(X_train, y_train)

Now ``shadow`` is a fitted Foreshadow object: all feature engineering has been
performed and the estimator has been trained and optimized. You can use it exactly
like a fitted sklearn estimator, for example to score it on the held-out test data:

.. code-block:: python

    shadow.score(X_test, y_test)
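
Since the fitted Foreshadow object is meant to be used like a fitted scikit-learn
estimator, the usual ``predict`` and ``score`` pattern should carry over. The sketch
below shows that pattern with a plain scikit-learn pipeline as a stand-in, not with
Foreshadow itself. The synthetic data and the linear model are assumptions, and that
Foreshadow's ``score`` reports R^2 for regression, as scikit-learn regressors do, is
also an assumption.

.. code-block:: python

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic regression data standing in for a real dataset.
    X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Stand-in for a fitted Foreshadow object: preprocessing plus an estimator.
    stand_in = make_pipeline(StandardScaler(), LinearRegression())
    stand_in.fit(X_train, y_train)

    predictions = stand_in.predict(X_test)  # one prediction per test row
    r2 = stand_in.score(X_test, y_test)  # R^2 for scikit-learn regressors

    print(predictions.shape)  # (40,)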

Great, you now have a working Foreshadow installation! Keep reading to learn how to
export, modify, and construct pipelines of your own.

Tutorial
--------
A Jupyter notebook tutorial that goes into more detail is available in the ``examples`` folder.

Documentation
-------------
`Read the docs!`_

.. _Read the docs!: https://foreshadow.readthedocs.io/en/development/index.html

