# Feature Engine

![Python 3.6](https://img.shields.io/badge/python-3.6-success.svg)
![Python 3.7](https://img.shields.io/badge/python-3.7-success.svg)
![Python 3.8](https://img.shields.io/badge/python-3.8-success.svg)
![License](https://img.shields.io/badge/license-BSD-success.svg)
![CircleCI](https://img.shields.io/circleci/build/github/solegalli/feature_engine/master.svg?token=5a1c2accc2c97450e52d2cb1b47c333ab495d2c2)
![Documentation Status](https://readthedocs.org/projects/feature-engine/badge/?version=latest)


Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow the scikit-learn API: each one exposes a `fit()` method, which learns the transformation parameters from the data, and a `transform()` method, which applies those parameters to the data.
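The fit/transform pattern described above can be illustrated with a small, dependency-free sketch. `ToyMedianImputer` is a hypothetical class written for this illustration only; it is not part of Feature-engine and does not reproduce its implementation. It assumes rows are plain dictionaries and that missing values are represented by `None`.

```python
from statistics import median


class ToyMedianImputer:
    """Hypothetical stand-in for illustration; not a Feature-engine class."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, rows):
        # Learn one median per variable, ignoring missing values (None).
        self.medians_ = {
            var: median(row[var] for row in rows if row[var] is not None)
            for var in self.variables
        }
        return self

    def transform(self, rows):
        # Replace missing values with the medians learned in fit().
        return [
            {
                key: (self.medians_[key] if key in self.medians_ and value is None else value)
                for key, value in row.items()
            }
            for row in rows
        ]


train = [{"age": 20}, {"age": None}, {"age": 40}, {"age": 30}]
test = [{"age": None}, {"age": 55}]

# Parameters are learned from the training data only...
imputer = ToyMedianImputer(variables=["age"]).fit(train)

# ...and then reused to transform any data set.
train_imputed = imputer.transform(train)
test_imputed = imputer.transform(test)
```

Separating the two steps means the parameters learned from the training data are the same ones applied to new data, which helps avoid leaking information from the test set.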


## Feature-engine features in the following resources:

* [Feature Engineering for Machine Learning, Online Course](https://www.udemy.com/feature-engineering-for-machine-learning/?couponCode=FEATENGREPO).

* [Python Feature Engineering Cookbook](https://www.packtpub.com/data/python-feature-engineering-cookbook)

## Blogs about Feature-engine:

* [Feature-engine: A new open-source Python package for feature engineering](https://www.trainindata.com/post/feature-engine-a-new-open-source-python-package-for-feature-engineering)

* [Open-Source Python libraries for Feature Engineering: Comparisons and Walkthroughs](https://www.trainindata.com/post/feature-engineering-python-libraries-comparisons)

## Documentation

* Documentation: http://feature-engine.readthedocs.io
* Home page: https://www.trainindata.com/feature-engine


## Feature-engine's transformers currently include functionality for:

* Missing Data Imputation
* Categorical Variable Encoding
* Outlier Removal
* Discretisation
* Numerical Variable Transformation

### Imputing Methods

* MeanMedianImputer
* RandomSampleImputer
* EndTailImputer
* AddNaNBinaryImputer
* CategoricalVariableImputer
* FrequentCategoryImputer
* ArbitraryNumberImputer

### Encoding Methods
* CountFrequencyCategoricalEncoder
* OrdinalCategoricalEncoder 
* MeanCategoricalEncoder
* WoERatioCategoricalEncoder
* OneHotCategoricalEncoder
* RareLabelCategoricalEncoder

### Outlier Handling methods
* Winsorizer
* ArbitraryOutlierCapper
* OutlierTrimmer

### Discretisation methods
* EqualFrequencyDiscretiser
* EqualWidthDiscretiser
* DecisionTreeDiscretiser
* UserInputDiscretiser

### Variable Transformation methods
* LogTransformer
* ReciprocalTransformer
* PowerTransformer
* BoxCoxTransformer
* YeoJohnsonTransformer


### Scikit-learn Wrapper:

* SklearnTransformerWrapper


### Installing

```
pip install feature_engine
```
or

```
git clone https://github.com/solegalli/feature_engine.git
```

### Usage

```python
>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> import pandas as pd

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
```

```
Out[1]:
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
```
    
```python
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
```

```
Out[2]:
A       10
B       10
Rare     3
Name: var_A, dtype: int64
```
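To make the arithmetic behind this result explicit, here is a rough sketch of the grouping logic in plain Python. It is an approximation for illustration, not the library's code. It assumes that `tol` is the minimum relative frequency a category needs to be kept, and that `n_categories` is the number of distinct categories a variable must exceed before any grouping takes place.

```python
from collections import Counter

values = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
tol = 0.10          # assumed meaning: minimum relative frequency to keep a label
n_categories = 3    # assumed meaning: group only if there are more categories than this

counts = Counter(values)
# Relative frequency of each label: A and B are 10/23, C is 2/23, D is 1/23.
frequencies = {label: n / len(values) for label, n in counts.items()}

if len(counts) > n_categories:
    # Keep labels at or above the tolerance; the rest are considered rare.
    frequent = {label for label, freq in frequencies.items() if freq >= tol}
else:
    # Too few categories: treat all of them as frequent.
    frequent = set(counts)

encoded = [value if value in frequent else "Rare" for value in values]
encoded_counts = Counter(encoded)
```

Here `C` (2/23, about 0.087) and `D` (1/23, about 0.043) fall below the 0.10 tolerance, so their three observations are merged into `Rare`, matching the output above.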

See more usage examples in the Jupyter Notebooks in the **example** folder of this repository, or in the documentation: http://feature-engine.readthedocs.io

## Contributing

### Local Setup Steps
- Clone the repo and cd into it
- Run `pip install tox`
- Run `tox`; if the tests pass, your local setup is complete

### Opening Pull Requests
PRs are welcome! Please make sure the CI tests pass on your branch.

## License

BSD 3-Clause

## Authors

* **Soledad Galli** - *Initial work* - [Feature Engineering for Machine Learning, Online Course](https://www.udemy.com/feature-engineering-for-machine-learning/?couponCode=FEATENGREPO).


### References

Many of the engineering and encoding functionalities are inspired by this [series of articles from the 2009 KDD Competition](http://www.mtome.com/Publications/CiML/CiML-v3-book.pdf).

To learn more about the rationale, functionality, pros and cons of each imputer, encoder, and transformer, refer to the [Feature Engineering for Machine Learning, Online Course](https://www.udemy.com/feature-engineering-for-machine-learning/?couponCode=FEATENGREPO).

For a summary of the methods, see this [presentation](https://speakerdeck.com/solegalli/engineering-and-selecting-features-for-machine-learning) and this [article](https://www.trainindata.com/post/feature-engineering-comprehensive-overview).

To be notified of new releases, sign up at [trainindata](https://www.trainindata.com).
