Metadata-Version: 2.1
Name: mltree
Version: 0.0.2
Summary: A machine learning package for tree-based models only
Home-page: https://github.com/joaopcnogueira/mltree/
Author: João Nogueira
Author-email: joao.nogueira@datarisk.io
License: Apache Software License 2.0
Keywords: nbdev
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: dev
License-File: LICENSE

mltree
================


## Install

`pip install mltree`

## How to use

First, import the training function and load the analytical base table (ABT):

``` python
from mltree.train import train_tree_models
```

``` python
import pandas as pd
from pathlib import Path

path = Path('..')
datasets_path = path/'datasets'

df = pd.read_csv(datasets_path/'churn_abt.csv')
```

``` python
df.head()
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>data_ref_safra</th>
      <th>seller_id</th>
      <th>uf</th>
      <th>tot_orders_12m</th>
      <th>tot_items_12m</th>
      <th>tot_items_dist_12m</th>
      <th>receita_12m</th>
      <th>recencia</th>
      <th>nao_revendeu_next_6m</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>2018-01-01</td>
      <td>0015a82c2db000af6aaaf3ae2ecb0532</td>
      <td>SP</td>
      <td>3</td>
      <td>3</td>
      <td>1</td>
      <td>2685.00</td>
      <td>74</td>
      <td>1</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2018-01-01</td>
      <td>001cca7ae9ae17fb1caed9dfb1094831</td>
      <td>ES</td>
      <td>171</td>
      <td>207</td>
      <td>9</td>
      <td>21275.23</td>
      <td>2</td>
      <td>0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2018-01-01</td>
      <td>002100f778ceb8431b7a1020ff7ab48f</td>
      <td>SP</td>
      <td>38</td>
      <td>42</td>
      <td>15</td>
      <td>781.80</td>
      <td>2</td>
      <td>0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>2018-01-01</td>
      <td>003554e2dce176b5555353e4f3555ac8</td>
      <td>GO</td>
      <td>1</td>
      <td>1</td>
      <td>1</td>
      <td>120.00</td>
      <td>16</td>
      <td>1</td>
    </tr>
    <tr>
      <th>4</th>
      <td>2018-01-01</td>
      <td>004c9cd9d87a3c30c522c48c4fc07416</td>
      <td>SP</td>
      <td>130</td>
      <td>141</td>
      <td>75</td>
      <td>16228.88</td>
      <td>8</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
</div>

Split the data into a training set and an out-of-time (OOT) test set:

``` python
df_train = df.query('data_ref_safra < "2018-03-01"')
df_oot = df.query('data_ref_safra == "2018-03-01"')
```
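
The split above compares `data_ref_safra` as text, which works because ISO-formatted dates (`YYYY-MM-DD`) sort in chronological order. The sketch below checks that idea on a small made-up frame; the `toy` data and the `overlap` check are illustrative assumptions and not part of `mltree`.

``` python
import pandas as pd

# Toy stand-in for the ABT (hypothetical values); only the snapshot column matters here.
toy = pd.DataFrame({
    "data_ref_safra": ["2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01"],
    "seller_id": ["a", "b", "c", "d"],
})

# Same filters as above: earlier snapshots for training, the last one held out.
toy_train = toy.query('data_ref_safra < "2018-03-01"')
toy_oot = toy.query('data_ref_safra == "2018-03-01"')

# The two sets should not share any snapshot date.
overlap = set(toy_train["data_ref_safra"]) & set(toy_oot["data_ref_safra"])
print(len(toy_train), len(toy_oot), overlap)
```

If the column were parsed as datetimes instead (for example with `parse_dates` in `pd.read_csv`), the same `query` strings should still work, since pandas compares datetime columns against date-like strings.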

Define the key columns, the target, and the numeric and categorical feature lists:

``` python
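# Identifier columns that must not be used as features
key_vars = ['data_ref_safra', 'seller_id']
target = 'nao_revendeu_next_6m'
# Numeric features: every numeric column except the target
num_vars = [var for var in df.select_dtypes(include='number').columns.tolist() if var != target]
# Categorical features: every non-numeric column except the key columns
cat_vars = [var for var in df.select_dtypes(exclude='number').columns.tolist() if var not in key_vars]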
```
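
To see what these lists resolve to, the same selection logic can be run on a small frame with a subset of the ABT's columns. This is a sketch with made-up rows (the shortened `seller_id` values are placeholders), meant only to show how `select_dtypes` separates the columns.

``` python
import pandas as pd

# Two hypothetical rows with a subset of the ABT columns.
toy = pd.DataFrame({
    "data_ref_safra": ["2018-01-01", "2018-01-01"],
    "seller_id": ["seller_a", "seller_b"],
    "uf": ["SP", "ES"],
    "tot_orders_12m": [3, 171],
    "receita_12m": [2685.00, 21275.23],
    "nao_revendeu_next_6m": [1, 0],
})

key_vars = ["data_ref_safra", "seller_id"]
target = "nao_revendeu_next_6m"

# Numeric columns minus the target become numeric features.
num_vars = [c for c in toy.select_dtypes(include="number").columns if c != target]
# Non-numeric columns minus the keys become categorical features.
cat_vars = [c for c in toy.select_dtypes(exclude="number").columns if c not in key_vars]

print(num_vars)
print(cat_vars)
```

On this toy frame only `uf` is left as a categorical feature, while the target and the key columns are excluded from both lists.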

Train the tree-based models:

``` python
train_tree_models(df_train, df_oot, target=target, folds=5, cat_features=cat_vars, num_features=num_vars, seed=42)
```

    {'dt': {'auc': {'train': 0.9139680595991275, 'test': 0.8968114296299949}},
     'rf': {'auc': {'train': 0.9072972070544887, 'test': 0.8964968670043654}}}
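
The returned dictionary can be inspected with plain Python. The sketch below copies the values shown above and computes the train/test AUC gap per model. It assumes that `dt` and `rf` denote a decision tree and a random forest and that `test` refers to the second (out-of-time) frame passed in; the output does not state either explicitly.

``` python
# Values copied from the output above.
results = {
    "dt": {"auc": {"train": 0.9139680595991275, "test": 0.8968114296299949}},
    "rf": {"auc": {"train": 0.9072972070544887, "test": 0.8964968670043654}},
}

# Gap between train and test AUC; a large gap can hint at overfitting.
gaps = {name: m["auc"]["train"] - m["auc"]["test"] for name, m in results.items()}

# Model with the highest test AUC.
best = max(results, key=lambda name: results[name]["auc"]["test"])

for name, gap in gaps.items():
    print(f"{name}: test AUC = {results[name]['auc']['test']:.4f}, gap = {gap:.4f}")
print("best on test:", best)
```

Here both models reach a test AUC of roughly 0.897, so the difference between them is small; the gap mainly shows that `rf` loses slightly less between train and test than `dt`.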


