# ad-hoc-boost
Welcome to ad-hoc-boost, a model specialized for classification with severely imbalanced classes.

## About
Many data science problems have severely imbalanced classes (e.g. predicting fraudulent transactions, predicting
order-cancellations in food-delivery, predicting if a day in Berlin will be sunny). In these situations, predicting the
positive class is hard! This module aims to alleviate some of that.

The `AdHocBoost` model works by creating `n` sequential models. The first `n-1` models can most aptly be thought of
as dataset filtering models, i.e. each one does a good job at classifying rows as "definitely _not_ the positive class"
versus "maybe the positive class". The `nth` model only works on this filtered "maybe positive" data.

In this way, each filter step reduces the class imbalance, so that by the time the dataset reaches the `nth` model
for final classification, the classes are considerably more balanced.
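
The filtering idea can be illustrated with a small, self-contained sketch. It is not the `AdHocBoost` implementation: the synthetic data, the helper names (`make_rows`, `share_of_positives`, `run_cascade`), and the hand-set thresholds standing in for trained filter models are all assumptions made for illustration. It only shows how dropping "definitely not positive" rows at each stage raises the share of positives that the final model would see.

```python
import random


def make_rows(n_rows, positive_rate, seed=0):
    """Build synthetic (features, label) rows; positives tend to have higher features."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        label = 1 if rng.random() < positive_rate else 0
        # Two noisy features whose mean is shifted upward for positives.
        features = [rng.gauss(1.5 * label, 1.0) for _ in range(2)]
        rows.append((features, label))
    return rows


def share_of_positives(rows):
    """Fraction of rows whose label is 1 (0.0 for an empty list)."""
    return sum(label for _, label in rows) / len(rows) if rows else 0.0


def run_cascade(rows, filter_stages):
    """Apply each filter stage in turn, keeping only the 'maybe positive' rows.

    Each stage is a (score_fn, threshold) pair standing in for a trained
    filter model: rows scoring below the threshold are dropped.
    """
    rates = [share_of_positives(rows)]
    for score_fn, threshold in filter_stages:
        rows = [row for row in rows if score_fn(row[0]) >= threshold]
        rates.append(share_of_positives(rows))
    return rows, rates


rows = make_rows(n_rows=20_000, positive_rate=0.01)
stages = [
    (lambda features: features[0], 0.0),  # hypothetical stage 1: uses feature 0
    (lambda features: features[1], 0.0),  # hypothetical stage 2: uses feature 1
]
survivors, rates = run_cascade(rows, stages)

for stage_count, rate in enumerate(rates):
    print(f"after {stage_count} filter stage(s): positive share = {rate:.2%}")
print(f"rows left for the final model: {len(survivors)} of {len(rows)}")
```

In this toy setup each stage trades a little recall (a few positives are dropped) for a much smaller and more balanced dataset, which is the trade-off the filtering models are meant to make.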

## Run Instructions
1. Clone this module to a location of your choice.
2. Set an environment variable in your shell startup file of choice (e.g. `~/.zshrc` or `~/.bash_profile`) pointing
   to the location where you cloned the module. It should read something like
   `export AD_HOC_BOOST_HOME="path/to/your/ad_hoc_boost"`.
3. Use the included `env.yml` file to create an environment by running
   `conda env create --file env.yml --prefix $AD_HOC_BOOST_HOME/env`. This can take up to ~15 minutes, as some
   dependencies are large and difficult to resolve (e.g. `google-cloud`).
4. Activate your new environment with `conda activate $AD_HOC_BOOST_HOME/env`.
   It probably works similarly with pip; we leave that as an exercise for the reader ;)
5. To see an example in action, check out the file at `./scripts/example.py`.
6. To run `./scripts/example.py`, you'll need to hit BigQuery! Make sure that you have
   `GOOGLE_CLOUD_PROJECT=<your-project-here>` configured in your shell startup file of choice.
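
Before running the example, it can help to confirm that both environment variables from the steps above are visible to Python. The following is a minimal sketch of such a check; the helper name `missing_env_vars` is hypothetical and not part of this module.

```python
import os

# The two variables named in the run instructions above.
REQUIRED_VARS = ("AD_HOC_BOOST_HOME", "GOOGLE_CLOUD_PROJECT")


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Environment looks ready.")
```

If a variable is reported as missing, re-source your shell startup file (or open a new terminal) and try again.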
   
## Other Notes and Documentation
The `AdHocBoost` model conforms to a scikit-learn-like API: to use it, you simply instantiate it, and then use
`.fit()`, `.predict()`, and `.predict_proba()` as you see... fit ;).