Metadata-Version: 2.1
Name: benderml
Version: 0.1.0
Summary: A Python package that makes ML processes easier, faster and less error prone
Home-page: https://github.com/otovo/bender
License: Apache-2.0
Keywords: python,typed,ml,prediction
Author: Mats E. Mollestad
Author-email: mats@mollestad.no
Requires-Python: >=3.9.7,<4.0.0
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: aioaws (>=0.12,<0.13)
Requires-Dist: asyncpg (>=0.24.0,<0.25.0)
Requires-Dist: databases (>=0.5.3,<0.6.0)
Requires-Dist: gspread (>=4.0.1,<5.0.0)
Requires-Dist: matplotlib (>=3.4.3,<4.0.0)
Requires-Dist: pandas (>=1.3.4,<2.0.0)
Requires-Dist: plotly (>=5.3.1,<6.0.0)
Requires-Dist: pydantic (>=1.8.2,<2.0.0)
Requires-Dist: python-dotenv (>=0.19.2,<0.20.0)
Requires-Dist: seaborn (>=0.11.2,<0.12.0)
Requires-Dist: sklearn (>=0.0,<0.1)
Requires-Dist: xgboost (>=1.5.0,<2.0.0)
Project-URL: Repository, https://github.com/otovo/bender
Description-Content-Type: text/markdown

# Bender 🤖

A Python package for faster, safer, and simpler ML processes.

## Why use `bender`?

Bender makes your machine learning processes faster, safer, and simpler while staying flexible. It does this by providing a set of base components built around the core steps of an ML pipeline, and by using type hints to suggest what your next move could be.

## Pipeline Safety

The whole pipeline is built using generics from Python's typing system. This improves the developer experience, since a static type checker or your editor can tell whether your pipeline's logic makes sense before it runs.

Bender will therefore make sure you **can't** make errors like:

```python
# ⛔️ Invalid pipeline
(
    DataImporters.sql(...)
    .process([...])
    # Type error: method `predict()` is not available
    .predict()
)

# ✅ Valid pipeline
(
    DataImporters.sql(...)
    .process([...])
    .load_model(ModelLoaders.aws_s3(...))
    .predict()
)
```
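
To illustrate how generics can make `predict()` unavailable until a model has been loaded, here is a minimal sketch of the idea. It is not Bender's actual implementation: the class names `ProcessedData` and `LoadedPipeline`, the `predict_fn` parameter, and the toy "model" (a single weight) are all hypothetical and only show the pattern of giving each pipeline stage its own type.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

Row = dict[str, float]
ModelT = TypeVar("ModelT")


@dataclass
class LoadedPipeline(Generic[ModelT]):
    """Hypothetical stage reached only after a model is attached."""

    rows: list[Row]
    model: ModelT
    predict_fn: Callable[[ModelT, Row], float]

    def predict(self) -> list[float]:
        # Apply the model to every row.
        return [self.predict_fn(self.model, row) for row in self.rows]


@dataclass
class ProcessedData:
    """Hypothetical stage holding preprocessed rows; it has no predict() method."""

    rows: list[Row]

    def load_model(
        self, model: ModelT, predict_fn: Callable[[ModelT, Row], float]
    ) -> LoadedPipeline[ModelT]:
        # Moving to the next stage returns a new type that exposes predict().
        return LoadedPipeline(self.rows, model, predict_fn)


data = ProcessedData([{"price": 10.0}, {"price": 30.0}])
# data.predict()  # rejected by a type checker: ProcessedData has no `predict`
predictions = data.load_model(0.5, lambda weight, row: weight * row["price"]).predict()
```

Because `predict()` only exists on the type returned by `load_model()`, a type checker such as mypy can flag the invalid ordering without running any code.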

## Training Example

Below is a simple example of training an XGBoost tree model.

```python
(
    DataImporters
    # Fetch SQL data
    .sql(sql_url, sql_query)

    # Preprocess the data
    .process([
        # Extract advanced information from JSON data
        Transformations.unpack_json("purchases", key="price", output_feature="price", policy=UnpackPolicy.median_number()),

        Transformations.log_normal_shift("y_values", "y_log"),

        # Get date values from a date feature
        Transformations.date_component("month", "date", output_feature="month_value"),
    ])

    # Split 70% / 30% into train and test sets
    .split(SplitStrategies.ratio(0.7))

    # Train an XGBoost tree model
    .train(
        ModelTrainer.xgboost(),
        input_features=['y_log', 'price', 'month_value', 'country', ...],
        target_feature='did_buy_product_x'
    )

    # Evaluate how good the model is based on the test set
    .evaluate([
        Evaluators.roc_curve(),
        Evaluators.confusion_matrix(),
        Evaluators.precision_recall(
            # Override where the evaluation result is exported
            Exporter.disk("precision-recall.png")
        ),
    ])
)
```
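
As a rough illustration of what a 70 / 30 ratio split means, the sketch below divides a sequence of rows in order. The `ratio_split` helper is hypothetical, and whether `SplitStrategies.ratio` shuffles, stratifies, or splits in order is not stated here, so the ordered split is an assumption.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def ratio_split(rows: Sequence[T], train_ratio: float) -> tuple[list[T], list[T]]:
    """Split rows in order: the first `train_ratio` share for training, the rest for testing."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    # Round to the nearest whole row so float noise cannot shift the cut point.
    cut = round(len(rows) * train_ratio)
    return list(rows[:cut]), list(rows[cut:])


# Ten rows split 70 / 30: seven rows for training, three for testing.
train, test = ratio_split(list(range(10)), 0.7)
```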

## Predicting Example

The example below loads a model from an AWS S3 bucket, imports and preprocesses the data, and predicts the output.
It also makes sure that the features are valid before predicting.

```python
(
    ModelLoaders
    # Fetch the model
    .aws_s3("path/to/model", s3_config)

    # Load data
    .import_data(
        DataImporters.sql(sql_url, sql_query)
            # Cache the import locally for 1 day
            .cached("cache/path")
    )
    # Preprocess the data
    .process([
        Transformations.unpack_json(...),
        ...
    ])
    # Predict the values
    .predict()
)
```
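
The feature check mentioned above can be pictured with a small sketch. This is only an assumption about what such a check might look like, not Bender's implementation: `missing_features` is a hypothetical helper that reports which features the model expects but the data lacks.

```python
from typing import Iterable, Mapping


def missing_features(
    rows: Iterable[Mapping[str, object]], required: Iterable[str]
) -> set[str]:
    """Return the required feature names that are absent from at least one row."""
    required_set = set(required)
    missing: set[str] = set()
    for row in rows:
        # Collect every required name this row does not provide.
        missing |= required_set.difference(row.keys())
    return missing


rows = [{"price": 10.0, "month_value": 3}, {"price": 12.5}]
missing = missing_features(rows, ["price", "month_value", "country"])
if missing:
    print(f"Cannot predict, missing features: {sorted(missing)}")
```

Failing before the model is called gives a clear error about the data instead of a less readable failure inside the prediction step.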

