Metadata-Version: 2.1
Name: pytorch-widedeep
Version: 1.1.1
Summary: Combine tabular data with text and images using Wide and Deep models in Pytorch
Home-page: https://github.com/jrzaurin/pytorch-widedeep
Author: Javier Rodriguez Zaurin
Author-email: jrzaurin@gmail.com
License: MIT
Description: [![PyPI version](https://badge.fury.io/py/pytorch-widedeep.svg)](https://pypi.org/project/pytorch-widedeep/)
        [![Python 3.7 3.8 3.9](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue.svg)](https://pypi.org/project/pytorch-widedeep/)
        [![Build Status](https://github.com/jrzaurin/pytorch-widedeep/actions/workflows/build.yml/badge.svg)](https://github.com/jrzaurin/pytorch-widedeep/actions)
        [![Documentation Status](https://readthedocs.org/projects/pytorch-widedeep/badge/?version=latest)](https://pytorch-widedeep.readthedocs.io/en/latest/?badge=latest)
        [![codecov](https://codecov.io/gh/jrzaurin/pytorch-widedeep/branch/master/graph/badge.svg)](https://codecov.io/gh/jrzaurin/pytorch-widedeep)
        [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
        [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/jrzaurin/pytorch-widedeep/graphs/commit-activity)
        [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/jrzaurin/pytorch-widedeep/issues)
        [![Slack](https://img.shields.io/badge/slack-chat-green.svg?logo=slack)](https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw)
        
        
        # pytorch-widedeep
        
        A flexible package for multimodal deep learning that combines tabular data
        with text and images using Wide and Deep models in PyTorch.
        
        **Documentation:** [https://pytorch-widedeep.readthedocs.io](https://pytorch-widedeep.readthedocs.io/en/latest/index.html)
        
        **Companion posts and tutorials:** [infinitoml](https://jrzaurin.github.io/infinitoml/)
        
        **Experiments and comparison with `LightGBM`**: [TabularDL vs LightGBM](https://github.com/jrzaurin/tabulardl-benchmark)
        
        **Slack**: if you want to contribute or just want to chat with us, join [slack](https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw)
        
        ### Introduction
        
        ``pytorch-widedeep`` is based on Google's [Wide and Deep Algorithm](https://arxiv.org/abs/1606.07792),
        adjusted for multi-modal datasets.
        
        In general terms, `pytorch-widedeep` is a package to use deep learning with
        tabular data. In particular, it is intended to facilitate the combination of text
        and images with corresponding tabular data using wide and deep models. With
        that in mind there are a number of architectures that can be implemented with
        just a few lines of code. For details on the main components of those
        architectures please visit the
        [repo](https://github.com/jrzaurin/pytorch-widedeep).
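        
        To make the idea concrete, here is a minimal, library-free sketch of how a
        wide and deep model combines its two components for binary classification:
        a linear (wide) part over one-hot and crossed features, plus a small MLP
        (deep) part, with the two logits summed before a sigmoid. All names, weights
        and the `column=value` feature encoding are hypothetical and chosen only for
        illustration; this is not how `pytorch-widedeep` is implemented internally.
        
        ```python
        import math
        
        
        def wide_logit(active_features, weights):
            """Linear part: sum the weights of the active (one-hot) features."""
            return sum(weights.get(name, 0.0) for name in active_features)
        
        
        def deep_logit(embedding, hidden_w, out_w):
            """Tiny MLP: one ReLU hidden layer followed by a linear output unit."""
            hidden = [
                max(0.0, sum(w * x for w, x in zip(row, embedding))) for row in hidden_w
            ]
            return sum(w * h for w, h in zip(out_w, hidden))
        
        
        def wide_deep_proba(active_features, weights, embedding, hidden_w, out_w):
            """Sum the wide and deep logits, then squash with a sigmoid."""
            logit = wide_logit(active_features, weights) + deep_logit(
                embedding, hidden_w, out_w
            )
            return 1.0 / (1.0 + math.exp(-logit))
        
        
        # Hypothetical toy row and parameters (not learned from any data)
        row = {"education": "Bachelors", "occupation": "Sales"}
        active = [
            f"education={row['education']}",
            f"occupation={row['occupation']}",
            # crossed feature: the combination of two categorical values
            f"education_occupation={row['education']}-{row['occupation']}",
        ]
        wide_weights = {
            "education=Bachelors": 0.4,
            "education_occupation=Bachelors-Sales": 0.6,
        }
        embedding = [1.0, -2.0]  # stands in for the concatenated embeddings
        hidden_w = [[0.5, 0.0], [0.0, 1.0]]
        out_w = [1.0, 1.0]
        
        proba = wide_deep_proba(active, wide_weights, embedding, hidden_w, out_w)
        print(round(proba, 4))
        ```
        
        With these toy values the wide logit is 1.0 and the deep logit is 0.5, so the
        output is sigmoid(1.5), roughly 0.82. In the package itself both components
        are `torch` modules trained jointly.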
        
        
        ### Installation
        
        Install using pip:
        
        ```bash
        pip install pytorch-widedeep
        ```
        
        Or install directly from GitHub:
        
        ```bash
        pip install git+https://github.com/jrzaurin/pytorch-widedeep.git
        ```
        
        #### Developer Install
        
        ```bash
        # Clone the repository
        git clone https://github.com/jrzaurin/pytorch-widedeep
        cd pytorch-widedeep
        
        # Install in dev mode
        pip install -e .
        ```
        
        ### Quick start
        
        Binary classification with the [adult
        dataset](https://www.kaggle.com/wenruliu/adult-income-dataset)
        using `Wide` and `TabMlp` with default settings.
        
        Building a wide (linear) and deep model with ``pytorch-widedeep``:
        
        ```python
        import pandas as pd
        import numpy as np
        import torch
        from sklearn.model_selection import train_test_split
        
        from pytorch_widedeep import Trainer
        from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
        from pytorch_widedeep.models import Wide, TabMlp, WideDeep
        from pytorch_widedeep.metrics import Accuracy
        from pytorch_widedeep.datasets import load_adult
        
        
        df = load_adult(as_frame=True)
        df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
        df.drop("income", axis=1, inplace=True)
        df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.income_label)
        
        # Define the 'column set up'
        wide_cols = [
            "education",
            "relationship",
            "workclass",
            "occupation",
            "native-country",
            "gender",
        ]
        crossed_cols = [("education", "occupation"), ("native-country", "occupation")]
        
        cat_embed_cols = [
            "workclass",
            "education",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "gender",
            "capital-gain",
            "capital-loss",
            "native-country",
        ]
        continuous_cols = ["age", "hours-per-week"]
        target_col = "income_label"
        target = df_train[target_col].values
        
        # prepare the data
        wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
        X_wide = wide_preprocessor.fit_transform(df_train)
        
        tab_preprocessor = TabPreprocessor(
            cat_embed_cols=cat_embed_cols, continuous_cols=continuous_cols  # type: ignore[arg-type]
        )
        X_tab = tab_preprocessor.fit_transform(df_train)
        
        # build the model
        wide = Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1)
        tab_mlp = TabMlp(
            column_idx=tab_preprocessor.column_idx,
            cat_embed_input=tab_preprocessor.cat_embed_input,
            continuous_cols=continuous_cols,
        )
        model = WideDeep(wide=wide, deeptabular=tab_mlp)
        
        # train and validate
        trainer = Trainer(model, objective="binary", metrics=[Accuracy])
        trainer.fit(
            X_wide=X_wide,
            X_tab=X_tab,
            target=target,
            n_epochs=5,
            batch_size=256,
        )
        
        # predict on test
        X_wide_te = wide_preprocessor.transform(df_test)
        X_tab_te = tab_preprocessor.transform(df_test)
        preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)
        
        # Save and load
        
        # Option 1: this will also save training history and lr history if the
        # LRHistory callback is used
        trainer.save(path="model_weights", save_state_dict=True)
        
        # Option 2: save as any other torch model
        torch.save(model.state_dict(), "model_weights/wd_model.pt")
        
        # From here onwards, Options 1 and 2 are the same. This assumes the user has
        # already prepared the data and defined the new model components:
        # 1. Build the model
        model_new = WideDeep(wide=wide, deeptabular=tab_mlp)
        model_new.load_state_dict(torch.load("model_weights/wd_model.pt"))
        
        # 2. Instantiate the trainer
        trainer_new = Trainer(model_new, objective="binary")
        
        # 3. Either start the fit or directly predict
        preds = trainer_new.predict(X_wide=X_wide, X_tab=X_tab)
        ```
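        
        The quick start stops once `preds` has been computed. As a rough sketch of how
        those predictions could be scored, the helpers below compute accuracy and
        confusion counts for binary labels. They are hypothetical and not part of
        `pytorch-widedeep`; they assume both inputs are equal-length sequences of 0/1
        labels, which is what one would expect from `df_test["income_label"].values`
        and the test-set `preds` above.
        
        ```python
        def accuracy(y_true, y_pred):
            """Fraction of predictions that match the true labels."""
            if len(y_true) != len(y_pred):
                raise ValueError("y_true and y_pred must have the same length")
            if len(y_true) == 0:
                raise ValueError("cannot score an empty set of predictions")
            correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
            return correct / len(y_true)
        
        
        def confusion_counts(y_true, y_pred):
            """Count true/false positives and negatives for 0/1 labels."""
            counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
            for t, p in zip(y_true, y_pred):
                if t == 1 and p == 1:
                    counts["tp"] += 1
                elif t == 0 and p == 0:
                    counts["tn"] += 1
                elif t == 0 and p == 1:
                    counts["fp"] += 1
                else:
                    counts["fn"] += 1
            return counts
        
        
        # Toy labels for illustration only (not results from the adult dataset)
        y_true = [1, 0, 1, 1, 0]
        y_pred = [1, 0, 0, 1, 1]
        
        print(accuracy(y_true, y_pred))
        print(confusion_counts(y_true, y_pred))
        ```
        
        On real data the same numbers can be obtained with `sklearn.metrics`, which
        the quick start environment already has installed.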
        
        Of course, one can do **much more**. See the Examples folder, the
        documentation or the companion posts for a better understanding of the content
        of the package and its functionalities.
        
        ### Testing
        
        ```bash
        pytest tests
        ```
        
        ### Acknowledgments
        
        This library draws on a number of other libraries, so it is only fair to
        mention them here in the README (specific mentions are also included in the
        code).
        
        The `Callbacks` and `Initializers` structure and code are inspired by the
        [`torchsample`](https://github.com/ncullen93/torchsample) library, which is
        itself partially inspired by [`Keras`](https://keras.io/).
        
        The `TextProcessor` class in this library uses the
        [`fastai`](https://docs.fast.ai/text.transform.html#BaseTokenizer.tokenizer)'s
        `Tokenizer` and `Vocab`. The code at `utils.fastai_transforms` is a minor
        adaptation of their code so it functions within this library. In my
        experience, their `Tokenizer` is the best in class.
        
        The `ImageProcessor` class in this library uses code from the fantastic [Deep
        Learning for Computer
        Vision](https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/)
        (DL4CV) book by Adrian Rosebrock.
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Other Environment
Classifier: Framework :: Jupyter
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7.0
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: docs
Provides-Extra: quality
Provides-Extra: all
