Metadata-Version: 2.1
Name: data-drift-detector
Version: 0.0.6
Summary: Compare differences between 2 datasets to identify data drift
Home-page: https://github.com/kelvnt/data-drift-detector
Author: Kelvin Tay
Author-email: btkelvin@gmail.com
License: GPLv3
Platform: UNKNOWN
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
License-File: LICENSE

# Data Drift Detector
[![PyPI version](https://badge.fury.io/py/data-drift-detector.svg)](https://badge.fury.io/py/data-drift-detector)

This package contains some developmental tools to detect and compare statistical differences between two structurally similar pandas DataFrames. Its intended purpose is to detect data drift: a change over time in the statistical properties of an input variable.
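
As a rough illustration of what detecting drift can mean for a single numeric variable, the following sketch computes the two-sample Kolmogorov-Smirnov (KS) statistic in plain Python. The KS statistic is a standard, generic drift measure chosen here for illustration only. It is not necessarily what `DataDriftDetector` computes, and the function name `ks_statistic` and the sample values are hypothetical.

```python
from bisect import bisect_right

# Illustrative sketch only; not part of the data-drift-detector API.


def ks_statistic(prior, post):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of two numeric samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(prior), sorted(post)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap between the two step functions only changes at observed values,
    # so evaluating both empirical CDFs at every observed value is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical example: an input variable whose values shift upward over time.
prior = [1.0, 2.0, 3.0, 4.0, 5.0]
post = [3.0, 4.0, 5.0, 6.0, 7.0]
print(ks_statistic(prior, post))  # prints 0.4
```

A value near 0 suggests the two samples come from similar distributions, while larger values point to drift. SciPy's `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.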

We provide a class, `DataDriftDetector`, that takes two pandas DataFrames and offers several methods to compare and analyze the differences between the two datasets.
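
Categorical columns need a different measure, because their values have no natural order. A minimal sketch, assuming each column is given as a plain Python list of category labels, compares category shares using the total variation distance. This is a generic technique picked for illustration, not a description of the package's internals. The names `category_proportions` and `total_variation_distance` are hypothetical.

```python
from collections import Counter

# Illustrative sketch only; not part of the data-drift-detector API.


def category_proportions(values):
    """Map each category to its share of the sample."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample must be non-empty")
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(prior, post):
    """Total variation distance between two categorical samples: half the sum
    of absolute differences in category shares (0 = identical, 1 = no overlap)."""
    p = category_proportions(prior)
    q = category_proportions(post)
    # Categories missing from one sample count as a share of 0 there.
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical example: the share of "cash" payments rises in the later dataset.
prior = ["card", "card", "cash", "cash"]
post = ["card", "cash", "cash", "cash"]
print(total_variation_distance(prior, post))  # prints 0.25
```

Taking the union of categories means a label that appears only in the later dataset (a common form of drift) still contributes to the distance.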

## Installation
Install the package with pip:

    pip install data-drift-detector

## Example Usage

To compare two datasets:

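    import pandas as pd
    from data_drift_detector import DataDriftDetector

    # load the two datasets to compare (file names are placeholders)
    df_1 = pd.read_csv("data_prior.csv")
    df_2 = pd.read_csv("data_post.csv")

    # initialize detector with the earlier (prior) and later (post) datasets
    detector = DataDriftDetector(df_prior=df_1, df_post=df_2)

    # methods to compare and analyze differences
    detector.calculate_drift()
    detector.plot_numeric_to_numeric()
    detector.plot_categorical_to_numeric()
    detector.plot_categorical()
    detector.compare_ml_efficacy(target_column="some_target_column")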

An example notebook, `examples/example_usage.ipynb`, shows in more detail how the package may be used.


