
# ml-pipeline-analyzer
[![Build Status](https://app.travis-ci.com/TharunKumarReddy5/ml-pipeline-analyzer.svg?branch=main)](https://app.travis-ci.com/TharunKumarReddy5/ml-pipeline-analyzer)
[![Coverage Status](https://coveralls.io/repos/github/TharunKumarReddy5/ml-pipeline-analyzer/badge.svg?branch=main&service=github&kill_cache=1)](https://coveralls.io/github/TharunKumarReddy5/ml-pipeline-analyzer?branch=main&service=github&kill_cache=1)
[![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)](https://github.com/tterb/atomic-design-ui/blob/master/LICENSEs)
![contributors](https://img.shields.io/github/contributors/TharunKumarReddy5/ml-pipeline-analyzer.svg)
![codesize](https://img.shields.io/github/languages/code-size/TharunKumarReddy5/ml-pipeline-analyzer.svg) 
![pullrequests](https://img.shields.io/github/issues-pr/TharunKumarReddy5/ml-pipeline-analyzer.svg) 
![closedpullrequests](https://img.shields.io/github/issues-pr-closed-raw/TharunKumarReddy5/ml-pipeline-analyzer.svg)


# Machine Learning Pipeline Analyzer (MLPA)

Machine Learning Pipeline Analyzer (MLPA) is a Python package that analyzes, suggests, and visualizes machine learning pipelines.

One of the primary goals of this package is to give the user an intuitive visual diagram of the pipeline model that explains the model's components and their attributes, while also suggesting changes and the best pipeline model for the user's needs.

## Motivation

Machine learning engineers and data scientists often create ML pipelines that chain multiple tasks, such as:  
Data extraction -> Data cleaning -> Data manipulation -> Feature selection/reduction -> Model training and prediction -> Cross-validation -> Model load/save

However, as the number of components in a pipeline grows, drawing a flowchart by hand becomes impractical, and the result is hard to understand and track. Some existing Python packages use DAGs to visualize ML pipelines, but their output can be hard to explore and understand.
Therefore, our goal was to create a package that automates the daunting process of visualizing ML pipelines, while also suggesting changes or the best pipeline model for the user's input dataframes.

## Acknowledgements

MLPA is a simple wrapper that builds on the capabilities of the following existing Python libraries:

 - [Scikit-Learn](https://scikit-learn.org/stable/)
 - [EvalML](https://evalml.alteryx.com/en/stable/)
 - [Graphviz](https://graphviz.org/)
 
## Installing the package

    pip install mlpipeline_analyzer

## Dependencies

Install the dependencies from the `requirements.txt` file using:

    python -m pip install -r requirements.txt
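After installing, it can be useful to confirm that the underlying libraries are importable. The snippet below is a minimal sketch, assuming the requirements include the Python packages for the three libraries listed under Acknowledgements; the module list and the `missing_modules` helper are illustrative and not part of MLPA.

```python
from importlib.util import find_spec

# Assumed import names for the libraries listed under Acknowledgements.
# The import name can differ from the pip name: scikit-learn imports as "sklearn".
REQUIRED_MODULES = ["sklearn", "evalml", "graphviz"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

Note that the `graphviz` Python package only provides bindings; rendering diagrams also needs the Graphviz system binaries installed separately.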

## Code Examples

**Code example 1**: The user loads a saved model `.pkl` file and passes it to the `PipelineDiagram` class. Two ML pipeline diagrams are then created with `.show` and `.show_params`:

    import joblib
    # PipelineDiagram must also be imported from the installed mlpipeline_analyzer package.

    evalml_pipeline = joblib.load('models/automl_pipeline.pkl')
    diagram = PipelineDiagram(evalml_pipeline)
    diagram.show(title='Evalml ML Pipeline Diagram')
    diagram.show_params(title='Evalml Machine Learning Parameters Pipeline')
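The example above assumes a pipeline has already been saved to disk. As a rough sketch of how such a `.pkl` file can be produced and read back with `joblib`, the snippet below uses a small scikit-learn pipeline as a stand-in. The toy training data and the file name are made up for illustration, and whether `PipelineDiagram` accepts a plain scikit-learn pipeline (rather than an EvalML one) is not confirmed here.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A tiny stand-in pipeline; a real project would fit and save its own model.
pipeline = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
# Toy data, invented purely so the pipeline can be fitted.
pipeline.fit([[0.0, 1.0], [1.0, 0.0], [0.5, 0.2], [0.9, 0.8]], [0, 1, 0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "example_pipeline.pkl")  # hypothetical file name
    joblib.dump(pipeline, model_path)   # write the fitted pipeline to disk
    restored = joblib.load(model_path)  # read it back, as in the example above

# The restored object keeps its named steps, which is what a diagram would draw.
step_names = [name for name, _ in restored.steps]
print(step_names)
```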

**Code example 2**: The `suggest` function generates output for the various components of the model, depending on the `suggest_type` the user specifies:

    # PipelineSuggest must be imported from the installed mlpipeline_analyzer package,
    # and df is a dataframe loaded beforehand that contains the columns named below.
    suggester = PipelineSuggest()
    suggester.fit(data=df, response='survived', predictor_list=['pclass', 'age', 'gender'],
                  problem_type='binary', objective='auto', test_size=0.2)
    suggester.suggest(suggest_type='fe')
    suggester.suggest(suggest_type='model')
    suggester.suggest(suggest_type='all')
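The `fit` call above expects a dataframe that contains the response column and every predictor. The following sketch, assuming a pandas DataFrame is the expected input, builds a small frame with those column names and runs two sanity checks before fitting. The rows are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass": [3, 1, 2, 3, 1, 2],
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "gender": ["male", "female", "female", "male", "female", "male"],
})

response = "survived"
predictor_list = ["pclass", "age", "gender"]

# Columns the fit call refers to but the frame does not contain.
missing = [col for col in [response, *predictor_list] if col not in df.columns]

# A 'binary' problem type implies exactly two distinct response values.
n_classes = df[response].nunique()

print("missing columns:", missing)
print("distinct response values:", n_classes)
```

Checks like these catch a misspelled column name or a mislabelled problem type before any time is spent on fitting.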

## Screenshots

Examples of outputs generated by the functions:

- Screenshot of an ML pipeline summary diagram:

[Image1](https://github.com/TharunKumarReddy5/ml-pipeline-analyzer/blob/main/examples/machine_learning_pipeline.png)
![ML Pipeline Summary Diagram](https://github.com/TharunKumarReddy5/ml-pipeline-analyzer/blob/main/examples/machine_learning_pipeline.png "ML Pipeline Summary Diagram")

- Screenshot of an ML pipeline hyperparameter diagram:

[Image2](https://github.com/TharunKumarReddy5/ml-pipeline-analyzer/blob/main/examples/ml_pipeline_params.PNG)
![ML Pipeline Hyperparameter Diagram](https://github.com/TharunKumarReddy5/ml-pipeline-analyzer/blob/main/examples/ml_pipeline_params.PNG "ML Pipeline Hyperparameter Diagram")
    
## Build Status

MLPA already supports feature engineering to some extent. With extensibility as one of the project goals, we plan to add more specific capabilities in the following areas:
- Feature Engineering
- Feature Extraction/Selection
- Feature Reduction

As a possible extension, users could specify the engine they want to use for their model (for example, TPOT or EvalML) and run the MLPA package on top of that engine.
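One way such an engine option could be structured is a small registry that maps an engine name to a backend function. This is purely a hypothetical sketch of the idea, not existing MLPA code: `register_engine`, `run_with_engine`, and the placeholder backends are all invented names, and the backends do not call TPOT or EvalML.

```python
from typing import Callable, Dict

# Hypothetical registry: engine name -> function that would run that engine.
_ENGINES: Dict[str, Callable[[dict], str]] = {}


def register_engine(name: str):
    """Decorator that registers a backend function under a lowercase engine name."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        _ENGINES[name.lower()] = func
        return func
    return decorator


@register_engine("evalml")
def _run_evalml(config: dict) -> str:
    # Placeholder: a real backend would call into EvalML here.
    return f"evalml backend called with {sorted(config)}"


@register_engine("tpot")
def _run_tpot(config: dict) -> str:
    # Placeholder: a real backend would call into TPOT here.
    return f"tpot backend called with {sorted(config)}"


def run_with_engine(engine: str, config: dict) -> str:
    """Dispatch to the backend registered for `engine` (case-insensitive)."""
    try:
        backend = _ENGINES[engine.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown engine {engine!r}; choose from {sorted(_ENGINES)}"
        ) from None
    return backend(config)


print(run_with_engine("TPOT", {"problem_type": "binary"}))
```

With this layout, supporting a further engine would mean adding one decorated function, leaving the dispatch code untouched.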

## Code Style

- Language: Python
- Coding style:
    - PEP 8
    - Docstrings

The following software design principles were considered while packaging MLPA:

- Modular design
    - *'Somewhat General Purpose'* module
    - Deep Modules
    - Separation of Concerns
- Reusability/Extensibility
- Intuitive to use
- Version control using GitHub
- Exception handling
- Support for automated CI/CD using Travis CI
- Unit testing and coverage for quality assurance

## Authors
- [Aniket Fadia](https://github.com/aniketfadia96)
- [Jasmine Bhalla](https://github.com/JasmineBhalla17)
- [Sravan Hande](https://github.com/sravankr96)
- [Tharun Kumar Reddy Karasani](https://github.com/TharunKumarReddy5)

![GitHub Contributors Image](https://contrib.rocks/image?repo=TharunKumarReddy5/ml-pipeline-analyzer)

## Contribute

This is an open-source project, open to contributions from the Python user community.
