Metadata-Version: 2.1
Name: HighlanderML
Version: 1.0.1.1.4
Summary: Highlander project
Author: Pixpit (Giovanni Vignali)
Author-email: <giovanni.vignali@outlook.it>
Keywords: python,machine learning,Weather forecasting,Milk,Climate change
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
License-File: LICENSE


# Highlander Project

This repository contains all the information needed for the machine learning data analyses used in the Highlander project.

## Dependencies

The following dependencies are required to perform the analysis in Python:

```
pandas>=1.3.4 
numpy>=1.21.2
scikit-learn>=1.0
h2o>=3.38.0.1
```

The main functions are grouped into two files:

```
RFE_module.py
H2O_module.py
```

## Modules and functions

The two modules required to perform the analyses are located in the highlander_script/ directory. RFE_module.py performs a Recursive Feature Elimination (RFE) that reduces the number of input variables. The user defines how many features to keep; the discarded features are the least informative and most redundant ones in the dataset. Reducing the variables in this way allows a faster and more precise analysis with the Machine Learning algorithms, so this step can be considered a feature selection performed before the data analysis.
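
As an illustration of the feature elimination step, the sketch below uses the `RFE` class from scikit-learn on synthetic data. It is not the code of RFE_module.py: the helper name `select_features`, the choice of `LinearRegression` as the ranking estimator, and the synthetic columns `x0` to `x5` are all assumptions made for this example. Only `x0` and `x1` influence the target, so they are the two features expected to survive.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def select_features(X: pd.DataFrame, y: pd.Series, n_features: int) -> list:
    """Return the names of the n_features columns kept by RFE.

    Hypothetical helper: the estimator used to rank the features
    is an assumption, not necessarily the one used in RFE_module.py.
    """
    estimator = LinearRegression()
    # step=1 drops the single weakest feature at each iteration.
    selector = RFE(estimator, n_features_to_select=n_features, step=1)
    selector.fit(X, y)
    # support_ is a boolean mask over the input columns.
    return list(X.columns[selector.support_])


# Synthetic data: six standard-normal features, two of them informative.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"x{i}" for i in range(6)])
y = 3.0 * X["x0"] - 2.0 * X["x1"] + rng.normal(scale=0.1, size=200)

selected = select_features(X, y, n_features=2)
print(selected)
```
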
The H2O_module.py contains several functions to perform the Machine Learning analyses. In detail:

<ul>
<li> It searches for the best algorithm to perform the prediction by comparing a set of different models </li>
<li> Once the best model is selected, its hyper-parameters are tuned using a grid search approach </li>
<li> The model is then trained and tested, and the accuracy is evaluated using the Mean Absolute Error metric </li>
<li> The relative importance of all the variables in the prediction is then evaluated </li> 
<li> The subset of the most important variables is identified. This step can be considered a feature selection after the data analysis </li>
<li> The SHAP algorithm is then used to explain the contribution of each feature to the prediction </li>
</ul>
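
The steps listed above can be sketched with the public H2O Python API as follows. This is a minimal outline under stated assumptions, not the content of H2O_module.py: the function name `run_h2o_pipeline`, its parameters, the hyper-parameter grid, and the assumption that a Gradient Boosting Machine is the model selected for tuning are all hypothetical. The target is assumed to be numeric, since the Mean Absolute Error is a regression metric. The H2O imports sit inside the function because `h2o.init()` needs a Java runtime.

```python
def run_h2o_pipeline(train_df, test_df, target, top_k=10, max_models=10, seed=1):
    """Hypothetical end-to-end sketch; train_df and test_df are pandas DataFrames."""
    import h2o
    from h2o.automl import H2OAutoML
    from h2o.estimators import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    h2o.init()
    train = h2o.H2OFrame(train_df)
    test = h2o.H2OFrame(test_df)
    features = [c for c in train.columns if c != target]

    # 1. Compare a set of different models and rank them by MAE.
    aml = H2OAutoML(max_models=max_models, seed=seed, sort_metric="MAE")
    aml.train(x=features, y=target, training_frame=train)
    print(aml.leaderboard.head())

    # 2. Tune the hyper-parameters with a grid search.
    #    Assumption: a GBM is tuned here; the grid values are illustrative.
    grid = H2OGridSearch(
        model=H2OGradientBoostingEstimator(seed=seed, nfolds=5),
        hyper_params={"max_depth": [3, 5, 7], "learn_rate": [0.05, 0.1]},
    )
    grid.train(x=features, y=target, training_frame=train)
    best = grid.get_grid(sort_by="mae", decreasing=False).models[0]

    # 3. Evaluate the tuned model on held-out data with the MAE.
    mae = best.model_performance(test).mae()

    # 4. Relative importance of every variable, as a pandas DataFrame.
    importance = best.varimp(use_pandas=True)

    # 5. Keep the top_k most important variables.
    top_features = importance["variable"].head(top_k).tolist()

    # 6. Per-row SHAP contributions (supported by tree-based H2O models).
    contributions = best.predict_contributions(test).as_data_frame()

    return best, mae, top_features, contributions
```

A call such as `run_h2o_pipeline(train_df, test_df, target="y")` would return the tuned model, its test MAE, the names of the most important variables, and the SHAP contribution table; the column name `"y"` is a placeholder.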
  
## Tutorial

The whole pipeline to perform an end-to-end data analysis is reported in the file inseire_nome_qui.py.
