Metadata-Version: 2.1
Name: AutoMLpy
Version: 0.0.31
Summary: This package is an automatic machine learning module whose function is to optimize the hyper-parameters of a machine learning model. Code at: https://github.com/JeremieGince/AutoMLpy .
Home-page: https://github.com/JeremieGince/AutoMLpy
Author: Jérémie Gince
Author-email: gincejeremie@gmail.com
License: Apache 2.0
Description:
        
        <p align="center"> <img width="900" height="400" src="https://github.com/JeremieGince/AutoMLpy/blob/main/images/logo_001.png?raw=true"> </p>
        
        ---------------------------------------------------------------------------
        
        This package is an automatic machine learning module whose function is to optimize the hyper-parameters 
        of a machine learning model. 
        
        The package provides a grid search method, a random search algorithm, and a Gaussian process search method. 
        Everything is implemented to be compatible with the _Tensorflow_, _pyTorch_ and _sklearn_ libraries. 
        
        
        # Installation
        
        ## Latest stable version:
        ```
        pip install AutoMLpy
        ```
        
        ## Latest unstable version:
        1. Download the .whl file [here](https://github.com/JeremieGince/AutoMLpy/blob/main/dist/AutoMLpy-0.0.3-py3-none-any.whl);
        2. copy the path of the downloaded file on your computer;
        3. install it with `pip install [path].whl`.
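        
        You can then confirm the installation from Python. This is just a quick sketch; the `__version__` attribute is an assumption here, so `getattr` is used as a fallback, and a successful import alone already confirms the install.
        
        ```python
        import AutoMLpy
        
        # Hypothetical check: not every package exposes __version__.
        print(getattr(AutoMLpy, "__version__", "installed"))
        ```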
        
         
         ---------------------------------------------------------------------------
        # Example - MNIST optimization with Tensorflow & Keras
        
        Here is an example of how to optimize a model built with Tensorflow and Keras on the popular MNIST dataset.
        
        ## Imports
        
        We start by importing the required packages.
        
        
        ```python
        # Some useful packages
        from typing import Union, Tuple
        import time
        import numpy as np
        import pandas as pd
        import pprint
        
        # Tensorflow
        import tensorflow as tf
        import tensorflow_datasets as tfds
        
        # Importing the HPOptimizer and the RandomHpSearch from the AutoMLpy package.
        from AutoMLpy import HpOptimizer, RandomHpSearch
        
        ```
        
        ## Dataset
        
        Now we load the MNIST dataset using the Tensorflow data pipeline.
        
        
        ```python
        def normalize_img(image, label):
            """Normalizes images: `uint8` -> `float32`."""
            return tf.cast(image, tf.float32) / 255., label
        
        def get_tf_mnist_dataset(**kwargs):
            # https://www.tensorflow.org/datasets/keras_example
            (ds_train, ds_test), ds_info = tfds.load(
                'mnist',
                split=['train', 'test'],
                shuffle_files=True,
                as_supervised=True,
                with_info=True,
            )
        
            # Build training pipeline
            ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            ds_train = ds_train.cache()
            ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
            ds_train = ds_train.batch(128)
            ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)
        
            # Build evaluation pipeline
            ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            ds_test = ds_test.batch(128)
            ds_test = ds_test.cache()
            ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
        
            return ds_train, ds_test
        ```
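        
        As a quick sanity check (a sketch, not part of the original pipeline), you can pull one batch and inspect it; the shapes below assume the batch size of 128 set above.
        
        ```python
        ds_train, ds_test = get_tf_mnist_dataset()
        images, labels = next(iter(ds_train))
        
        print(images.shape, images.dtype)    # (128, 28, 28, 1) float32
        print(labels.shape, labels.dtype)    # (128,) int64
        print(float(tf.reduce_max(images)))  # <= 1.0 after normalization
        ```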
        
        ## Keras Model
        
        Now we make a function that returns a Keras model given a set of hyper-parameters (hp).
        
        
        ```python
        def get_tf_mnist_model(**hp):
        
            if hp.get("use_conv", False):
                model = tf.keras.models.Sequential([
                    # Convolution layers
                    tf.keras.layers.Conv2D(10, 3, padding="same", input_shape=(28, 28, 1)),
                    tf.keras.layers.MaxPool2D((2, 2)),
                    tf.keras.layers.Conv2D(50, 3, padding="same"),
                    tf.keras.layers.MaxPool2D((2, 2)),
        
                    # Dense layers
                    tf.keras.layers.Flatten(),
                    tf.keras.layers.Dense(120, activation='relu'),
                    tf.keras.layers.Dense(84, activation='relu'),
                    tf.keras.layers.Dense(10)
                ])
            else:
                model = tf.keras.models.Sequential([
                    tf.keras.layers.Flatten(input_shape=(28, 28)),
                    tf.keras.layers.Dense(120, activation='relu'),
                    tf.keras.layers.Dense(84, activation='relu'),
                    tf.keras.layers.Dense(10)
                ])
        
            return model
        
        ```
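        
        As a quick check (a sketch, not part of the original example), you can build each variant on a dummy batch and confirm that both output 10 logits, one per MNIST class:
        
        ```python
        # The convolutional variant expects a channel dimension.
        conv_model = get_tf_mnist_model(use_conv=True)
        print(conv_model(tf.zeros((1, 28, 28, 1))).shape)  # (1, 10)
        
        # The dense variant flattens the raw 28x28 images.
        dense_model = get_tf_mnist_model(use_conv=False)
        print(dense_model(tf.zeros((1, 28, 28))).shape)    # (1, 10)
        ```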
        
        ## The Optimizer Model
        
        It's time to implement the optimizer model. You just have to implement the following methods: "build_model",
        "fit_dataset_model_" and "score_on_dataset". Those methods must respect their signatures and output types. The objective 
        here is to make the building, training and scoring phases depend on the hyper-parameters, so the optimizer can 
        use them to find the best set of hp.
        
        
        ```python
        class KerasMNISTHpOptimizer(HpOptimizer):
            def build_model(self, **hp) -> tf.keras.Model:
                model = get_tf_mnist_model(**hp)
        
                model.compile(
                    optimizer=tf.keras.optimizers.SGD(
                        learning_rate=hp.get("learning_rate", 1e-3),
                        nesterov=hp.get("nesterov", True),
                        momentum=hp.get("momentum", 0.99),
                    ),
                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
                )
                return model
        
            def fit_dataset_model_(
                    self,
                    model: tf.keras.Model,
                    dataset,
                    **hp
            ) -> tf.keras.Model:
                model.fit(
                    dataset,
                    epochs=hp.get("epochs", 1),
                    verbose=False,
                )
                return model
        
            def score_on_dataset(
                    self,
                    model: tf.keras.Model,
                    dataset,
                    **hp
            ) -> float:
                test_loss, test_acc = model.evaluate(dataset, verbose=0)
                return test_acc
        
        ```
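        
        Before launching a full search, a quick sanity check (a sketch, not part of the original example; the variable names are illustrative) is to build a model through the optimizer with an explicit set of hp and confirm that the compiled optimizer picked them up:
        
        ```python
        sanity_optimizer = KerasMNISTHpOptimizer()
        sanity_model = sanity_optimizer.build_model(
            learning_rate=1e-2, momentum=0.9, nesterov=False, use_conv=True,
        )
        
        # The compiled SGD optimizer should hold the requested learning rate.
        print(float(sanity_model.optimizer.learning_rate))  # 0.01
        ```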
        
        ## Execution & Optimization
        
        The first thing to do after creating our classes is to load the dataset into memory.
        
        
        ```python
        mnist_train, mnist_test = get_tf_mnist_dataset()
        mnist_hp_optimizer = KerasMNISTHpOptimizer()
        ```
        
        Next, you define your hyper-parameter space with a dictionary like this one.
        
        
        ```python
        hp_space = dict(
            epochs=list(range(1, 16)),
            learning_rate=np.linspace(1e-4, 1e-1, 50),
            nesterov=[True, False],
            momentum=np.linspace(0.01, 0.99, 50),
            use_conv=[True, False],
        )
        ```
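        
        Note that this space is far too large for an exhaustive grid search: 15 × 50 × 2 × 50 × 2 = 150,000 combinations, which is why a random search with a bounded budget is used below. You can check the cardinality directly:
        
        ```python
        # Product of the number of candidate values for each hyper-parameter.
        n_combinations = int(np.prod([len(v) for v in hp_space.values()]))
        print(n_combinations)  # 150000
        ```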
        
        It's time to define your hp search algorithm and give it your budget in time and iterations. Here we will 
        search for at most 10 minutes and 100 iterations.
        
        
        ```python
        param_gen = RandomHpSearch(hp_space, max_seconds=60*10, max_itr=100)
        ```
        
        Finally, you start the optimization by giving your parameter generator to the optimize method. Note that the 
        "stop_criterion" argument stops the optimization once the given score is reached; it's really useful to save 
        some time.
        
        
        ```python
        save_kwargs = dict(
            save_name="tf_mnist_hp_opt",
            title="Random search: MNIST",
        )
        
        param_gen = mnist_hp_optimizer.optimize_on_dataset(
            param_gen, mnist_train, save_kwargs=save_kwargs,
            stop_criterion=1.0,
        )
        ```
            
        
        ## Testing
        
        Now, you can test the optimized hyper-parameters by fitting again on the full training dataset. Yes, the full 
        dataset, because the optimization phase performs a cross-validation that cuts your training dataset in half. 
        It's also time to evaluate the fitted model on the test dataset.
        
        
        ```python
        opt_hp = param_gen.get_best_param()
        
        model = mnist_hp_optimizer.build_model(**opt_hp)
        mnist_hp_optimizer.fit_dataset_model_(
            model, mnist_train, **opt_hp
        )
        
        test_acc = mnist_hp_optimizer.score_on_dataset(
            model, mnist_test, **opt_hp
        )
        
        print(f"test_acc: {test_acc*100:.3f}%")
        ```
            
        
        The optimized hyper-parameters:
        
        
        ```python
        pp = pprint.PrettyPrinter(indent=4)
        pp.pprint(opt_hp)
        ```
            
        
        ## Visualization
        
        You can visualize the optimization with an interactive HTML file.
        
        
        ```python
        fig = param_gen.write_optimization_to_html(show=True, dark_mode=True, **save_kwargs)
        ```
        
        ## Optimization table
        ```python
        opt_table = param_gen.get_optimization_table()
        ```
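        
        The table is printable (it is printed later in this README). As a hedged sketch, if it behaves like a pandas DataFrame (an assumption here; pandas is already imported above), it can also be exported:
        
        ```python
        print(opt_table)
        
        # Assumption: opt_table behaves like a pandas DataFrame.
        opt_table.to_csv("tf_mnist_hp_opt_table.csv", index=False)
        ```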
        
        ## Saving ParameterGenerator
        ```python
        param_gen.save_history(**save_kwargs)
        save_path = param_gen.save_obj(**save_kwargs)
        ```
        
        ## Loading ParameterGenerator
        ```python
        param_gen = RandomHpSearch.load_obj(save_path)
        ```
        
        ## Re-launch optimization with loaded ParameterGenerator
        ```python
        # Increase the budget to be able to optimize again
        param_gen.max_itr = param_gen.max_itr + 100
        param_gen.max_seconds = param_gen.max_seconds + 60
        
        param_gen = mnist_hp_optimizer.optimize_on_dataset(
            param_gen, mnist_train, save_kwargs=save_kwargs,
            stop_criterion=1.0, reset_gen=False,
        )
        
        opt_hp = param_gen.get_best_param()
        
        print(param_gen.get_optimization_table())
        pp.pprint(param_gen.history)
        pp.pprint(opt_hp)
        ```
         
         ---------------------------------------------------------------------------
         # Other examples
         Examples of how to use this package are in the [./examples](https://github.com/JeremieGince/AutoMLpy/blob/main/examples) folder. 
         There you can find the previous example with [_Tensorflow_](https://github.com/JeremieGince/AutoMLpy/blob/main/examples/tensorflow_example.ipynb) 
         and an example with [_pyTorch_](https://github.com/JeremieGince/AutoMLpy/blob/main/examples/pytorch_example.ipynb).
         
        
        
        # License
        [Apache License 2.0](LICENSE.md)
        
        # Citation
        ```
        @article{Gince,
          title={Implémentation du module AutoMLpy, un outil d’apprentissage machine automatique},
          author={Jérémie Gince},
          year={2021},
          publisher={ULaval},
          url={https://github.com/JeremieGince/AutoMLpy},
        }
        ```
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
