Metadata-Version: 2.1
Name: pspso
Version: 0.0.8
Summary: pspso is a python package for selecting machine learning algorithms parameters.
Home-page: https://github.com/ayhaidar/pspso
Author: Ali Haidar
Author-email: ali.hdrv@outlook.com
License: MIT
Description: ![image](LOGO.png)
        
        Welcome to pspso's documentation!
        =================================
        
        Overview and Installation
        =========================
        
        Overview
        --------
        
        **pspso** is a Python library for selecting the parameters of machine
        learning algorithms. The first version supports two single algorithms:
        Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM). It also
        supports two ensembles: Extreme Gradient Boosting (XGBoost) and Gradient
        Boosting Decision Trees (GBDT).
        
        Two types of machine learning tasks are supported by pspso:
        
        -   Regression.
        -   Binary classification.
        
        Three scores are supported in the first version of pspso:
        
        -   **Regression** :
        
            > -   Root Mean Square Error (RMSE)
        
        -   **Binary Classification** :
        
            > -   Area under the Curve (AUC) of the Receiver Operating
            >     Characteristic (ROC)
            > -   Accuracy
        
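        As a rough illustration of the three scores, the sketch below computes
        each one in plain Python. The function names (`rmse`, `accuracy`,
        `roc_auc`) and the 0.5 threshold are hypothetical choices made for this
        example; they are not part of pspso, which may compute its scores
        differently internally.
        
        ``` {.sourceCode .python}
        import math
        
        def rmse(y_true, y_pred):
            """Root Mean Square Error: lower is better."""
            squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
            return math.sqrt(sum(squared) / len(squared))
        
        def accuracy(y_true, y_prob, threshold=0.5):
            """Fraction of samples whose thresholded probability matches the label."""
            hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
            return hits / len(y_true)
        
        def roc_auc(y_true, y_prob):
            """AUC as the probability that a random positive outranks a random negative."""
            pos = [p for t, p in zip(y_true, y_prob) if t == 1]
            neg = [p for t, p in zip(y_true, y_prob) if t == 0]
            wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
            return wins / (len(pos) * len(neg))
        
        labels = [0, 0, 1, 1]
        probs = [0.1, 0.4, 0.35, 0.8]
        print(rmse([3.0, 5.0], [2.0, 7.0]))  # square root of 2.5
        print(accuracy(labels, probs))       # 0.75
        print(roc_auc(labels, probs))        # 0.75
        ```
        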
        Installation
        ------------
        
        Use the package manager [pip](https://pip.pypa.io/en/stable/) to install
        pspso.
        
        ``` {.sourceCode .bash}
        pip install pspso
        ```
        
        Usage
        =====
        
        MLP Example (Binary Classification)
        -----------------------------------
        
        **pspso** is used to select the parameters of machine learning
        algorithms. Below is an example of using pspso to select the parameters
        of the MLP. pspso handles the MLP random weight initialization issue,
        which may otherwise cause the best solution to be lost in consecutive
        iterations.
        
        The following example demonstrates the selection process for the MLP
        parameters. No *params* variable is given by the user, so the default
        search space of the MLP is loaded. This search space contains five
        parameters:
        
        ``` {.sourceCode .python}
        params = {"optimizer": ["RMSprop", "adam", "sgd",'adamax','nadam','adadelta'] ,
              "learning_rate":  [0.1,0.3,2],
              "neurons": [1,40,0],
              "hiddenactivation": ['relu','sigmoid','tanh'],
              "activation":['relu','sigmoid','tanh']} 
        ```
        
        The task and the score were defined as *binary classification* and *auc*
        respectively. Then, the PSO was used to select the parameters of the
        MLP. Results are provided back to the user through the
        **print\_results()** function.
        
        ``` {.sourceCode .python}
        from sklearn.preprocessing import MinMaxScaler
        from pspso import pspso
        from sklearn import datasets
        from sklearn.model_selection import train_test_split
        
        breastcancer = datasets.load_breast_cancer()
        data=breastcancer.data#get the breast cancer dataset input features
        target=breastcancer.target# target
        X_train, X_test, Y_train, Y_test = train_test_split(data, target,test_size=0.1,random_state=42,stratify=target)
        normalize = MinMaxScaler(feature_range=(0,1))#normalize input features 
        X_train=normalize.fit_transform(X_train)
        X_test=normalize.transform(X_test)
        X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train,test_size=0.15,random_state=42,stratify=Y_train)
        p=pspso(estimator='mlp',task='binary classification', score='auc')
        pos,cost,duration,model,optimizer=p.fitpspso(X_train,Y_train,X_val,Y_val)
        p.print_results()#print the results
        testscore=pspso.predict(p.model,p.estimator,p.task,p.score, X_test, Y_test)
        print(1-testscore)
        ```
        
        In this example, four parameters were examined: optimizer,
        learning\_rate, hiddenactivation, and activation. The number of neurons
        in the hidden layer was kept as default.
        
        Output:
        
        ``` {.sourceCode .python}
        Estimator: mlp
        Task: binary classification
        Selection type: PSO
        Number of attempts:50
        Total number of combinations: 45360
        Parameters:
        {'optimizer': 'nadam', 'learning_rate': 0.29, 'neurons': 4, 'hiddenactivation': 'sigmoid', 'activation': 'sigmoid'}
        Global best position: [3.8997699  0.28725911 4.21218138 1.41200923 0.84643591]
        Global best cost: 0.0
        Time taken to find the set of parameters: 160.3374378681183
        Number of particles: 5
        Number of iterations: 10
        0.9867724867724867
        ```
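        
        The total number of combinations reported in the output can be
        reproduced from the default MLP search space listed earlier. The sketch
        below assumes that a numerical parameter `[lb, ub, rv]` contributes every
        value between `lb` and `ub` at a resolution of `rv` decimal points;
        `count_combinations` is a hypothetical helper written for this
        illustration, not a pspso function.
        
        ``` {.sourceCode .python}
        def count_combinations(params):
            """Count the discrete points in a search space of [lb, ub, rv] and categorical lists."""
            total = 1
            for values in params.values():
                if isinstance(values[0], str):
                    # Categorical parameter: one choice per listed value.
                    total *= len(values)
                else:
                    # Numerical parameter: values from lb to ub in steps of 10**-rv.
                    lb, ub, rv = values
                    total *= round((ub - lb) * 10 ** rv) + 1
            return total
        
        # Default MLP search space as listed in this document.
        mlp_params = {
            "optimizer": ["RMSprop", "adam", "sgd", "adamax", "nadam", "adadelta"],
            "learning_rate": [0.1, 0.3, 2],
            "neurons": [1, 40, 0],
            "hiddenactivation": ["relu", "sigmoid", "tanh"],
            "activation": ["relu", "sigmoid", "tanh"],
        }
        print(count_combinations(mlp_params))  # 45360 = 6 * 21 * 40 * 3 * 3
        ```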
        
        XGBoost Example (Binary Classification)
        ---------------------------------------
        
        ``` {.sourceCode .python}
        from sklearn.preprocessing import MinMaxScaler
        from pspso import pspso
        from sklearn import datasets
        from sklearn.model_selection import train_test_split
        
        breastcancer = datasets.load_breast_cancer()
        data=breastcancer.data#get the breast cancer dataset input features
        target=breastcancer.target# target
        X_train, X_test, Y_train, Y_test = train_test_split(data, target,test_size=0.1,random_state=42,stratify=target)
        normalize = MinMaxScaler(feature_range=(0,1))#normalize input features 
        X_train=normalize.fit_transform(X_train)
        X_test=normalize.transform(X_test)
        X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train,test_size=0.15,random_state=42,stratify=Y_train)
        
        params = {
                "learning_rate":  [0.01,0.2,2],
                "max_depth": [1,10,0],
                "n_estimators": [2,200,0],
                "subsample": [0.7,1,1]}
        p=pspso(estimator='xgboost',params=params,task='binary classification', score='auc')
        pos,cost,duration,model,optimizer=p.fitpspso(X_train,Y_train,X_val,Y_val)
        p.print_results()#print the results
        testscore=pspso.predict(p.model,p.estimator,p.task,p.score, X_test, Y_test)
        print(1-testscore)
        ```
        
        XGBoost Example (Regression)
        ----------------------------
        
        XGBoost is an implementation of gradient-boosted decision trees. Five
        parameters are selected in this example: objective, learning rate,
        maximum depth, number of estimators, and subsample. Three categorical
        values are given for the objective parameter. The learning rate
        values range between *0.01* and *0.2* with *2* decimal points, the
        maximum depth ranges between *1* and *10* with *0* decimal points
        *(1,2,3,4,5,6,7,8,9,10)*, and so on. The task and score are set to
        regression and RMSE respectively. The number of particles and the
        number of iterations can be left at their default values. A pspso
        instance is then created, and calling the fitpspso function runs the
        selection process. Finally, the results are printed. The best model,
        best parameters, score, time, and other details are saved in the
        created instance for the user to inspect.
        
        ``` {.sourceCode .python}
        from sklearn.preprocessing import MinMaxScaler
        from pspso import pspso
        from sklearn import datasets
        from sklearn.model_selection import train_test_split
        
        boston_data = datasets.load_boston()# note: load_boston was removed in scikit-learn 1.2, so an older release is required
        data=boston_data.data
        target=boston_data.target
        
        X_train, X_test, Y_train, Y_test = train_test_split(data, target,test_size=0.1,random_state=42)
        normalize = MinMaxScaler(feature_range=(0,1))#normalize input features
        normalizetarget = MinMaxScaler(feature_range=(0,1))#normalize target
        
        X_train=normalize.fit_transform(X_train)
        X_test=normalize.transform(X_test)
        Y_train=normalizetarget.fit_transform(Y_train.reshape(-1,1))
        Y_test=normalizetarget.transform(Y_test.reshape(-1,1))
        
        X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train,test_size=0.25,random_state=42)
        params = {
                "objective":['reg:tweedie',"reg:linear","reg:gamma"],
                "learning_rate":  [0.01,0.2,2],
                "max_depth": [1,10,0],
                "n_estimators": [2,200,0],
                "subsample": [0.7,1,1]}
        p=pspso(estimator='xgboost',params=params,task='regression', score='rmse')
        pos,cost,duration,model,optimizer=p.fitpspso(X_train,Y_train,X_val,Y_val)
        p.print_results()#print the results
        testscore=pspso.predict(p.model,p.estimator,p.task,p.score, X_test, Y_test)
        print(testscore)
        ```
        
        User Input
        ----------
        
        The user specifies the algorithm
        ('mlp','svm','xgboost','gbdt'), the task type ('binary classification',
        'regression'), and the score ('rmse','acc', or 'auc'). The parameters
        variable can be left empty, in which case a default set of parameters
        and ranges is loaded for the chosen algorithm.
        
        ``` {.sourceCode .python}
        from pspso import pspso
        task='binary classification'
        score='auc'
        p=pspso('xgboost',None,task,score)
        ```
        
        pspso allows the user to provide a range of parameters for exploration.
        The parameters vary between algorithms. In the current version, up
        to 5 parameters can be explored at the same time.
        
        The parameters are encoded in a JSON-style object (a Python dictionary)
        that consists of *key,value* pairs:
        
        ``` {.sourceCode .python}
        params = {
            "objective":['reg:tweedie',"reg:linear","reg:gamma"],
            "learning_rate":  [0.01,0.2,2],
            "max_depth": [1,10,0],
            "n_estimators": [2,200,0],
            "subsample": [0.7,1,1]}
        ```
        
        The key can be any parameter belonging to the algorithm under
        investigation. The value is a list. pspso checks the type of the
        first element in the list to determine whether the values of the
        parameter are categorical or numerical.
        
        **Categorical Parameters**
        
        If the parameter values are *categorical*, string values are expected
        in the list, as shown for the *objective* parameter. The values in
        the list are automatically mapped to a list of integers, where
        each integer represents a value in the original list. The order of the
        values inside the list affects the position of each value in the search
        space.
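        
        A minimal sketch of this mapping, using the MLP optimizer list from the
        default search space. The coordinate 3.8997699 and the resulting
        'nadam' are taken from the MLP output shown earlier; rounding to the
        nearest index is an assumption that is consistent with that output, and
        `decode_categorical` is a hypothetical helper, not a pspso function.
        
        ``` {.sourceCode .python}
        optimizers = ["RMSprop", "adam", "sgd", "adamax", "nadam", "adadelta"]
        
        # Each categorical value is represented by its index in the list,
        # so reordering the list changes where a value sits in the search space.
        index_of = {value: index for index, value in enumerate(optimizers)}
        
        def decode_categorical(position, values):
            """Map a continuous particle coordinate back to a categorical value.
        
            Rounding to the nearest valid index is an assumption made for this sketch.
            """
            index = min(max(int(round(position)), 0), len(values) - 1)
            return values[index]
        
        print(index_of["nadam"])                          # 4
        print(decode_categorical(3.8997699, optimizers))  # nadam
        ```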
        
        **Numerical Parameters**
        
        On the other hand, if the parameter is numerical, a list with three
        elements is expected [lb,ub, rv]:
        
        -   **lb**: represents the lowest value in the search space
        -   **ub**: represents the maximum value in the search space
        -   **rv**: represents the number of decimal points the parameter values
            are rounded to before being added for training the algorithm
        
        For example, if you want pspso to select n\_estimators, you add the
        list *[2,200,0]* as in the example. The lowest n\_estimators value
        will then be *2*, the highest to be examined is *200*, and each
        candidate value is rounded to an integer (*0* decimal points).
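        
        The rounding described above can be sketched as follows. The inputs
        0.28725911 and 4.21218138 come from the global best position in the MLP
        output shown earlier, where they correspond to a learning rate of 0.29
        and 4 neurons. `decode_numerical` is a hypothetical helper, and clipping
        out-of-range values to the bounds is an assumption of this sketch
        rather than documented pspso behaviour.
        
        ``` {.sourceCode .python}
        def decode_numerical(position, bounds):
            """Clip a particle coordinate to [lb, ub] and round it to rv decimal points."""
            lb, ub, rv = bounds
            clipped = min(max(position, lb), ub)
            value = round(clipped, rv)
            # With rv == 0 the parameter is an integer (e.g. n_estimators, neurons).
            return int(value) if rv == 0 else value
        
        print(decode_numerical(0.28725911, [0.1, 0.3, 2]))  # 0.29
        print(decode_numerical(4.21218138, [1, 40, 0]))     # 4
        print(decode_numerical(250.7, [2, 200, 0]))         # 200
        ```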
        
        **Other parameters**
        
        The user can also override some of the default parameters, such as the
        number of epochs in the MLP, by modifying the attributes of a pspso
        class instance. For example, to change the number of epochs from 50 to
        10 for MLP training:
        
        ``` {.sourceCode .python}
        from pspso import pspso
        task='binary classification'
        score='auc'
        p=pspso('mlp',None,task,score)# in case of empty set of params (None) default search space is loaded
        p.defaultparams['epochs']=10
        ```
        
        The verbosity can be modified for any algorithm, which allows showing
        details of the training process:
        
        ``` {.sourceCode .python}
        from pspso import pspso
        task='binary classification'
        score='auc'
        p=pspso('mlp',None,task,score)
        p.verbosity=1
        ```
        
        The number of early stopping rounds can be modified for algorithms that
        support it; the default is 60:
        
        ``` {.sourceCode .python}
        from pspso import pspso
        task='binary classification'
        score='auc'
        p=pspso('xgboost',None,task,score)
        p.early_stopping=10
        ```
        
        
        Future Work
        ===========
        
        Additional Parameters
        ---------------------
        
        To add new parameters to the currently supported algorithms, two
        functions should be updated:
        
        -   The **read\_params** function should include default details about
            the parameter.
        -   The **forward\_prop\_algorithmname** function should add the
            parameter to the initialization process.
        
        New Algorithms
        --------------
        
        Adding a new algorithm is more complex as you will be required to add an
        objective function that will detail the training and evaluation process.
        
        New Optimizers
        --------------
        
        Two main optimizers are currently supported. Both are provided by the
        pyswarms package.
        
        The default is the global-best PSO; however, the user can select the
        local-best PSO instead. The PSO parameters are set to default values in
        each case and can be modified by the user.
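        
        To illustrate what a global-best PSO does, here is a minimal,
        self-contained sketch in plain Python. It is not the library
        implementation that pspso relies on: the function name
        `global_best_pso`, the coefficient values (`w`, `c1`, `c2`), and the toy
        objective are assumptions chosen for the example.
        
        ``` {.sourceCode .python}
        import random
        
        def global_best_pso(objective, bounds, n_particles=5, iters=10,
                            w=0.9, c1=0.5, c2=0.3, seed=0):
            """Minimise `objective` over box `bounds` with a basic global-best PSO."""
            rng = random.Random(seed)
            dims = len(bounds)
            pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
            vel = [[0.0] * dims for _ in range(n_particles)]
            pbest = [p[:] for p in pos]              # best position seen by each particle
            pbest_cost = [objective(p) for p in pos]
            best = min(range(n_particles), key=pbest_cost.__getitem__)
            gbest, gbest_cost = pbest[best][:], pbest_cost[best]
            history = [gbest_cost]                   # global best cost after each iteration
        
            for _ in range(iters):
                for i in range(n_particles):
                    for d, (lo, hi) in enumerate(bounds):
                        r1, r2 = rng.random(), rng.random()
                        vel[i][d] = (w * vel[i][d]
                                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                                     + c2 * r2 * (gbest[d] - pos[i][d]))
                        # Keep the particle inside the search space.
                        pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
                    cost = objective(pos[i])
                    if cost < pbest_cost[i]:
                        pbest[i], pbest_cost[i] = pos[i][:], cost
                        if cost < gbest_cost:
                            gbest, gbest_cost = pos[i][:], cost
                history.append(gbest_cost)
            return gbest, gbest_cost, history
        
        # Toy objective: squared distance from the point (0.2, 5).
        best_pos, best_cost, history = global_best_pso(
            lambda p: (p[0] - 0.2) ** 2 + (p[1] - 5.0) ** 2,
            bounds=[(0.01, 0.3), (1.0, 10.0)])
        print(best_pos, best_cost)
        ```
        
        The global best cost never increases from one iteration to the next,
        because it is only replaced when a particle finds a strictly lower
        cost.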
        
        Cross-Validation
        ----------------
        
        We are working towards adding cross-validation support that will
        take the training data and the number of folds, split the records, and
        train on each fold. Finally, the average performance is returned to the
        user.
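        
        Since this feature is not implemented yet, the following is only a
        sketch of the idea in plain Python: split the record indices into folds,
        evaluate each fold, and average the scores. The names `kfold_indices`
        and `cross_validate`, and the `train_and_score` callback, are
        hypothetical and do not reflect a planned pspso API.
        
        ``` {.sourceCode .python}
        def kfold_indices(n_samples, n_folds):
            """Yield (train_idx, val_idx) pairs covering every sample exactly once."""
            indices = list(range(n_samples))
            # Spread the remainder over the first folds so sizes differ by at most one.
            fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                          for i in range(n_folds)]
            start = 0
            for size in fold_sizes:
                val_idx = indices[start:start + size]
                train_idx = indices[:start] + indices[start + size:]
                yield train_idx, val_idx
                start += size
        
        def cross_validate(train_and_score, n_samples, n_folds):
            """Average the score returned by `train_and_score(train_idx, val_idx)`."""
            scores = [train_and_score(tr, va) for tr, va in kfold_indices(n_samples, n_folds)]
            return sum(scores) / len(scores)
        
        # Dummy scorer that returns the validation fold size, to show the plumbing.
        folds = list(kfold_indices(10, 3))
        print([len(va) for _, va in folds])                   # [4, 3, 3]
        print(cross_validate(lambda tr, va: len(va), 10, 3))  # average fold size, 10/3
        ```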
        
        Multi-Class Classification
        --------------------------
        
        We are also working on adding multi-class classification and data
        oversampling techniques.
        
        Contributing
        ============
        
        Pull requests are welcome. For major changes, please open an issue first
        to discuss what you would like to change.
        
        Please make sure to update tests as appropriate.
        
        License
        =======
        
        Copyright (c) [2020] [Ali Haidar]
        
        Permission is hereby granted, free of charge, to any person obtaining a
        copy of this software and associated documentation files (the
        "Software"), to deal in the Software without restriction, including
        without limitation the rights to use, copy, modify, merge, publish,
        distribute, sublicense, and/or sell copies of the Software, and to
        permit persons to whom the Software is furnished to do so, subject to
        the following conditions:
        
        The above copyright notice and this permission notice shall be included
        in all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
        OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
        IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
        CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
        TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
        SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers
Requires-Python: >=3.5
Description-Content-Type: text/markdown
