Metadata-Version: 2.1
Name: SHFS
Version: 0.1.5
Summary: Feature selection classes that calculate feature importance based on the Shap library for classification and regression problems. For efficiency, they work only with random forest or gradient boosting models. DFwrapper removes multicollinearity and outliers
Home-page: https://github.com/ArtyKrafty/featureselectors
Author: Artem, et al.
Author-email: artysolomko@gmail.com
License: UNKNOWN
Keywords: shap,fi,pipeline
Platform: UNKNOWN
Classifier: Environment :: Console
Classifier: Natural Language :: English
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
Description-Content-Type: text/markdown

<p align="center"><img src="https://i.ibb.co/ZXSk6jG/machine-learning-1920x1180.jpg" alt="machine-learning-1920x1180"></p>
 
The library consists of two groups of classes: feature selectors, and DFwrapper for dealing with outliers and correlation

1. Feature selection group

The FeatureSelection classes calculate feature importance based on the `Shap` library for classification and regression problems.
  They work only with tree-based models (for better efficiency) or models based on
  gradient boosting. The recommended model is:
   
   CatBoost - does not require handling of `NaN` values or categorical features, and works with `sklearn`

    NOTE: If your import is failing due to a missing package, you can
    manually install dependencies using either !pip or !apt.

            !pip install shap 
            !pip install phik
   
  https://pypi.org/project/SHFS/
  

            FeatureSelectionClf - for classification
            FeatureSelectionRegression - for regression
            FeatureSelectionUniversal - for both classification and regression tasks

  Quick start: [Colab](https://colab.research.google.com/drive/1eP6qZmxcTcsKgjLL7u_pHaM5sZc8346N?usp=sharing) and [Tutorial](https://nbviewer.org/github/ArtyKrafty/featureselectors/blob/main/Tutorial/Tutorials_ipynb_.ipynb)
        

  Parameters
___
    estimator:
        A supervised learning estimator with a fit method, used to retrieve and
        select the indices of the most important features.
    n_features_to_select: int, default = None
        The number of features to select.
    columns: List, default = None
        The list of feature names of the initial set.
    
  Methods
___
    fit - trains the estimator and identifies the most important features
    transform - reduces the original set to the selected features and returns them
    get_index - returns the indices of the selected features

    only for FeatureSelectionClf and FeatureSelectionRegression:

    plot_values - plots the SHAP values
    _estimator_type - a @property
    get_feature_importance - returns a DataFrame of feature importances
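
The selection step behind `fit` and `get_index` can be pictured as ranking features by their mean absolute SHAP value and keeping the top `n_features_to_select`. The following is a minimal sketch under that assumption, not the library's own implementation. The helper `top_feature_indices` and the hard-coded `shap_values` matrix are hypothetical stand-ins for what a SHAP explainer would return.

```python
from typing import List, Optional, Sequence


def top_feature_indices(
    shap_values: Sequence[Sequence[float]],
    n_features_to_select: Optional[int] = None,
) -> List[int]:
    """Rank features by mean absolute SHAP value (hypothetical helper).

    shap_values holds one row per sample and one column per feature.
    Returns column indices, most important first.
    """
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    # Mean |SHAP| per column is a common global importance score.
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]
    # Sort indices by descending importance; ties keep the original column order.
    ranked = sorted(range(n_features), key=lambda j: -importance[j])
    if n_features_to_select is None:
        return ranked
    return ranked[:n_features_to_select]


# Toy SHAP matrix (assumed values): 3 samples, 4 features.
shap_values = [
    [0.10, -0.50, 0.00, 0.30],
    [-0.20, 0.40, 0.01, -0.30],
    [0.10, -0.60, 0.00, 0.20],
]
selected = top_feature_indices(shap_values, n_features_to_select=2)
print(selected)
```

With `n_features_to_select=None` the sketch returns every index in ranked order, which is one plausible reading of the default and may not match the library's behaviour.
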

  Note
___
 NaN / Inf values are allowed as long as the model's fit method accepts them

  Example use for classification
___
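    from catboost import CatBoostClassifier
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OrdinalEncoder, StandardScaler

    # FeatureSelectionClf comes from this package; params_cat is a dict of CatBoost parameters
    cols = list(X_train.columns)
    cat_features = list(X_train.select_dtypes(include=['object', 'category']).columns)
    num_features = list(X_train.select_dtypes(exclude=['object', 'category']).columns)

    estimator = CatBoostClassifier(**params_cat)
    selector = FeatureSelectionClf(estimator, n_features_to_select=3, columns=cols)

    # Scale numeric columns and ordinal-encode categorical ones
    preprocessor = ColumnTransformer(
        transformers=[
            ('std_scaler', StandardScaler(), num_features),
            ('cat', OrdinalEncoder(), cat_features),
        ]
    )

    pipe = Pipeline(
        steps=[
            ('preprocessor', preprocessor),
            ('selector', selector),
        ]
    )

    # The selector is supervised, so the target is passed along with the features
    X_train_prep = pipe.fit_transform(X_train, y_train)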
       
Example without Pipeline

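    cols = list(X_train.columns)
    estimator = CatBoostClassifier(**params_cat)
    selector = FeatureSelectionClf(estimator, n_features_to_select=3, columns=cols)
    # fit identifies the important features; transform returns only those features
    selector.fit(X_train_prep, y_train)
    X = selector.transform(X_train_prep)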


2. DFwrapper

DFwrapper removes multicollinearity and outliers from a Pandas DataFrame
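
One common way to remove multicollinearity is to drop one column from every pair whose absolute correlation exceeds a threshold. The sketch below illustrates that general idea with plain Pearson correlation on a dict of columns. It is an assumption about the technique, not a description of how `wrap_corr` is implemented, which may use a different correlation measure. The names `pearson`, `drop_correlated`, and the `threshold` value are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Return the column names to keep.

    A column is dropped when its absolute correlation with any
    already-kept column exceeds the threshold.
    """
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # nearly a rescaled copy of height_cm
    "age": [30.0, 22.0, 41.0, 25.0],
}
kept = drop_correlated(data, threshold=0.9)
print(kept)
```

Because columns are visited in insertion order, the first column of a correlated pair survives; a real implementation might instead keep the column that is more informative about the target.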

    Usage example
    ----------
    1. Collinearity

    cleaner = DFwrapper()
    new_df = cleaner.wrap_corr(df)

    2. Outliers. Rough cleaning

    cleaner = DFwrapper(low=.05, high=.95)
    cleaned = cleaner.quantile_cleaner(df, cols_to_clean)

    3. Outliers. Finer cleaning

    cleaner = DFwrapper(koeff=1.5)
    cleaned = cleaner.frame_irq(df, cols_to_clean)
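
The two outlier strategies can be sketched with the standard library alone. This is a minimal illustration, assuming that `low` and `high` are quantile bounds and that `koeff` is the multiplier applied to the interquartile range; DFwrapper's actual implementation and quantile interpolation may differ. The helper names `quantile`, `quantile_filter`, and `iqr_filter` are hypothetical.

```python
from typing import List


def quantile(values: List[float], q: float) -> float:
    """Quantile with linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def quantile_filter(values: List[float], low: float = 0.05, high: float = 0.95) -> List[float]:
    """Rough cleaning: keep values inside the [low, high] quantile range."""
    lo, hi = quantile(values, low), quantile(values, high)
    return [v for v in values if lo <= v <= hi]


def iqr_filter(values: List[float], koeff: float = 1.5) -> List[float]:
    """Finer cleaning: keep values inside the interquartile-range fences."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    iqr = q3 - q1
    return [v for v in values if q1 - koeff * iqr <= v <= q3 + koeff * iqr]


sample = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0, 15.0, 100.0]
print(iqr_filter(sample))
print(quantile_filter(sample))
```

On this sample the IQR filter drops only `100.0`, while the quantile filter also trims the smallest value, which is why the quantile approach is the rougher of the two.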



