nextstep.model subpackage
base_model
-
class nextstep.model.base_model.base_model
Bases: object
Base model class.
-
evaluation(y_pre, y_true)
Model evaluation method. Metrics include MAE, MSE and RMSE.
- Parameters
y_pre (array-like, such as a Python list) – predicted values
y_true (array-like, such as a Python list) – true values
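The three metrics named above can be sketched in plain Python. This is a minimal illustration of the standard definitions, not the library's own code; the function name `evaluate_sketch` and the dictionary return format are assumptions made for this example.

```python
import math


def evaluate_sketch(y_pre, y_true):
    """Return MAE, MSE and RMSE for two equal-length sequences.

    Hypothetical helper: the real evaluation() method may print or
    return its metrics in a different form.
    """
    if len(y_pre) != len(y_true):
        raise ValueError("y_pre and y_true must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")

    errors = [p - t for p, t in zip(y_pre, y_true)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}


metrics = evaluate_sketch([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
```

RMSE is in the same units as the label, which usually makes it easier to interpret than MSE.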
-
split(data, label_column, train_size, seed)
Perform a train-test split.
- Parameters
data (pandas dataframe) – dataset
label_column (string) – label column name
train_size – training set size as a fraction of the full dataset
seed (int) – seed for the pseudorandom number generator; using the same seed reproduces the same split
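The role of `train_size` and `seed` can be illustrated with a small standard-library sketch. It works on a plain list of rows rather than a pandas dataframe and does not separate the label column, so it only shows the idea of a seeded shuffle followed by a cut; `shuffled_split_sketch` is a hypothetical name, and the library's actual implementation may differ.

```python
import random


def shuffled_split_sketch(rows, train_size, seed):
    """Shuffle row indices with a seeded generator, then cut at train_size."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be a ratio between 0 and 1")

    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)    # same seed gives the same order
    cut = int(len(rows) * train_size)       # number of training rows
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(10))
train_a, test_a = shuffled_split_sketch(rows, 0.9, seed=33)
train_b, test_b = shuffled_split_sketch(rows, 0.9, seed=33)
```

Calling the function twice with the same seed yields identical partitions, which is what makes experiments repeatable.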
-
split_noshuffle(data, label_column, train_size, seed)
Perform a train-test split without shuffling, so the original row order is preserved.
- Parameters
data (pandas dataframe) – dataset
label_column (string) – label column name
train_size – training set size as a fraction of the full dataset
seed (int) – seed for the pseudorandom number generator; using the same seed reproduces the same result
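For time series, a split without shuffling typically keeps the earliest observations for training and the latest for testing. The sketch below shows that idea on a plain list; `ordered_split_sketch` is a hypothetical name, and it takes no seed because nothing in this sketch is random.

```python
def ordered_split_sketch(rows, train_size):
    """Keep the first train_size fraction for training, the rest for testing."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be a ratio between 0 and 1")
    cut = int(len(rows) * train_size)
    return rows[:cut], rows[cut:]


series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
train, test = ordered_split_sketch(series, 0.8)
```

Because the order is preserved, the test set contains only observations that come after every training observation, which avoids leaking future values into training.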
-
XGboost
adaboost
-
class nextstep.model.adaboost.adaboost(config)
Bases: nextstep.model.base_model.base_model
adaboost class.
-
build_model(data)
Build the adaboost model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted adaboost model
-
predict(X_new)
Use the fitted model for prediction.
- Parameters
X_new (array-like) – data of shape (n_samples, n_features)
-
example config
user_config = {
'label_column' : 'USEP', # label column name
'train_size' : 0.9, # train-test split
'seed' : 33,
'base_estimator': random_forest_model, # a fitted model
'n_estimators' : 10, # number of estimators
'learning_rate' : 1, # learning rate
'loss' : 'square' # loss function
}
arima
-
class nextstep.model.arima.arima(config)
Bases: nextstep.model.base_model.base_model
arima class.
-
autocorrelation(data, number_of_time_step=20)
Plot the autocorrelation.
- Parameters
data (pandas dataframe) – dataset
number_of_time_step (int, default 20) – number of time steps (lags) to consider for the autocorrelation
Note
The data length must be larger than the specified number_of_time_step.
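The length requirement in the note follows from how a sample autocorrelation is computed: the lag-k value pairs each observation with the one k steps later, so a series needs more points than lags. The sketch below computes the values that such a plot would display, using the common biased estimator; `sample_autocorrelation` is a hypothetical helper, not part of nextstep, and the library's plotting backend may use a different estimator.

```python
def sample_autocorrelation(values, lags):
    """Return the sample autocorrelation for lags 0..lags (inclusive)."""
    n = len(values)
    if n <= lags:
        raise ValueError("data length must be larger than the number of lags")

    mean = sum(values) / n
    centered = [v - mean for v in values]
    total = sum(c * c for c in centered)    # lag-0 sum of squares
    if total == 0:
        raise ValueError("autocorrelation is undefined for a constant series")

    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / total
        for k in range(lags + 1)
    ]


acf = sample_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], lags=2)
```

The lag-0 value is always 1, since a series is perfectly correlated with itself.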
-
build_model(data)
Build the arima model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted arima model
-
partial_autocorrelation(data, lags=20)
Plot the partial autocorrelation.
- Parameters
data (pandas dataframe) – dataset
lags (int, default 20) – number of lags to consider for the partial autocorrelation
Note
The data length must be larger than the specified lags.
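Partial autocorrelation measures the correlation at lag k after removing the effect of the shorter lags. One standard way to derive it from autocorrelation values is the Durbin-Levinson recursion, sketched below. This is only an illustration under that assumption; `pacf_from_acf` is a hypothetical helper, and the method the library uses for its plot is not stated in this reference.

```python
def pacf_from_acf(rho, lags):
    """Durbin-Levinson recursion.

    rho  -- autocorrelations rho[0..lags], with rho[0] == 1
    lags -- highest lag to compute
    Returns the partial autocorrelations for lags 0..lags.
    """
    if len(rho) <= lags:
        raise ValueError("rho must contain values for lags 0..lags")

    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, lags + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_{k,k}.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical autocorrelations of an AR(1) process with coefficient 0.5.
ar1_rho = [1.0, 0.5, 0.25, 0.125]
ar1_pacf = pacf_from_acf(ar1_rho, lags=3)
```

For an AR(1) process the partial autocorrelation is non-zero only at lag 1, which is why such plots are commonly used to choose the autoregressive order.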
-
predict_next_n(step)
Use the fitted model for prediction.
- Parameters
step – the number of values to be predicted
-
residual_density_plot()
Plot the residual density.
-
residual_plot()
Plot the residuals.
-
lstm
-
class nextstep.model.lstm.lstm_univariate(config)
Bases: nextstep.model.base_model.base_model
Long short-term memory (LSTM) class.
-
build_model(data)
Build the lstm model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted lstm model
-
predict(X_new)
Use the fitted model for prediction.
- Parameters
X_new (array-like) – data of shape (n_samples, n_features)
-
random_forest
-
class nextstep.model.random_forest.random_forest(config)
Bases: nextstep.model.base_model.base_model
Random forest class.
-
build_model(data)
Build the random forest model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted random forest model
-
predict(X_new)
Use the fitted model for prediction.
- Parameters
X_new (array-like) – data of shape (n_samples, n_features)
-
sarima
-
class nextstep.model.sarima.sarima(config)
Bases: nextstep.model.base_model.base_model
sarima class.
-
autocorrelation(data, lags=20)
Plot the autocorrelation.
- Parameters
data (pandas dataframe) – dataset
lags (int, default 20) – number of lags to consider for the autocorrelation
Note
The data length must be larger than the specified lags.
-
build_model(data)
Build the sarima model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted sarima model
-
partial_autocorrelation(data, lags=20)
Plot the partial autocorrelation.
- Parameters
data (pandas dataframe) – dataset
lags (int, default 20) – number of lags to consider for the partial autocorrelation
Note
The data length must be larger than the specified lags.
-
predict(X_new)
-
predict_next_n(steps)
-
residual_density_plot()
Plot the residual density.
-
residual_plot()
Plot the residuals.
-