The tuning of hyperparameters in a machine learning model usually takes place after the data has been processed and the model chosen; the optimization is then initialized in order to update the hyperparameters until the highest possible accuracy score is reached. Most existing tools follow a recognizable pattern in the notation of the code they adopt. In this article we are going to try to spot the logic behind these optimization tools.
Basic optimization: using random search and grid search:
Grid search and random search: even though they adopt different approaches, the pattern here is largely the same:
1. Choose the parameters for the grid or random search (a dictionary of hyperparameters and their candidate values)
# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 4],
              "min_samples_split": [2, 3, 4],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}
2. Running the GridSearchCV or RandomizedSearchCV instance on the parameters and fitting it to the training data
# run randomized search (clf, the data split and param_dist
# are constructed in the sketch after step 3)
n_iter_search = 5
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, cv=5)
random_search.fit(X_train, Y_train)

# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, Y_train)
3. Printing the best performing parameters and the refitted estimator (best_params_ and best_estimator_ in the example below)
# grid search results
print(grid_search.best_params_)
print(grid_search.best_estimator_)

# random search results
print(random_search.best_params_)
print(random_search.best_estimator_)
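The snippets above reference clf, the train/test split and a param_dist dictionary without ever constructing them. A minimal self-contained setup might look like the following sketch; the dataset, the estimator and the sampled ranges are assumptions for illustration (sp_randint comes from scipy.stats, and RandomizedSearchCV accepts any distribution object with an rvs method):

from scipy.stats import randint as sp_randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# a small dataset and a base estimator to tune (assumptions, for illustration)
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
clf = RandomForestClassifier(n_estimators=20)

# distributions to sample from in the randomized search
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 5),
              "min_samples_split": sp_randint(2, 5),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}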
Elaborate methods:
The more elaborate methods use more sophisticated approaches to optimize the hyperparameters, including Bayesian optimization, genetic algorithms, population-based optimization and more. We are going to focus here on the Bayesian-optimization-inspired tools, namely the very popular hyperopt, BTB and scikit-optimize, which adopt the SMBO (Sequential Model-Based Optimization) technique.
Hyperopt: the pattern in this Tree-structured Parzen Estimator (TPE) inspired tool is to:
1. Import the methods and functions (for hyperopt: tpe, hp and fmin)
from hyperopt import hp, tpe, fmin, STATUS_OK, Trials
# tpe: the Tree-structured Parzen Estimator search algorithm
# fmin: the function that drives the minimization of the objective
# hp: the expressions that create the search space
# STATUS_OK and Trials are used in steps 2 and 4 below
2. Define the objective function (usually a validation score or loss)
def objective(x):
    # toy objective from the hyperopt docs: minimize x squared
    return {'loss': x ** 2, 'status': STATUS_OK}
3. Define the hyperparameters and their search space
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

space = hp.choice('classifier', [
    {'model': KNeighborsClassifier,
     'param': {'n_neighbors': hp.choice('n_neighbors', range(3, 11)),
               'algorithm': hp.choice('algorithm', ['ball_tree', 'kd_tree']),
               'leaf_size': hp.choice('leaf_size', range(1, 50)),
               'metric': hp.choice('metric', ['euclidean', 'manhattan',
                                              'chebyshev', 'minkowski'])}
     },
    {'model': SVC,
     'param': {'C': hp.lognormal('C', 0, 1),
               'kernel': hp.choice('kernel', ['rbf', 'poly', 'sigmoid']),
               'degree': hp.choice('degree', range(1, 15)),
               'gamma': hp.uniform('gamma', 0.001, 10000)}
     }
])
4. Set up the database in which to store all the point evaluations of the search (hyperopt's Trials object; see the sketch after the fmin call below)
5. Choose the search algorithm to use (hyperopt offers two kinds of algorithms: TPE and random search)
best = fmin(objective, space,
            algo=tpe.suggest, max_evals=100)
print(best)
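The database mentioned in step 4 is hyperopt's Trials object. A minimal sketch of passing one to fmin and inspecting what it recorded:

trials = Trials()  # in-memory record of every point the search evaluates
best = fmin(objective, space,
            algo=tpe.suggest, max_evals=100, trials=trials)
print(trials.losses())  # the loss of each evaluated point, in order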
General rule:
These optimizers take the different types and formats of hyperparameters into consideration: "hp.choice" picks a value from a discrete set of options, while "hp.uniform" and "hp.quniform" search within a continuous (or quantized) range of values.
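A small illustration of the three expression types (the parameter names and ranges here are arbitrary):

from hyperopt import hp

space = {
    'criterion': hp.choice('criterion', ['gini', 'entropy']),  # pick one option from a set
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.3),   # any float in the range
    'max_depth': hp.quniform('max_depth', 2, 10, 1),           # floats quantized to steps of 1
}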
Scikit-Optimize: also uses the SMBO technique and follows this pattern:
1. import the modules and functions
from skopt.space import Integer, Categorical, Real
from skopt.utils import use_named_args
from skopt import gp_minimize
2. Set up the hyperparameter space
space = [Integer(16, 256, name='num_leaves'),
         Integer(8, 256, name='n_estimators'),
         Categorical(['gbdt', 'dart', 'goss'], name='boosting_type'),
         Real(0.001, 1.0, name='learning_rate')]
3. Defining the objective function (in our example it is the cross-validated score)
@use_named_args(space)  # unpacks the list of values gp_minimize passes into named kwargs
def objective(**params):
    regressor.set_params(**params)  # regressor is constructed in the sketch after step 4
    return -np.mean(cross_val_score(regressor, X, Y, cv=5, n_jobs=1,
                                    scoring='neg_mean_absolute_error'))
4. Running the optimization process with the algorithm of our choice (here it is the Gaussian process minimization "gp_minimize")
reg_gp = gp_minimize(objective, space, verbose=True)
print('best score: {}'.format(reg_gp.fun))
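Two pieces the snippets above leave implicit. First, the objective needs a regressor and data; the sketch below assumes LGBMRegressor, a guess based on the LightGBM-style parameter names in the space, and a stand-in dataset. Second, reg_gp is a SciPy-style result object whose reg_gp.x holds the best values in the same order as the space definition:

import numpy as np
from lightgbm import LGBMRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score

# assumed estimator and dataset, for illustration only
X, Y = load_diabetes(return_X_y=True)
regressor = LGBMRegressor(random_state=0)

# after the run, recover the winning configuration from reg_gp.x
best_params = dict(zip(['num_leaves', 'n_estimators', 'boosting_type', 'learning_rate'],
                       reg_gp.x))
print('best parameters: {}'.format(best_params))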
BTB (Bayesian Tuning and Bandits): this tool adopts a combination of Bayesian optimization and multi-armed bandit optimization. In other words, after a Bayesian optimization is performed, a series of iterations takes place in which the tuner uses the information already obtained to propose the set of hyperparameters that it considers most likely to obtain the best results. The logic of the tool is as follows:
1. Letting the tuner propose new sets of hyperparameters
>>> parameters = tuner.propose()
>>> parameters
{'n_estimators': 297, 'max_depth': 3}
2. Fitting and scoring the model with the proposed hyperparameters
>>> model = RandomForestClassifier(**parameters)
>>> model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=3, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=297, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
>>> score = model.score(X_test, y_test)
3. Passing the score obtained back to the tuner
>>> tuner.add(parameters, score)
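The steps above assume an already-constructed tuner. A sketch of the construction plus the full propose/score/add loop, following the API shown in the BTB README (the GP tuner and the tunable ranges here are assumptions):

>>> from btb import HyperParameter, ParamTypes
>>> from btb.tuning import GP
>>> tunables = [('n_estimators', HyperParameter(ParamTypes.INT, [10, 500])),
...             ('max_depth', HyperParameter(ParamTypes.INT, [1, 20]))]
>>> tuner = GP(tunables)
>>> for _ in range(10):  # each iteration refines the tuner's model of the score surface
...     parameters = tuner.propose()
...     model = RandomForestClassifier(**parameters)
...     model.fit(X_train, y_train)
...     tuner.add(parameters, model.score(X_test, y_test))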
Sources:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html
https://github.com/hyperopt/hyperopt/wiki/FMin
https://github.com/HDI-Project/BTB
https://www.kaggle.com/schlerp/tuning-hyper-parameters-with-scikit-optimize
Author: Ettaik Noureddine, PhD candidate at FSBM