Surrogate Models
-
class
edbo.models.
GP_Model
(X, y, training_iters=100, inference_type='MLE', learning_rate=0.1, noise_constraint=1e-05, gpu=False, nu=2.5, lengthscale_prior=None, outputscale_prior=None, noise_prior=None, n_restarts=0)
Main Gaussian process model used for Bayesian optimization.
Provides a framework for specifying exact GP models, hyperparameters, and priors. The class also contains methods for training, sampling, forward prediction, and variance estimation.
Model implemented using GPyTorch: https://gpytorch.ai/
-
__init__
(X, y, training_iters=100, inference_type='MLE', learning_rate=0.1, noise_constraint=1e-05, gpu=False, nu=2.5, lengthscale_prior=None, outputscale_prior=None, noise_prior=None, n_restarts=0)
- Parameters
X (torch.tensor) – Training domain values.
y (torch.tensor) – Training response values.
training_iters (int) – Number of iterations of the Adam optimizer to run during training.
inference_type (str) – Estimation procedure to use. Currently only MLE is available.
learning_rate (float) – Learning rate for the Adam optimizer during training.
noise_constraint (float) – Minimum noise level. The noise is constrained to be positive, and this value sets its lower bound.
gpu (bool) – Use GPUs (if available) to run Gaussian process computations.
nu (float) – Smoothness parameter of the Matérn kernel. Options: 0.5, 1.5, 2.5.
lengthscale_prior ([gpytorch.priors, init_value]) – GPyTorch prior object and initial value. Sets a prior over length scales.
outputscale_prior ([gpytorch.priors, init_value]) – GPyTorch prior object and initial value. Sets a prior over output scales.
noise_prior ([gpytorch.priors, init_value]) – GPyTorch prior object and initial value. Sets a prior over the noise level.
n_restarts (int) – Number of random restarts for model training.
- Return type
None.
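The nu options correspond to the standard Matérn kernels below, written with lengthscale ℓ and output scale σ² (the quantities that lengthscale_prior and outputscale_prior act on). For readability this assumes a single shared lengthscale; whether edbo uses one lengthscale per input dimension is not stated in this section.

```latex
\begin{aligned}
k_{\nu}(\mathbf{x}, \mathbf{x}') &= \sigma^2 \, g_{\nu}\!\left(\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert}{\ell}\right) \\
g_{1/2}(d) &= \exp(-d) \\
g_{3/2}(d) &= \left(1 + \sqrt{3}\, d\right) \exp\!\left(-\sqrt{3}\, d\right) \\
g_{5/2}(d) &= \left(1 + \sqrt{5}\, d + \tfrac{5}{3} d^2\right) \exp\!\left(-\sqrt{5}\, d\right)
\end{aligned}
```

Smaller nu gives rougher sample functions; nu = 2.5 (the default) yields functions that are twice differentiable.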
-
mle
()
Uses maximum likelihood estimation to estimate model hyperparameters.
-
fit
()
Train the Gaussian process model.
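As a rough picture of what mle() and fit() optimize, the sketch below evaluates the exact-GP log marginal likelihood for a Matérn-5/2 kernel (nu=2.5) in NumPy and keeps the best lengthscale from a grid. It is a simplified stand-in, not edbo's code: edbo uses GPyTorch and the Adam optimizer (training_iters, learning_rate) to fit the hyperparameters, with optional priors and restarts, whereas the grid search, the fixed noise of 1e-2, the toy data, and the helper names here are hypothetical.

```python
import numpy as np

def matern52(A, B, lengthscale=1.0, outputscale=1.0):
    """Matérn-5/2 kernel between the rows of A and B."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)) / lengthscale
    s5 = np.sqrt(5.0) * d
    return outputscale * (1.0 + s5 + 5.0 / 3.0 * d**2) * np.exp(-s5)

def log_marginal_likelihood(X, y, lengthscale, outputscale=1.0, noise=1e-2):
    """log p(y | X) for a zero-mean exact GP with Gaussian noise."""
    n = len(y)
    K = matern52(X, X, lengthscale, outputscale) + noise * np.eye(n)
    L = np.linalg.cholesky(K)                               # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # K^{-1} y
    # -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi), with log|K| = 2 sum(log diag L)
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Hypothetical toy data; a coarse grid over lengthscales replaces Adam here.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]
grid = np.logspace(-1.5, 1.0, 30)
scores = [log_marginal_likelihood(X, y, ls) for ls in grid]
best_lengthscale = grid[int(np.argmax(scores))]
```

Random restarts (n_restarts) typically serve the same purpose as the grid here: the marginal likelihood can have several local optima, so starting the optimizer from more than one point makes a poor fit less likely.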
-
predict
(points)
Mean of the Gaussian process posterior predictive distribution.
- Parameters
points (torch.tensor) – Domain points to be evaluated.
- Returns
Predicted response values for points.
- Return type
numpy.array
-
variance
(points)
Variance of the Gaussian process posterior predictive distribution.
- Parameters
points (torch.tensor) – Domain points to be evaluated.
- Returns
Model variance at points.
- Return type
numpy.array
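For an exact GP with fixed hyperparameters, predict and variance correspond to the standard posterior mean and variance formulas. The NumPy sketch below shows them for a zero-mean GP with a unit-output-scale Matérn-5/2 kernel; the lengthscale of 0.3, the noise of 1e-4, and the helper names are assumptions. It returns the variance of the latent function, without the observation noise that a predictive distribution over new measurements would add; whether edbo's variance includes that noise term is not stated here.

```python
import numpy as np

def matern52(A, B, lengthscale=0.3):
    """Matérn-5/2 kernel with unit output scale between the rows of A and B."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)) / lengthscale
    s5 = np.sqrt(5.0) * d
    return (1.0 + s5 + 5.0 / 3.0 * d**2) * np.exp(-s5)

def gp_predict(X_train, y_train, X_test, noise=1e-4):
    """Posterior mean and latent-function variance of a zero-mean exact GP."""
    K = matern52(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = matern52(X_train, X_test)                 # shape (n_train, n_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                            # k_*^T K^{-1} y
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)                # k(x, x) - k_*^T K^{-1} k_*
    return mean, np.maximum(var, 0.0)               # clip tiny negative round-off
```

Near the training points the mean reproduces the observations and the variance shrinks toward zero; far from them the mean reverts to the prior mean (0) and the variance to the prior variance (1).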
-
sample_posterior
(points, batch_size=1)
Sample functions from the Gaussian process posterior predictive distribution.
- Parameters
points (torch.tensor) – Domain points to be evaluated.
batch_size (int) – Number of samples to draw.
- Returns
Function values at points for samples.
- Return type
torch.tensor
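Because sample_posterior returns whole function draws, the points are sampled jointly from the posterior multivariate normal rather than independently. A NumPy sketch of that step, assuming the posterior mean m and covariance Σ at the points are already available: factor Σ = L Lᵀ and return m + L z for standard-normal z. In edbo this is handled through GPyTorch and the result is a torch.tensor; the helper name and jitter value here are hypothetical.

```python
import numpy as np

def sample_mvn(mean, cov, batch_size=1, jitter=1e-8, seed=None):
    """Draw joint samples from N(mean, cov); returns shape (batch_size, n_points)."""
    rng = np.random.default_rng(seed)
    # Small diagonal jitter keeps the Cholesky factorization stable for near-singular cov.
    L = np.linalg.cholesky(cov + jitter * np.eye(len(mean)))
    z = rng.standard_normal((batch_size, len(mean)))
    return mean + z @ L.T        # each row equals mean + L @ z_row
```

Sampling jointly preserves the correlations between nearby points, so each row is a plausible smooth function rather than independent noise at every point.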
-
regression
(return_data=False, export_path=None, return_scores=False)
Helper method for visualizing the model's regression performance.
Generates a predicted vs. observed plot using the model's training data.
- Parameters
return_data (bool) – Return predicted responses.
export_path (None, str) – Export an SVG image of the predicted vs. observed plot to export_path.
- Returns
Scatter plot with computed RMSE and R^2.
- Return type
matplotlib.pyplot
-
-
class
edbo.models.
RF_Model
(X, y, n_jobs=-1, random_state=10, n_estimators=500, max_features='auto', max_depth=None, min_samples_leaf=1, min_samples_split=2, **kwargs)
Main random forest regression model used for Bayesian optimization.
Model implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
-
__init__
(X, y, n_jobs=-1, random_state=10, n_estimators=500, max_features='auto', max_depth=None, min_samples_leaf=1, min_samples_split=2, **kwargs)
- Parameters
X (list, numpy.array, pandas.DataFrame) – Domain points to be used for model training.
y (list, numpy.array, pandas.DataFrame) – Response values to be used for model training.
n_jobs (int) – Number of processors to use.
random_state (int) – Random seed; ensures that identical data produce an identical ensemble of regression trees.
n_estimators (int) – Number of weak estimators (trees) in the ensemble.
max_features ('auto', int) – Maximum number of features to consider when splitting a node during training.
max_depth (None, int) – Maximum depth of individual trees.
min_samples_leaf (int) – Minimum number of samples required at each leaf node in model training.
min_samples_split (int) – Minimum number of samples required to split a node.
-
fit
()
Train the random forest model.
-
predict
(points)
Mean of the random forest ensemble predictions.
- Parameters
points (list, numpy.array, pandas.DataFrame) – Domain points to be evaluated.
- Returns
Predicted response values.
- Return type
numpy.array
-
regression
(return_data=False, export_path=None, return_scores=False)
Helper method for visualizing the model's regression performance.
Generates a predicted vs. observed plot using the model's training data.
- Parameters
return_data (bool) – Return predicted responses.
export_path (None, str) – Export an SVG image of the predicted vs. observed plot to export_path.
- Returns
Scatter plot with computed RMSE and R^2.
- Return type
matplotlib.pyplot
-
sample_posterior
(X, batch_size=1)
Sample weak estimators from the trained random forest model.
- Parameters
X (numpy.array) – Domain points to be evaluated.
batch_size (int) – Number of estimator predictions to draw from the ensemble.
- Returns
Weak estimator predictions at points.
- Return type
torch.tensor
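Sampling weak estimators amounts to selecting some of the forest's trees and returning each one's predictions, which then act as rough posterior draws. A minimal NumPy sketch, assuming the per-tree predictions are already stacked into an (n_trees, n_points) array and that trees are drawn without replacement; edbo's actual selection rule is not given here, and the helper name is hypothetical.

```python
import numpy as np

def sample_trees(tree_preds, batch_size=1, seed=None):
    """Pick batch_size distinct trees and return their predictions.

    tree_preds has shape (n_trees, n_points), one row per tree. For a fitted
    scikit-learn forest it could be built as
    np.array([tree.predict(X) for tree in forest.estimators_]).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(tree_preds.shape[0], size=batch_size, replace=False)
    return tree_preds[idx]
```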
-
variance
(points)
Variance of the random forest ensemble.
Model variance is estimated as the variance of the individual tree predictions.
- Parameters
points (numpy.array) – Domain points to be evaluated.
- Returns
Ensemble variance at points.
- Return type
numpy.array
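Given an (n_trees, n_points) array of per-tree predictions, the ensemble mean and the variance described above reduce to column-wise statistics. This sketch uses NumPy's default population variance (ddof=0); whether edbo uses that or the sample variance is an assumption, and the helper name is hypothetical.

```python
import numpy as np

def ensemble_mean_variance(tree_preds):
    """Column-wise mean and variance over trees; tree_preds has shape (n_trees, n_points)."""
    return tree_preds.mean(axis=0), tree_preds.var(axis=0)
```

Points where the trees agree get low variance, while points where they disagree, typically far from the training data, get high variance.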
-
-
class
edbo.models.
Bayesian_Linear_Model
(X, y, **kwargs)
Bayesian linear regression object compatible with the BO framework.
Model implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ARDRegression.html#sklearn.linear_model.ARDRegression
-
__init__
(X, y, **kwargs)
- Parameters
X (list, numpy.array, pandas.DataFrame) – Domain points to be used for model training.
y (list, numpy.array, pandas.DataFrame) – Response values to be used for model training.
-
fit
()
Train the model using grid search cross-validation.
-
get_scores
()
Get the grid search cross-validation results.
- Returns
Mean and standard deviation of the cross-validation scores across the grid.
- Return type
(numpy.array, numpy.array)
-
predict
(points)
Model predictions.
- Parameters
points (list, numpy.array, pandas.DataFrame) – Domain points to be evaluated.
- Returns
Predicted response values at points.
- Return type
numpy.array
-
regression
(return_data=False, export_path=None, return_scores=False)
Helper method for visualizing the model's regression performance.
Generates a predicted vs. observed plot using the model's training data.
- Parameters
return_data (bool) – Return predicted responses.
export_path (None, str) – Export an SVG image of the predicted vs. observed plot to export_path.
- Returns
Scatter plot with computed RMSE and R^2.
- Return type
matplotlib.pyplot
-
variance
(points)
Estimated variance of the Bayesian linear model.
- Parameters
points (numpy.array) – Domain points to be evaluated.
- Returns
Model variance at points.
- Return type
numpy.array
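A Bayesian linear model's predictive variance combines observation noise with uncertainty in the weights. The NumPy sketch below uses the conjugate model with a single fixed prior precision alpha and noise precision beta, which simplifies ARD regression (ARD learns a separate precision per weight); alpha=1.0, beta=25.0, and the helper names are hypothetical. scikit-learn's ARDRegression reports the corresponding predictive standard deviation through predict(X, return_std=True), though this section does not say whether edbo uses it.

```python
import numpy as np

def blr_posterior(X, y, alpha=1.0, beta=25.0):
    """Weight posterior N(m, S) for y = X w + noise, prior w ~ N(0, I / alpha)."""
    S = np.linalg.inv(alpha * np.eye(X.shape[1]) + beta * X.T @ X)
    m = beta * S @ X.T @ y
    return m, S

def blr_predict(points, m, S, beta=25.0):
    """Predictive mean and variance at each row of points."""
    mean = points @ m
    # Noise variance 1/beta plus the weight-uncertainty term x^T S x for each row x.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", points, S, points)
    return mean, var
```

The x^T S x term grows for inputs unlike the training data, which is what lets the acquisition step favor unexplored regions.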
-
-
class
edbo.models.
Random
(X, y, **kwargs)
Dummy class for random sampling.
Use with init_seed to benchmark Bayesian optimization against random sampling. The class is defined so that the BO class can call it in simulations.
Note
Use Random with the random acquisition function.
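A random-sampling baseline chooses the next experiments uniformly at random from the candidate domain, and fixing the seed makes a benchmark run reproducible. The sketch below assumes the domain is a finite candidate array and that already-tested rows are excluded; it illustrates the idea and is not edbo's random acquisition function. The helper name and its parameters are hypothetical.

```python
import numpy as np

def random_batch(candidates, batch_size, seed=None, exclude=()):
    """Return indices of batch_size distinct candidate rows chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    # Drop rows that were already evaluated before sampling.
    available = np.setdiff1d(np.arange(len(candidates)), np.asarray(exclude, dtype=int))
    return rng.choice(available, size=batch_size, replace=False)
```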
-
__init__
(X, y, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
-
-
edbo.models.
score
(trained_model, X, y)
Compute RMSE and R^2 for a trained model.
- Parameters
trained_model (edbo.models) – Trained model.
X (numpy.array, torch.tensor) – Domain points to be evaluated.
y (numpy.array, torch.tensor) – Response values corresponding to X.
- Returns
RMSE and R^2 values.
- Return type
(float, float)
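The two scores can be computed as below. This sketch takes R^2 to be the coefficient of determination, 1 - SS_res / SS_tot (the definition used by scikit-learn's r2_score); whether score uses this or the squared correlation is an assumption, and the helper name is hypothetical.

```python
import numpy as np

def rmse_r2(y_true, y_pred):
    """Root-mean-square error and coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid**2)))
    r2 = float(1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, r2
```

For example, rmse_r2([1, 2, 3, 4], [1, 2, 3, 5]) gives (0.5, 0.8): a single error of 1 over four points, against a total sum of squares of 5.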
-
edbo.models.
cross_validate
(base_model, X, y, kfold=5, random_state=None, **kwargs)
Compute cross-validation scores for models.
- Parameters
base_model (edbo.models) – Uninitialized model object.
X (numpy.array, torch.tensor) – Domain points to be evaluated.
y (numpy.array, torch.tensor) – Response values corresponding to domain points X.
kfold (int) – Number of splits used in cross-validation.
- Returns
Mean training and validation scores [train_RMSE, validation_RMSE, train_R^2, validation_R^2].
- Return type
list
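The k-fold procedure behind cross_validate can be sketched as follows: shuffle the indices, split them into kfold folds, instantiate and fit a fresh model on all folds but one, score it on both the training and held-out parts, and average. The model interface (a constructor taking X and y, fit(), predict(points)) mirrors the classes documented above, but the least-squares stand-in model, the shuffling scheme, and the helper names are hypothetical.

```python
import numpy as np

def rmse_r2(y, y_hat):
    r = y - y_hat
    return np.sqrt(np.mean(r**2)), 1.0 - np.sum(r**2) / np.sum((y - y.mean()) ** 2)

class LeastSquares:
    """Hypothetical stand-in model with the same constructor/fit/predict shape."""
    def __init__(self, X, y):
        self.X, self.y = X, y

    def fit(self):
        A = np.column_stack([self.X, np.ones(len(self.X))])   # add an intercept column
        self.coef, *_ = np.linalg.lstsq(A, self.y, rcond=None)

    def predict(self, points):
        return np.column_stack([points, np.ones(len(points))]) @ self.coef

def k_fold_scores(model_cls, X, y, kfold=5, random_state=None):
    """Mean [train_RMSE, validation_RMSE, train_R^2, validation_R^2] over folds."""
    rng = np.random.default_rng(random_state)
    folds = np.array_split(rng.permutation(len(y)), kfold)
    rows = []
    for k, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = model_cls(X[train], y[train])   # uninitialized class, built per fold
        model.fit()
        tr_rmse, tr_r2 = rmse_r2(y[train], model.predict(X[train]))
        va_rmse, va_r2 = rmse_r2(y[val], model.predict(X[val]))
        rows.append([tr_rmse, va_rmse, tr_r2, va_r2])
    return list(np.mean(rows, axis=0))
```

A large gap between the training and validation scores is the usual sign of overfitting.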