Surrogate Models

class edbo.models.GP_Model(X, y, training_iters=100, inference_type='MLE', learning_rate=0.1, noise_constraint=1e-05, gpu=False, nu=2.5, lengthscale_prior=None, outputscale_prior=None, noise_prior=None, n_restarts=0)

Main Gaussian process model used for Bayesian optimization.

Provides a framework for specifying exact GP models, hyperparameters, and priors. This class also contains functions for training, sampling, forward prediction, and variance estimation.

Model implemented using GPyTorch: https://gpytorch.ai/

__init__(X, y, training_iters=100, inference_type='MLE', learning_rate=0.1, noise_constraint=1e-05, gpu=False, nu=2.5, lengthscale_prior=None, outputscale_prior=None, noise_prior=None, n_restarts=0)

Parameters

X (torch.tensor) – Training domain values.
y (torch.tensor) – Training response values.
training_iters (int) – Number of iterations to run the Adam optimizer during training.
inference_type (str) – Estimation procedure to be used. Currently only MLE is available.
learning_rate (float) – Learning rate for the Adam optimizer during training.
noise_constraint (float) – Noise is constrained to be positive. Sets the minimum noise level.
gpu (bool) – Use GPUs (if available) to run Gaussian process computations.
nu (float) – Matern kernel parameter. Options: 0.5, 1.5, 2.5.
lengthscale_prior ([gpytorch.priors, init_value]) – GPyTorch prior object and initial value. Sets a prior over length scales.
outputscale_prior ([gpytorch.priors, init_value]) – GPyTorch prior object and initial value. Sets a prior over output scales.
noise_prior ([gpytorch.priors, init_value]) – GPyTorch prior object and initial value. Sets a prior over the noise.
n_restarts (int) – Number of random restarts for model training.

Returns

Return type

None.

mle(): Uses maximum likelihood estimation to estimate model hyperparameters.

fit(): Train the Gaussian process model.
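The idea behind maximum likelihood estimation of GP hyperparameters can be illustrated without GPyTorch. The sketch below is not the edbo implementation: it assumes 1-D inputs, a Matern kernel with nu = 2.5, a fixed noise level, and it replaces the Adam optimizer with a plain grid search over a single lengthscale. The function names are hypothetical.

```python
import numpy as np

def matern52(a, b, lengthscale):
    """Matern kernel (nu = 2.5) between two 1-D arrays of inputs."""
    d = np.abs(a[:, None] - b[None, :]) / lengthscale
    s = np.sqrt(5.0) * d
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def neg_log_marginal_likelihood(x, y, lengthscale, noise=1e-2):
    """Negative log marginal likelihood of a zero-mean exact GP."""
    K = matern52(x, x, lengthscale) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = 0.5 * y @ alpha
    complexity = np.log(np.diag(L)).sum()  # equals 0.5 * log det(K)
    constant = 0.5 * len(x) * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

# Toy training data (assumed for illustration only).
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2.0 * np.pi * x)

# Grid search stands in for the gradient-based optimizer.
grid = np.logspace(-2, 1, 40)
nll = np.array([neg_log_marginal_likelihood(x, y, l) for l in grid])
best_lengthscale = grid[np.argmin(nll)]
```

A gradient-based optimizer such as Adam minimizes the same kind of objective, but over all hyperparameters jointly, and priors add further terms to it.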

predict(points)

Mean of Gaussian process posterior predictive distribution.

Parameters: points (torch.tensor) – Domain points to be evaluated.
Returns: Predicted response values for points.
Return type: numpy.array

variance(points)

Variance of Gaussian process posterior predictive distribution.

Parameters: points (torch.tensor) – Domain points to be evaluated.
Returns: Model variance at points.
Return type: numpy.array

sample_posterior(points, batch_size=1)

Sample functions from Gaussian process posterior predictive distribution.

Parameters

points (torch.tensor) – Domain points to be evaluated.
batch_size (int) – Number of samples to draw.

Returns

Function values at points for samples.

Return type

torch.tensor
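The quantities returned by predict, variance, and sample_posterior can be written out for a small exact GP. This is a NumPy-only sketch and not the GPyTorch code edbo runs: it assumes a zero mean function, a Matern kernel with nu = 2.5, and fixed (untrained) hyperparameters. All names are hypothetical.

```python
import numpy as np

def matern52(A, B, lengthscale=0.3, outputscale=1.0):
    """Matern kernel (nu = 2.5) between the rows of A and the rows of B."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)) / lengthscale
    s = np.sqrt(5.0) * d
    return outputscale * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_posterior(X, y, points, noise=1e-5):
    """Posterior mean vector and covariance matrix at `points`."""
    K = matern52(X, X) + noise * np.eye(len(X))
    Ks = matern52(X, points)
    Kss = matern52(points, points)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, Kss - v.T @ v

# Toy data (assumed for illustration only).
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
points = np.linspace(0.0, 1.0, 25).reshape(-1, 1)

mean, cov = gp_posterior(X, y, points)           # analogue of predict(points)
variance = np.clip(np.diag(cov), 0.0, None)      # analogue of variance(points)

# Analogue of sample_posterior(points, batch_size=3): draw joint samples
# from N(mean, cov) using an eigendecomposition for numerical robustness.
rng = np.random.default_rng(0)
w, V = np.linalg.eigh(cov)
w = np.clip(w, 0.0, None)
z = rng.standard_normal((3, len(points)))
samples = mean + (z * np.sqrt(w)) @ V.T
```

Each row of `samples` is one function draw evaluated at all of `points`, which is why the samples are correlated across points rather than drawn independently per point.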

regression(return_data=False, export_path=None, return_scores=False)

Helper method for visualizing the model's regression performance.

Generates a predicted vs observed plot using the model's training data.

Parameters

return_data (bool) – Return predicted responses.
export_path (None, str) – Export SVG image of predicted vs observed plot to export_path.

Returns

Scatter plot with computed RMSE and R^2.

Return type

matplotlib.pyplot

class edbo.models.RF_Model(X, y, n_jobs=-1, random_state=10, n_estimators=500, max_features='auto', max_depth=None, min_samples_leaf=1, min_samples_split=2, **kwargs)

Main random forest regression model used for Bayesian optimization.

Model implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

__init__(X, y, n_jobs=-1, random_state=10, n_estimators=500, max_features='auto', max_depth=None, min_samples_leaf=1, min_samples_split=2, **kwargs)

Parameters

X (list, numpy.array, pandas.DataFrame) – Domain points to be used for model training.
y (list, numpy.array, pandas.DataFrame) – Response values to be used for model training.
n_jobs (int) – Number of processors to use.
random_state (int) – Ensures identical data returns an identical ensemble of regression trees.
n_estimators (int) – Number of weak estimators to include in ensemble.
max_features ('auto', int) – Maximum number of features to consider per node in model training.
max_depth (None, int) – Maximum depth of individual trees.
min_samples_leaf (int) – Minimum number of samples required at each leaf node in model training.
min_samples_split (int) – Minimum number of samples to require for a node to split.

fit(): Train the random forest model.

predict(points)

Mean of the random forest ensemble predictions.

Parameters: points (list, numpy.array, pandas.DataFrame) – Domain points to be evaluated.
Returns: Predicted response values.
Return type: numpy.array

regression(return_data=False, export_path=None, return_scores=False)

Helper method for visualizing the model's regression performance.

Generates a predicted vs observed plot using the model's training data.

Parameters

return_data (bool) – Return predicted responses.
export_path (None, str) – Export SVG image of predicted vs observed plot to export_path.

Returns

Scatter plot with computed RMSE and R^2.

Return type

matplotlib.pyplot

sample_posterior(X, batch_size=1)

Sample weak estimators from the trained random forest model.

Parameters

X (numpy.array) – Domain points to be evaluated.
batch_size (int) – Number of estimator predictions to draw from the ensemble.

Returns

Weak estimator predictions at points.

Return type

torch.tensor

variance(points)

Variance of random forest ensemble.

Model variance is estimated as the variance in the individual tree predictions.

Parameters: points (numpy.array) – Domain points to be evaluated.
Returns: Ensemble variance at points.
Return type: numpy.array
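Both the ensemble variance and the sampling of weak estimators rest on per-tree predictions. The sketch below works directly with scikit-learn's RandomForestRegressor rather than through RF_Model, so it shows the technique and not edbo's exact code. The toy data and the choice to sample trees without replacement are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1]
points = rng.uniform(-1.0, 1.0, size=(5, 2))

forest = RandomForestRegressor(n_estimators=50, random_state=10).fit(X, y)

# One row per tree, one column per evaluated point.
tree_predictions = np.stack(
    [tree.predict(points) for tree in forest.estimators_]
)

ensemble_mean = tree_predictions.mean(axis=0)      # matches forest.predict
ensemble_variance = tree_predictions.var(axis=0)   # spread across trees

# Drawing `batch_size` weak estimators: pick trees at random and keep
# their individual predictions.
batch_size = 3
chosen = rng.choice(len(forest.estimators_), size=batch_size, replace=False)
sampled_predictions = tree_predictions[chosen]
```

Note that this variance reflects disagreement between trees only. It is not a calibrated posterior variance in the way the Gaussian process variance is.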

class edbo.models.Bayesian_Linear_Model(X, y, **kwargs)

Bayesian linear regression object compatible with the BO framework.

Model implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ARDRegression.html#sklearn.linear_model.ARDRegression

__init__(X, y, **kwargs)

Parameters

X (list, numpy.array, pandas.DataFrame) – Domain points to be used for model training.
y (list, numpy.array, pandas.DataFrame) – Response values to be used for model training.

fit(): Train the model using grid search CV.

get_scores()

Get grid search cross validation results.

Returns: Average scores and standard deviation of scores for grid.
Return type: (numpy.array, numpy.array)

predict(points)

Model predictions.

Parameters: points (list, numpy.array, pandas.DataFrame) – Domain points to be evaluated.
Returns: Predicted response values at points.
Return type: numpy.array

regression(return_data=False, export_path=None, return_scores=False)

Helper method for visualizing the model's regression performance.

Generates a predicted vs observed plot using the model's training data.

Parameters

return_data (bool) – Return predicted responses.
export_path (None, str) – Export SVG image of predicted vs observed plot to export_path.

Returns

Scatter plot with computed RMSE and R^2.

Return type

matplotlib.pyplot

variance(points)

Estimated variance of Bayesian linear model.

Parameters: points (numpy.array) – Domain points to be evaluated.
Returns: Model variance at points.
Return type: numpy.array
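One way to obtain a predictive variance from the underlying scikit-learn estimator is ARDRegression's `return_std` option. The sketch below uses that estimator directly with default settings. Whether Bayesian_Linear_Model computes its variance this way, and which hyperparameter grid its grid search covers, is not stated here, so treat this as an assumption.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Toy data (assumed for illustration only): one irrelevant feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + 0.1 * rng.normal(size=40)
points = rng.normal(size=(4, 3))

model = ARDRegression().fit(X, y)

# return_std=True yields the standard deviation of the predictive
# distribution at each point; squaring it gives a variance.
mean, std = model.predict(points, return_std=True)
variance = std ** 2
```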

class edbo.models.Random(X, y, **kwargs)

Dummy class for random sampling.

Use with init_seed for benchmarking Bayesian optimization versus random sampling. Class defined such that it can be called by the BO class in simulations.

Note

Use Random with random acquisition function.

__init__(X, y, **kwargs): Initialize self. See help(type(self)) for accurate signature.

edbo.models.score(trained_model, X, y)

Compute RMSE and R^2 for a trained model.

Parameters

trained_model (edbo.models) – Trained model.
X (numpy.array, torch.tensor) – Domain points to be evaluated.
y (numpy.array, torch.tensor) – Response values corresponding to X.

Returns

RMSE and R^2 values.

Return type

(float, float)
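The two metrics can be computed from observed and predicted responses as follows. This is a minimal sketch with a hypothetical helper name. It assumes R^2 is the coefficient of determination, which may differ from the definition edbo uses internally (for example, a squared correlation coefficient).

```python
import numpy as np

def rmse_and_r2(observed, predicted):
    """Root mean squared error and coefficient of determination."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                        # unexplained
    ss_tot = np.sum((observed - observed.mean()) ** 2)     # total
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2

# Three exact predictions and one that is off by 1.
rmse, r2 = rmse_and_r2([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# rmse is 0.5 and r2 is approximately 0.8
```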

edbo.models.cross_validate(base_model, X, y, kfold=5, random_state=None, **kwargs)

Compute cross-validation scores for models.

Parameters

base_model (edbo.models) – Uninitialized model object.
X (numpy.array, torch.tensor) – Domain points to be evaluated.
y (numpy.array, torch.tensor) – Response values corresponding to domain points X.
kfold (int) – Number of splits used in cross-validation.

Returns

Mean training and validation scores [train_RMSE, validation_RMSE, train_R^2, validation_R^2].

Return type

list
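The k-fold procedure and the layout of the returned list can be sketched with scikit-learn utilities. This is not the edbo implementation: the function name, the use of a model factory, the shuffling of folds, and the random forest used as the base model are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

def cross_validate_sketch(make_model, X, y, kfold=5, random_state=None):
    """Return mean [train_RMSE, validation_RMSE, train_R^2, validation_R^2]."""
    folds = KFold(n_splits=kfold, shuffle=True, random_state=random_state)
    rows = []
    for train_idx, val_idx in folds.split(X):
        # Fit a fresh model on the training part of each split.
        model = make_model().fit(X[train_idx], y[train_idx])
        pred_train = model.predict(X[train_idx])
        pred_val = model.predict(X[val_idx])
        rows.append([
            np.sqrt(mean_squared_error(y[train_idx], pred_train)),
            np.sqrt(mean_squared_error(y[val_idx], pred_val)),
            r2_score(y[train_idx], pred_train),
            r2_score(y[val_idx], pred_val),
        ])
    return list(np.mean(rows, axis=0))

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=80)

scores = cross_validate_sketch(
    lambda: RandomForestRegressor(n_estimators=30, random_state=10),
    X, y, kfold=5, random_state=0,
)
```

A large gap between the training and validation entries of `scores` is the usual sign of overfitting.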