Initialization Schemes

class edbo.init_scheme.Init(method, batch_size, distance='gower')

Class representing the available initialization schemes.

Provides methods for selecting initial points on a user-defined grid.

__init__(method, batch_size, distance='gower')

Parameters

method (str) – Sampling method. Options include: ‘random’, ‘PAM’, ‘k-means’, and ‘external’.
batch_size (int) – Number of points to select.
distance (str) – Distance metric to be used with PAM. Options include: ‘gower’, ‘euclidean’, and ‘euclidean_square’.
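To make the three metric names concrete, the following is a minimal plain-Python sketch of how they are commonly defined for purely numeric features. It is an illustration only: the helper names and the sample points are hypothetical, and edbo's own distance computation (performed through its clustering backend) may differ in detail.

```python
import math

def euclidean_square(x, y):
    """Sum of squared per-feature differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(euclidean_square(x, y))

def gower(x, y, ranges):
    """Mean of per-feature absolute differences, each scaled by that
    feature's range (numeric-only form; assumes every range is nonzero)."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(x)

# Hypothetical domain points: (temperature, concentration).
points = [(20.0, 0.1), (60.0, 0.5), (100.0, 0.3)]

# Per-feature range over the whole domain, needed to scale Gower terms.
ranges = [max(col) - min(col) for col in zip(*points)]

d_gower = gower(points[0], points[1], ranges)
d_euclid = euclidean(points[0], points[1])
```

Because each Gower term is scaled by its feature's range, features measured on very different scales contribute comparably, whereas the Euclidean variants are dominated by the feature with the largest numeric spread.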

run(obj, seed=None, export_path=None, visualize=False)

Run the initialization algorithm on a user-defined domain.

Parameters

obj (edbo.objective) – Objective data container.
seed (None, int) – Random seed for random selection and initial choice of medoids or centroids.
export_path (None, str) – Path to export visualization if applicable.
visualize (bool) – If the initialization method is ‘pam’ or ‘kmeans’ and visualize is True, a 2D embedding of the clustering results is generated.

Returns

Selected domain points.

Return type

pandas.DataFrame

plot_choices(obj, export_path=None)

Plot a low-dimensional embedding of the initialization points in the domain.

Parameters

obj (edbo.objective) – Objective data container.
export_path (None, str) – Path to export visualization if applicable.

Returns

Selected domain points.

Return type

pandas.DataFrame

edbo.init_scheme.rand(obj, batch_size, seed=None)

Random selection of points.

Parameters

obj (edbo.objective) – Objective data container.
batch_size (int) – Number of points to be selected.
seed (None, int) – Random seed.

Returns

Selected domain points.

Return type

pandas.DataFrame
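The effect of the seed parameter can be illustrated with a small stand-alone sketch. This is not edbo's implementation: it uses a plain list of dictionaries in place of the objective's domain, and the function name and grid values are hypothetical.

```python
import random

def select_random(domain, batch_size, seed=None):
    """Draw batch_size distinct rows uniformly at random from domain."""
    # A local generator keeps the draw reproducible for a given seed
    # without touching Python's global random state.
    rng = random.Random(seed)
    return rng.sample(domain, batch_size)

# Hypothetical 3 x 3 grid of reaction conditions.
domain = [
    {"temperature": t, "catalyst": c}
    for t in (20, 40, 60)
    for c in ("A", "B", "C")
]

batch = select_random(domain, batch_size=3, seed=0)
```

Calling the function again with the same seed returns the same rows, which is what makes a randomly initialized experiment repeatable. With seed=None the selection differs from run to run.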

edbo.init_scheme.external_data(obj)

External data reader.

Parameters

obj (edbo.objective) – Objective data container.

Returns

Selected domain points.

Return type

pandas.DataFrame

edbo.init_scheme.PAM(obj, batch_size, distance='gower', visualize=True, seed=None, export_path=None)

Partitioning around medoids algorithm.

The PAM function returns the medoids of the learned clusters.

PAM is implemented using pyclustering: https://pypi.org/project/pyclustering/

Parameters

obj (edbo.objective) – Objective data container.
batch_size (int) – Number of points to be selected. Batch size also determines the number of clusters. PAM returns the medoids.
distance (str) – Distance metric to be used in the PAM algorithm. Options include: ‘gower’, ‘euclidean’, and ‘euclidean_square’.
visualize (bool) – Visualize the learned clusters.
seed (None, int) – Random seed.
export_path (None, str) – Path to export cluster visualization SVG image.

Returns

Selected domain points.

Return type

pandas.DataFrame
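The idea behind partitioning around medoids can be sketched in plain Python. The version below is a simplified swap search written for illustration, not the pyclustering routine that edbo calls: it starts from randomly chosen medoids and keeps any single medoid swap that lowers the total distance of all points to their nearest medoid. The function names and the one-dimensional sample data are hypothetical.

```python
import random

def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist, seed=None):
    """Return the sorted indices of k medoids found by swap search."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # random initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing the i-th medoid with this candidate point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids)

def abs_diff(x, y):
    """Distance between two one-dimensional points."""
    return abs(x[0] - y[0])

# Two well-separated groups of one-dimensional points.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
medoids = pam(points, k=2, dist=abs_diff, seed=0)
selected = [points[i] for i in medoids]
```

Unlike a cluster mean, a medoid is always one of the input points, so the selected rows are guaranteed to exist in the domain. Any distance function can be passed as dist, which is why PAM works with metrics such as Gower.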

edbo.init_scheme.k_means(obj, batch_size, visualize=True, seed=None, export_path=None, n_init=1, return_clusters=False, return_centroids=False)

K-Means algorithm.

k_means function returns domain points closest to the means of learned clusters.

k-means clustering implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Parameters

obj (edbo.objective) – Objective data container.
batch_size (int) – Number of points to be selected. Batch size also determines the number of clusters. k_means returns the domain points closest to the cluster means.
visualize (bool) – Visualize the learned clusters.
seed (None, int) – Random seed.
export_path (None, str) – Path to export cluster visualization SVG image.

Returns

Selected domain points.

Return type

pandas.DataFrame
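Because a cluster mean is generally not itself a point in the domain, the selection step snaps each mean to its nearest domain point. The following plain-Python sketch illustrates that two-stage idea with a basic Lloyd iteration. It is an assumption-laden illustration, not edbo's code, which relies on scikit-learn's KMeans; the function names, the n_iter cap, and the sample points are hypothetical.

```python
import random

def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def k_means_select(points, k, seed=None, n_iter=100):
    """Cluster points with Lloyd's algorithm, then return the domain
    point closest to each final centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute means; an empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    # Snap each centroid to the nearest actual domain point.
    return [min(points, key=lambda p: sq_dist(p, c)) for c in centroids]

# Two well-separated groups of two-dimensional points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
selected = k_means_select(points, k=2, seed=0)
```

The snapping step is what distinguishes this selection from plain k-means output: the returned rows are always members of the domain, so each one corresponds to an experiment that can actually be run.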