Initialization Schemes

class edbo.init_scheme.Init(method, batch_size, distance='gower')

Class representing the available initialization schemes.

Provides methods for selecting initial points on a user-defined grid.

__init__(method, batch_size, distance='gower')
Parameters
  • method (str) – Sampling method. Options include: ‘random’, ‘PAM’, ‘k-means’, and ‘external’.

  • batch_size (int) – Number of points to select.

  • distance (str) – Distance metric to be used with PAM. Options include: ‘gower’, ‘euclidean’, and ‘euclidean_square’.
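
As a rough illustration of what the ‘gower’ option measures, the sketch below computes a Gower distance between two records in plain Python. It follows the general textbook definition and is not edbo's or pyclustering's implementation; the function name gower_distance and the ranges argument are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Gower distance between two records with mixed feature types.

    a, b   : sequences of equal length (numbers or category labels)
    ranges : per-feature range (max - min) for numeric features,
             None for categorical features (a convention assumed here)
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: 0 if the labels match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric feature: absolute difference scaled by the range.
            total += abs(x - y) / r
        # A numeric feature with zero range contributes nothing.
    return total / len(ranges)


# Two grid points described by temperature, concentration, and solvent.
row_a = (25.0, 0.10, "DMF")
row_b = (75.0, 0.10, "THF")
ranges = (100.0, 0.40, None)
d = gower_distance(row_a, row_b, ranges)
```

Because every feature is scaled to the interval [0, 1] before averaging, no single feature dominates the distance simply because of its units.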

run(obj, seed=None, export_path=None, visualize=False)

Run the initialization algorithm on a user-defined domain.

Parameters
  • obj (edbo.objective) – Objective data container.

  • seed (None, int) – Random seed for random selection and initial choice of medoids or centroids.

  • export_path (None, str) – Path to export visualization if applicable.

  • visualize (bool) – If the initialization method is set to ‘pam’ or ‘kmeans’ and visualize is True, a 2D embedding of the clustering results is generated.

Returns

Selected domain points.

Return type

pandas.DataFrame

plot_choices(obj, export_path=None)

Plot a low-dimensional embedding of the initialization points in the domain.

Parameters
  • obj (edbo.objective) – Objective data container.

  • export_path (None, str) – Path to export visualization if applicable.

Returns

Selected domain points.

Return type

pandas.DataFrame

edbo.init_scheme.rand(obj, batch_size, seed=None)

Random selection of points.

Parameters
  • obj (edbo.objective) – Objective data container.

  • batch_size (int) – Number of points to be selected.

  • seed (None, int) – Random seed.

Returns

Selected domain points.

Return type

pandas.DataFrame
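
The role of the seed parameter can be sketched without edbo. The example below is a minimal stand-in, assuming the domain is a plain list of grid points rather than an edbo.objective; the helper name select_random is hypothetical.

```python
import random


def select_random(domain, batch_size, seed=None):
    """Pick batch_size distinct rows from domain without replacement."""
    rng = random.Random(seed)  # seed=None gives a different draw each run
    indices = rng.sample(range(len(domain)), batch_size)
    return [domain[i] for i in sorted(indices)]


# A small user-defined grid: every (temperature, solvent) combination.
grid = [(t, s) for t in (25, 50, 75) for s in ("DMF", "THF", "MeCN")]
first = select_random(grid, batch_size=3, seed=0)
second = select_random(grid, batch_size=3, seed=0)
```

Calling the helper twice with the same seed yields the same selection, which is what makes an initial design reproducible.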

edbo.init_scheme.external_data(obj)

External data reader.

Parameters
  • obj (edbo.objective) – Objective data container.

Returns

Selected domain points.

Return type

pandas.DataFrame

edbo.init_scheme.PAM(obj, batch_size, distance='gower', visualize=True, seed=None, export_path=None)

Partitioning around medoids algorithm.

The PAM function returns the medoids of the learned clusters.

PAM is implemented using pyclustering: https://pypi.org/project/pyclustering/

Parameters
  • obj (edbo.objective) – Objective data container.

  • batch_size (int) – Number of points to be selected. The batch size also determines the number of clusters; PAM returns the medoids.

  • distance (str) – Distance metric to be used in the PAM algorithm. Options include: ‘gower’, ‘euclidean’, and ‘euclidean_square’.

  • visualize (bool) – Visualize the learned clusters.

  • seed (None, int) – Random seed.

  • export_path (None, str) – Path to export cluster visualization SVG image.

Returns

Selected domain points.

Return type

pandas.DataFrame
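
To make the idea of returning medoids concrete, the sketch below implements a simplified k-medoids search in plain Python. It is not the pyclustering routine that edbo calls: it starts from random medoids instead of PAM's BUILD step and accepts the first improving swap rather than the best one. The names total_cost, pam_swap, and euclidean are hypothetical.

```python
import random


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k, dist, seed=None):
    """Return sorted indices of k medoids found by greedy swapping."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # random initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    # Keep the swap as soon as it lowers the total cost.
                    medoids, best, improved = trial, cost, True
    return sorted(medoids)


def euclidean(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Two well-separated groups of grid points; k plays the role of batch_size.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids = pam_swap(points, k=2, dist=euclidean, seed=0)
selected = [points[i] for i in medoids]
```

Because medoids are always members of the data set, every selected point is a valid domain point, and any pairwise distance function (including a Gower-style one) can be passed as dist.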

edbo.init_scheme.k_means(obj, batch_size, visualize=True, seed=None, export_path=None, n_init=1, return_clusters=False, return_centroids=False)

K-Means algorithm.

The k_means function returns the domain points closest to the means of the learned clusters.

k-means clustering is implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Parameters
  • obj (edbo.objective) – Objective data container.

  • batch_size (int) – Number of points to be selected. The batch size also determines the number of clusters; k_means returns the domain point closest to each cluster mean.

  • visualize (bool) – Visualize the learned clusters.

  • seed (None, int) – Random seed.

  • export_path (None, str) – Path to export cluster visualization SVG image.

Returns

Selected domain points.

Return type

pandas.DataFrame