Initialization Schemes
-
class edbo.init_scheme.Init(method, batch_size, distance='gower')
Class representing the available initialization schemes: methods for selecting initial points on a user-defined grid.
-
__init__(method, batch_size, distance='gower')
- Parameters
method (str) – Sampling method. Options include: ‘random’, ‘PAM’, ‘k-means’, and ‘external’.
batch_size (int) – Number of points to select.
distance (str) – Distance metric to be used with PAM. Options include: ‘gower’, ‘euclidean’, and ‘euclidean_square’.
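To make the ‘gower’ option more concrete, the following is a minimal sketch of the standard Gower distance for rows that mix numeric and categorical columns. It is not taken from edbo or pyclustering, whose implementations may differ in detail. The helper names `gower_distance` and `column_ranges`, and the toy domain, are hypothetical and used only for illustration.

```python
def column_ranges(rows):
    """Return (max - min) for each numeric column and None for categorical ones."""
    ranges = []
    for col in zip(*rows):
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, ranges):
    """Average per-column dissimilarity between rows a and b."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical column: 0 when the values match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric column: absolute difference scaled by the column range.
            total += abs(x - y) / r
        # A constant numeric column (r == 0) contributes nothing.
    return total / len(ranges)


# Hypothetical domain: temperature, concentration, solvent.
domain = [
    (20.0, 0.1, "THF"),
    (60.0, 0.1, "DMF"),
    (100.0, 0.5, "THF"),
]
ranges = column_ranges(domain)
d01 = gower_distance(domain[0], domain[1], ranges)
d02 = gower_distance(domain[0], domain[2], ranges)
```

Because every column contributes a value between 0 and 1, the result is also bounded by 0 and 1, which is why this kind of metric suits domains that combine continuous and categorical variables.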
-
run(obj, seed=None, export_path=None, visualize=False)
Run the initialization algorithm on a user-defined domain.
- Parameters
obj (edbo.objective) – Objective data container.
seed (None, int) – Random seed for random selection and initial choice of medoids or centroids.
export_path (None, str) – Path to export visualization if applicable.
visualize (bool) – If the initialization method is ‘pam’ or ‘kmeans’ and visualize is True, a 2D embedding of the clustering results is generated.
- Returns
Selected domain points.
- Return type
pandas.DataFrame
-
plot_choices(obj, export_path=None)
Plot a low-dimensional embedding of the initialization points in the domain.
- Parameters
obj (edbo.objective) – Objective data container.
export_path (None, str) – Path to export visualization if applicable.
- Returns
Selected domain points.
- Return type
pandas.DataFrame
-
-
edbo.init_scheme.rand(obj, batch_size, seed=None)
Random selection of points.
- Parameters
obj (edbo.objective) – Objective data container.
batch_size (int) – Number of points to be selected.
seed (None, int) – Random seed.
- Returns
Selected domain points.
- Return type
pandas.DataFrame
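As an illustration of seeded random selection, here is a minimal sketch that uses only the Python standard library. A plain list of tuples stands in for the domain, whereas edbo works with a pandas.DataFrame. The function name `random_selection` is hypothetical and is not part of edbo.

```python
import random


def random_selection(domain, batch_size, seed=None):
    """Pick batch_size distinct rows of domain uniformly at random."""
    if batch_size > len(domain):
        raise ValueError("batch_size cannot exceed the number of domain points")
    rng = random.Random(seed)  # private generator; seed=None means unseeded
    chosen = rng.sample(range(len(domain)), batch_size)
    # Return rows in their original domain order.
    return [domain[i] for i in sorted(chosen)]


# Hypothetical 4 x 3 grid of (temperature, concentration) points.
domain = [(temp, conc) for temp in (20, 40, 60, 80) for conc in (0.1, 0.2, 0.5)]
first = random_selection(domain, batch_size=3, seed=0)
second = random_selection(domain, batch_size=3, seed=0)
```

Passing the same integer seed reproduces the same selection, which is the purpose of the seed parameter documented above.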
-
edbo.init_scheme.external_data(obj)
External data reader.
- Parameters
obj (edbo.objective) – Objective data container.
- Returns
Selected domain points.
- Return type
pandas.DataFrame
-
edbo.init_scheme.PAM(obj, batch_size, distance='gower', visualize=True, seed=None, export_path=None)
Partitioning around medoids (PAM) algorithm.
Returns the medoids of the learned clusters.
PAM is implemented using pyclustering: https://pypi.org/project/pyclustering/
- Parameters
obj (edbo.objective) – Objective data container.
batch_size (int) – Number of points to be selected. Batch size also determines the number of clusters. PAM returns the medoids.
distance (str) – Distance metric to be used in the PAM algorithm. Options include: ‘gower’, ‘euclidean’, and ‘euclidean_square’.
visualize (bool) – Visualize the learned clusters.
seed (None, int) – Random seed.
export_path (None, str) – Path to export cluster visualization SVG image.
- Returns
Selected domain points.
- Return type
pandas.DataFrame
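The idea behind selecting medoids can be sketched with a small greedy swap search in plain Python. This is an illustrative simplification of partitioning around medoids, not pyclustering's implementation and not edbo's code. The names `pam`, `total_cost`, and `euclidean` are hypothetical, and any distance function with the same two-argument form (for example a Gower distance) could be passed as `dist`.

```python
import random


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, batch_size, dist=euclidean, seed=None):
    """Return sorted indices of batch_size medoids found by greedy swaps."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), batch_size)  # random initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for pos in range(batch_size):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids)


# Two well-separated, cross-shaped clusters; indices 0 and 5 are the centers.
cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
points = cross + [(x + 10, y + 10) for x, y in cross]
medoids = pam(points, batch_size=2, seed=0)
selected = [points[i] for i in medoids]
```

Unlike a cluster mean, a medoid is always an actual member of the domain, so the selected points can be used directly as experiments without a further lookup step.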
-
edbo.init_scheme.k_means(obj, batch_size, visualize=True, seed=None, export_path=None, n_init=1, return_clusters=False, return_centroids=False)
K-means algorithm.
Returns the domain points closest to the means of the learned clusters.
k-means clustering is implemented using scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
- Parameters
obj (edbo.objective) – Objective data container.
batch_size (int) – Number of points to be selected. Batch size also determines the number of clusters. k_means returns the domain points closest to the cluster means.
visualize (bool) – Visualize the learned clusters.
seed (None, int) – Random seed.
export_path (None, str) – Path to export cluster visualization SVG image.
- Returns
Selected domain points.
- Return type
pandas.DataFrame
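Since cluster means are generally not members of the domain, the selection involves a final step that maps each mean back to its nearest domain point. The sketch below shows that idea with a small pure-Python version of Lloyd's algorithm. It is an assumption-laden illustration, not edbo's code and not scikit-learn's KMeans, and the names `k_means_selection`, `sq_dist`, and `mean` are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def k_means_selection(points, batch_size, seed=None, max_iter=100):
    """Cluster points, then return the domain point nearest each centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, batch_size)  # initial centroids from the data
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(batch_size), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    # Snap every centroid to the closest actual domain point.
    return [min(points, key=lambda p: sq_dist(p, c)) for c in centroids]


# Two well-separated, cross-shaped clusters centered on (0, 0) and (10, 10).
cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
points = cross + [(x + 10, y + 10) for x, y in cross]
selected = k_means_selection(points, batch_size=2, seed=0)
```

In this simplified version two centroids could in principle snap to the same domain point; a production implementation would need to handle that case explicitly.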