ClusterSet

class ClusterSet(clusters, times, linkage, distance_matrix, distance_metric, cluster_method, n_clusters_max, n_max_type)[source]

Base class for a set of clusters (partition) of timepoints

Variables
  • clusters (list of int) – Clusters as a list of cluster labels

  • times (list of (int or float)) – Sorted list of times associated with each clustered snapshot

  • n_clusters (int) – Number of clusters in the cluster set (partition)

  • cluster_method (str) – Method used to cluster the snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’

  • n_max_type (str) – Method that was used to determine when to stop clustering when creating this cluster set. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • n_max (int) – Value corresponding to the n_max_type described above.

  • distance_metric (str) – Distance metric (e.g. ‘euclidean’) used to compute the distance between snapshots with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.

Parameters
  • clusters (list of int) – Clusters as a list of cluster labels

  • times (list of (int or float)) – Sorted list of times associated with each clustered snapshot

  • linkage – Linkage of the clustering

  • distance_matrix (phasik.DistanceMatrix) – Distance matrix from which the clusters were computed

  • cluster_method (str) – Method used to cluster the snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’

  • n_max_type (str) – Method that was used to determine when to stop clustering when creating this cluster set. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • n_clusters_max (int) – Value corresponding to the n_max_type described above.

  • distance_metric (str) – Distance metric (e.g. ‘euclidean’) used to compute the distance between snapshots with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
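The two stopping rules named by n_max_type can be illustrated without Phasik. The sketch below is a minimal single-linkage agglomeration written in plain Python. The function name agglomerate and the toy data are hypothetical, and this is not Phasik's implementation. It only shows how a ‘maxclust’ rule (stop at a given number of clusters) differs from a ‘distance’ rule (stop once the closest clusters are farther apart than a threshold).

```python
def agglomerate(dist, n_max_type, n_max):
    """Single-linkage agglomerative clustering on a square distance matrix.

    n_max_type='maxclust': merge until at most n_max clusters remain.
    n_max_type='distance': merge while the closest pair of clusters is
    at distance <= n_max.
    Returns one integer label (starting at 1) per row of dist.
    """
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, p, q = min(
            (gap(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if n_max_type == "maxclust" and len(clusters) <= n_max:
            break
        if n_max_type == "distance" and d > n_max:
            break
        # Merge cluster q into cluster p.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Toy data: six time points, with the distance taken as the time difference.
times = [0, 1, 2, 10, 11, 30]
dist = [[abs(a - b) for b in times] for a in times]

by_count = agglomerate(dist, "maxclust", 3)     # stop at 3 clusters
by_distance = agglomerate(dist, "distance", 9)  # stop when clusters are > 9 apart
print(by_count)
print(by_distance)
```

With the same data, the two rules can give different partitions: the number of clusters is fixed in the first case and follows from the threshold in the second.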

property cluster_method

Returns the clustering method used to cluster the temporal data

property clusters

Returns the clusters, i.e. a list of cluster labels (int)

property distance_metric

Returns the distance metric used to compute the distance between snapshots, e.g. ‘euclidean’

distance_threshold()[source]

Calculate the distance at which clustering stops

Returns

Smallest number d such that the distance between any two clusters is < d.

Return type

int

classmethod from_distance_matrix(distance_matrix, n_max_type, n_clusters_max, cluster_method)[source]

Generates a ClusterSet from a distance matrix

Parameters
  • distance_matrix (phasik.DistanceMatrix) – Distance matrix from which to cluster

  • n_max_type (str) – The method that determines when to stop clustering. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • n_clusters_max (int) – Value corresponding to the n_max_type described above.

  • cluster_method (str) – Clustering method used to cluster the temporal network snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’

Return type

ClusterSet

classmethod from_temporal_network(temporal_network, distance_metric, cluster_method, n_max_type, n_clusters_max)[source]

Generates a ClusterSet from a temporal network

Parameters
  • temporal_network (TemporalNetwork) – Temporal network from which to compute the distance matrix

  • distance_metric (str) – Distance metric (e.g. ‘euclidean’) used to compute the distance between snapshots with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.

  • cluster_method (str) – Clustering method used to cluster the temporal network snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’

  • n_max_type (str) – The method that determines when to stop clustering. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • n_clusters_max (int) – Value corresponding to the n_max_type described above.

Return type

ClusterSet
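As an illustration of the distance matrix that from_temporal_network computes, the sketch below builds a symmetric matrix of pairwise Euclidean distances between snapshots using only the standard library. It assumes each snapshot has already been flattened into a vector of edge weights; that representation, the function name pairwise_euclidean, and the toy values are assumptions for illustration and not Phasik's actual code path.

```python
import math


def pairwise_euclidean(snapshots):
    """Return the symmetric matrix of Euclidean distances between snapshots.

    Each snapshot is assumed to be a flat sequence of numbers (for example,
    the edge weights of the network at one time point).
    """
    n = len(snapshots)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(snapshots[i], snapshots[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Three toy snapshots; the first two are identical.
snapshots = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [3.0, 4.0, 1.0],
]
dist = pairwise_euclidean(snapshots)
print(dist[0][1], dist[0][2])
```

Entry (i, j) of the matrix is the distance between the snapshots at times i and j, so identical snapshots are at distance 0 and the diagonal is always 0.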

property n_max

Returns the value corresponding to n_max_type, i.e. the number of clusters or the distance at which clustering stops.

property n_max_type

Returns the method (str) that determines when to stop clustering

plot(colors=None, cmap='tab10', vmin=None, vmax=None, y_height=0, ax=None, **kwargs)[source]

Visualize the clusters in this cluster set.

For each time point, a marker is drawn with a color corresponding to the cluster to which it belongs.

Parameters
  • colors (list of int, optional) – If None (default), cluster label 0 is assigned its automatic color “C0” and so on. If colors is a list (e.g. [3,1,2]), it relabels the clusters in that order and assigns them the new corresponding colors.

  • cmap (colormap, optional) – Desired colormap (default ‘tab10’).

  • vmin/vmax (float, optional) – Min and max values to use for the color mapping. If None (default), computed from the data in colors.

  • y_height (int or float, optional) – Vertical value at which to draw the markers (default 0). If a single cluster is drawn this value does not matter.

  • ax (matplotlib.Axes, optional) – Axes on which to plot

  • **kwargs – Other parameters to pass to matplotlib’s scatter.

Return type

None

plot_dendrogram(ax=None, distance_threshold=None, leaf_rotation=90, leaf_font_size=6)[source]

Plot this cluster set as a dendrogram

Parameters
  • ax (matplotlib.Axes, optional) – Axes on which to plot

  • distance_threshold (float, optional) – Threshold at which to draw a horizontal line and above which to use different colors for different branches.

  • leaf_rotation (int or float, optional) – Rotation to apply to the x-axis (leaf) labels (default 90)

  • leaf_font_size (int or str, optional) – Desired size of the x-axis (leaf) labels (default 6)

Return type

None

plot_silhouette_samples(ax=None)[source]

Plot the silhouette samples from this cluster set

Parameters

ax (matplotlib.Axes, optional) – Axes on which to plot

Return type

None
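For readers unfamiliar with silhouette samples, the sketch below computes them from a distance matrix and a list of cluster labels in plain Python. It follows the standard definition s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its cluster and b is the smallest mean distance to the members of any other cluster. The function name and data are hypothetical, and Phasik's own computation is not shown in this reference.

```python
def silhouette_samples(dist, labels):
    """Silhouette value of each point, given a distance matrix and labels.

    Requires at least two distinct labels. Points alone in their cluster
    get a silhouette of 0.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of another cluster.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


times = [0, 1, 10, 11]
dist = [[abs(s - t) for t in times] for s in times]
labels = [1, 1, 2, 2]

scores = silhouette_samples(dist, labels)
average = sum(scores) / len(scores)
print(scores, average)
```

Values close to 1 indicate points that sit well inside their cluster. The mean over all points is the average silhouette of the partition.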

property times

Returns the list of times corresponding to the clustered data points

ClusterSets

class ClusterSets(cluster_sets, n_max_type, ns_max)[source]

Base class for sets of clusters (partition) of timepoints

Variables
  • cluster_sets (iterable of phasik.ClusterSet) – List of ClusterSets

  • clusters (numpy array of int) – Summary array of the cluster labels, with dim (len(ns_max), len(times))

  • n_clusters (list of int) – Number of clusters in each cluster set (partition)

  • times (list of (int or float)) – Sorted list of times associated with each clustered snapshot

  • distance_metric (str) – Distance metric (e.g. ‘euclidean’) used to compute the distance between snapshots with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.

  • n_max_type (str) – Method that was used to determine when to stop clustering when creating these cluster sets. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • ns_max (list of int) – List of values corresponding to the n_max_type described above, in other words, the list of numbers of clusters to be computed. The number of elements in this list is the number of ClusterSet objects computed.

  • silhouettes_average (numpy array) – Value of the average silhouette for each clustering

Parameters
  • cluster_sets (iterable of ClusterSet)

  • n_max_type (str) – Method that was used to determine when to stop clustering when creating these cluster sets. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • ns_max (list of int) – List of values corresponding to the n_max_type described above, in other words, the list of numbers of clusters to be computed. The number of elements in this list is the number of ClusterSet objects computed.

property clusters_sets

Returns the list of ClusterSet

classmethod from_distance_matrix(distance_matrix, n_max_type, ns_clusters_max, cluster_method)[source]

Generates ClusterSets from a distance matrix

Parameters
  • distance_matrix (phasik.DistanceMatrix) – Distance matrix from which to cluster

  • n_max_type (str) – The method that determines when to stop clustering. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until every cluster is at least a certain distance away from each other (‘distance’).

  • ns_clusters_max (list of int) – List of values corresponding to the n_max_type described above, in other words, the list of numbers of clusters to be computed. The number of elements in this list is the number of ClusterSet objects computed.

  • cluster_method (str) – Clustering method used to cluster the temporal network snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’

Return type

ClusterSets
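To make the shape of the resulting collection concrete, the sketch below produces one partition per requested number of clusters and stacks them into a summary table with one row per entry of ns_max and one column per time point. The clustering itself is a deliberately simple stand-in (cutting a sorted 1-D series at its largest gaps), not Phasik's method, and the names cut_largest_gaps and summary are hypothetical.

```python
def cut_largest_gaps(times, n_clusters):
    """Label sorted 1-D times by cutting at the n_clusters - 1 largest gaps."""
    # Indices i where a cut between times[i - 1] and times[i] could be made,
    # ordered from the largest gap to the smallest.
    candidates = sorted(
        range(1, len(times)),
        key=lambda i: times[i] - times[i - 1],
        reverse=True,
    )
    cuts = set(candidates[: n_clusters - 1])
    labels, current = [], 1
    for i in range(len(times)):
        if i in cuts:
            current += 1
        labels.append(current)
    return labels


times = [0, 1, 2, 10, 11, 30]
ns_max = [1, 2, 3]

# One row of labels per requested number of clusters.
summary = [cut_largest_gaps(times, n) for n in ns_max]
n_clusters = [len(set(row)) for row in summary]

for n, row in zip(ns_max, summary):
    print(n, row)
```

The table has dimensions (len(ns_max), len(times)), which matches the layout described for the clusters attribute above.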

plot(axs=None, coloring='consistent', translation=None, with_silhouettes=False, with_n_clusters=False)[source]

Plots these cluster sets as a scatter graph

Parameters
  • axs (matplotlib.Axes, optional) – Axes on which to plot

  • coloring ({‘ascending’, ‘consistent’, None}) – Method for consistent coloring. Default: “consistent”.

  • translation (dict, optional) – Dictionary with old labels as keys and new labels as values, for example {1: 2, 2: 3, 3: 1}. If None (default), has no effect. It is applied after the relabeling done by the coloring method.

  • with_silhouettes (bool) – If True, also plot the average silhouettes on a 2nd axis. Defaults to False.

  • with_n_clusters (bool) – If True, also plot the actual number of clusters on a 3rd axis. Defaults to False.

Return type

None
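The effect of a translation dictionary on a list of cluster labels can be sketched in a few lines of plain Python. The helper translate_labels is hypothetical, and leaving unmapped labels unchanged is an assumption made here, not documented Phasik behavior.

```python
def translate_labels(labels, translation=None):
    """Return labels with each old label replaced by its new label.

    Labels missing from the dictionary are left unchanged (an assumption
    of this sketch). With translation=None the labels are returned as is.
    """
    if translation is None:
        return list(labels)
    return [translation.get(label, label) for label in labels]


labels = [1, 1, 2, 3, 3]
relabeled = translate_labels(labels, {1: 2, 2: 3, 3: 1})
print(relabeled)
```

Because the mapping is applied to every label at once rather than sequentially, a cyclic translation such as {1: 2, 2: 3, 3: 1} does not collapse clusters into one another.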

plot_and_format_with_average_silhouettes(axs, events, phases, time_ticks=None, coloring='consistent')[source]

Plot and format these cluster sets as a scatter graph, along with the average silhouettes and cluster set sizes

Formatting is generally left to the Jupyter notebooks, but this method is used by several different notebooks, so it is kept here in a common place.

Parameters
  • axs (list of matplotlib.Axes) – Axes on which to plot; should be an indexable object with at least three items

  • events – Any events that should be plotted on the scatter graph

  • phases – Any phases that should be plotted on the scatter graph

  • time_ticks (list or array) – The ticks that should be displayed along the x-axis (time axis)

  • coloring ({‘ascending’, ‘consistent’, None}) – Method for consistent coloring. Default: “consistent”.

Return type

None

plot_silhouette_samples(axs, coloring='consistent')[source]

Plot the silhouette samples for each cluster set in this range of cluster sets

Parameters
  • axs (list of matplotlib.Axes) – Axes on which to plot; should be an iterable object with at least as many items as there are cluster sets in this class.

  • coloring ({‘ascending’, ‘consistent’, None}) – Method for consistent coloring. Default: “consistent”.

Return type

None