Utils
Clusters
Functions to manipulate and sort clusters
- aggregate_network_by_cluster(temporal_network, clusters, sort_clusters=None, output='averaged')[source]
Aggregates the temporal network over each cluster in a cluster set
- Parameters
temporal_network (phasik.TemporalNetwork) – Temporal network to aggregate
clusters (array of int) – Cluster labels
sort_clusters (bool, optional) – If True, sort cluster labels based on ascending times
output ({‘weighted’, ‘averaged’, ‘binary’, ‘normalised’}, optional) – Determines the type of output edge weights
- Returns
aggregates – Dictionary in which each key is a cluster label and each value is a tuple of the form (networkx.Graph, list of time indices of cluster).
- Return type
dict
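The shape of the returned dictionary can be illustrated with a minimal standalone sketch. It does not use phasik: snapshots are plain dictionaries mapping an edge to a weight (a stand-in for a temporal network), the aggregate is a plain dictionary rather than a networkx.Graph, and 'averaged' is assumed here to mean the mean weight over the time indices of a cluster, with absent edges counted as zero. The helper name is hypothetical.

```python
from collections import defaultdict


def average_snapshots_by_cluster_sketch(snapshots, clusters):
    """Average edge weights over the time indices of each cluster.

    snapshots: list of dicts mapping an edge (i, j) to its weight, one per time index.
    clusters: list of cluster labels, one per time index.
    Returns {label: (averaged_edge_weights, time_indices)}.
    """
    if len(snapshots) != len(clusters):
        raise ValueError("snapshots and clusters must have the same length")

    # Group time indices by cluster label.
    times_by_label = defaultdict(list)
    for time_index, label in enumerate(clusters):
        times_by_label[label].append(time_index)

    aggregates = {}
    for label, times in times_by_label.items():
        # Sum the weight of every edge over the cluster's time indices.
        totals = defaultdict(float)
        for time_index in times:
            for edge, weight in snapshots[time_index].items():
                totals[edge] += weight
        # Assumption: an edge absent at a time index contributes zero weight.
        averaged = {edge: total / len(times) for edge, total in totals.items()}
        aggregates[label] = (averaged, times)
    return aggregates


snapshots = [
    {("a", "b"): 1.0},
    {("a", "b"): 3.0, ("b", "c"): 2.0},
    {("b", "c"): 4.0},
]
aggregates = average_snapshots_by_cluster_sketch(snapshots, [1, 1, 2])
print(aggregates[1])  # ({('a', 'b'): 2.0, ('b', 'c'): 1.0}, [0, 1])
print(aggregates[2])  # ({('b', 'c'): 4.0}, [2])
```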
- cluster_sort(clusters, final_labels=None)[source]
Sorts an array of cluster labels in order of appearance, and returns the sorted array while leaving the original clusters unchanged.
- Parameters
clusters (numpy.ndarray) – An array of cluster labels.
final_labels (list or None, optional) – A list of final labels (as integers) to replace the original cluster labels, by default None.
- Returns
An array of cluster labels sorted in order of appearance. If final_labels is not None, it will return a list of final labels with the same length as clusters.
- Return type
numpy.ndarray or list
Examples
>>> clusters = np.array([2, 2, 2, 3, 3, 1, 1, 1])
>>> cluster_sort(clusters)
array([1, 1, 1, 2, 2, 3, 3, 3])
>>> final_labels = [4, 5, 6]
>>> cluster_sort(clusters, final_labels)
[4, 4, 4, 5, 5, 6, 6, 6]
- convert_cluster_labels_to_dict(clusters)[source]
Returns a dictionary where each key is a cluster label and each value is a list of the time indices composing the cluster
- Parameters
clusters (list of int) – List of cluster labels
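The mapping described above can be sketched in a few lines of standalone Python. This is an illustration of the described behaviour, not the library's source; the helper name is hypothetical, and time indices are assumed to be the positions of the labels in the list.

```python
from collections import defaultdict


def cluster_labels_to_dict_sketch(clusters):
    """Map each cluster label to the list of time indices carrying that label."""
    indices_by_label = defaultdict(list)
    for time_index, label in enumerate(clusters):
        indices_by_label[label].append(time_index)
    # Return a plain dict; keys appear in order of first occurrence.
    return dict(indices_by_label)


labels = [2, 2, 2, 3, 3, 1, 1, 1]
print(cluster_labels_to_dict_sketch(labels))
# {2: [0, 1, 2], 3: [3, 4], 1: [5, 6, 7]}
```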
- rand_index_over_methods_and_sizes(valid_cluster_sets, reference_method='ward')[source]
Compute the Rand Index to compare any clustering method to a reference method, for all combinations of methods and number of clusters.
- Parameters
valid_cluster_sets (list) – List of tuples (cluster_object, method_name) representing the clustering object and the name of the clustering method used to obtain it.
reference_method (str, optional) – The name of the reference method to compare against. The default is “ward”.
- Returns
rand_scores – Array of dimension (n_sizes, n_methods) with Rand Index scores.
- Return type
ndarray
Notes
The Rand Index is a measure of the similarity between two clusterings. It is based on the number of pairs of samples that are assigned to the same or different clusters in the two clusterings. The adjusted Rand Index is a modification of the Rand Index that takes into account chance agreements.
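The pair-counting definition in the note can be made concrete with a small standalone sketch. It computes the plain (unadjusted) Rand Index between two label lists; which variant the function above uses internally is not stated here, and the helper name is hypothetical.

```python
from itertools import combinations


def rand_index_sketch(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two samples in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # convention for fewer than two samples
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


reference = [1, 1, 1, 2, 2, 3, 3, 3]
other = [1, 1, 2, 2, 2, 3, 3, 3]
print(rand_index_sketch(reference, reference))  # 1.0
print(rand_index_sketch(reference, other))      # 24 agreeing pairs out of 28
```

Because only pair membership matters, relabelling the clusters (for example swapping labels 1 and 3) leaves the score unchanged.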
Graphs
Utility functions for static graphs
- weighted_edges_as_df(network, keep_static=True, temporal_edges=None)[source]
Returns a pandas.DataFrame of weighted edges sorted by weight, from a networkx.Graph.
Columns are [‘i’, ‘j’, ‘weight’] and each row represents a different edge
- Parameters
network (networkx.Graph) – A network from which to get weighted edges
keep_static (bool or np.nan, optional) – If True (default), keep all edges. If False, discard the static edges (those not in temporal_edges). If np.nan, keep the static edges, but set their weight to np.nan. If keep_static is not True, temporal_edges must be provided.
temporal_edges (list of tuples) – List of edges for which there is temporal information.
- Return type
pandas.DataFrame
Notes
When the static network is derived from a temporal network, some edges might be static (no temporal information) and have a default constant edge weight. That is when the arguments keep_static and temporal_edges are useful.
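The three keep_static modes can be illustrated with a standalone sketch that works on a plain list of (i, j, weight) tuples instead of a networkx.Graph and a pandas.DataFrame. The helper name is hypothetical, sorting by weight is omitted, edges are assumed to be undirected, and a missing temporal_edges is treated as an empty list; none of this is taken from the library's implementation.

```python
import math


def select_weighted_edges_sketch(weighted_edges, keep_static=True, temporal_edges=None):
    """Apply the keep_static modes to a list of (i, j, weight) tuples."""
    # Undirected edges: (a, b) and (b, a) refer to the same edge.
    temporal = {frozenset(edge) for edge in (temporal_edges or [])}
    # np.nan is a Python float, so math.isnan recognises it.
    keep_as_nan = isinstance(keep_static, float) and math.isnan(keep_static)

    rows = []
    for i, j, weight in weighted_edges:
        if keep_static is True or frozenset((i, j)) in temporal:
            rows.append((i, j, weight))    # kept with its original weight
        elif keep_as_nan:
            rows.append((i, j, math.nan))  # static edge kept, weight masked
        # otherwise keep_static is False and the static edge is dropped
    return rows


edges = [("a", "b", 0.5), ("b", "c", 1.0)]
temporal = [("b", "a")]

print(select_weighted_edges_sketch(edges))
# [('a', 'b', 0.5), ('b', 'c', 1.0)]
print(select_weighted_edges_sketch(edges, keep_static=False, temporal_edges=temporal))
# [('a', 'b', 0.5)]
masked = select_weighted_edges_sketch(edges, keep_static=math.nan, temporal_edges=temporal)
print(math.isnan(masked[1][2]))
# True
```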
Paths
Functions to deal with system paths
- slugify(text, keep_characters=None)[source]
Turn any text into a string that can be used in a filename
- Parameters
text (str) – text to slugify
keep_characters (list of str) – characters in this iterable will be kept in the final string. Defaults to [‘_’]. Any other non-alphanumeric characters will be removed.
- Returns
slug
- Return type
str
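The filtering rule described for keep_characters can be sketched as follows. This is a minimal standalone illustration of that rule, not the library's source: the helper name is hypothetical, and it assumes that whitespace is removed like any other non-alphanumeric character.

```python
def slugify_sketch(text, keep_characters=None):
    """Keep alphanumeric characters and those listed in keep_characters."""
    if keep_characters is None:
        keep_characters = ["_"]
    # str.isalnum() is Unicode-aware, so accented letters are kept too.
    return "".join(ch for ch in text if ch.isalnum() or ch in keep_characters)


print(slugify_sketch("ward: 5 clusters (v2).png"))
# ward5clustersv2png
print(slugify_sketch("my_file-name.txt", keep_characters=["_", "-", "."]))
# my_file-name.txt
```

Passing "." in keep_characters, as in the second call, preserves a file extension that the default setting would strip.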