Utils

Clusters

Functions to manipulate and sort clusters

aggregate_network_by_cluster(temporal_network, clusters, sort_clusters=None, output='averaged')[source]

Aggregates the temporal network over each cluster in a cluster set.

Parameters

temporal_network (phasik.TemporalNetwork) – Temporal network to aggregate
clusters (array of int) – Cluster labels, one per time point, so the array length equals the number of time points in the temporal network.
sort_clusters (bool, optional) – If True, sort cluster labels by ascending time. The default is None.
output ({‘weighted’, ‘averaged’, ‘binary’, ‘normalised’}, optional) – Determines the type of output edge weights

Returns

aggregates – Dict in which each key is a cluster label and each value is a tuple of the form (networkx.Graph, list of time indices in the cluster).

Return type

dict

Examples

>>> import phasik as pk
>>> clusters = [1, 1, 1, 2, 2, 3]
>>> pk.aggregate_network_by_cluster(temporal_network, clusters, output="averaged")
{1: (<networkx.classes.graph.Graph at 0x177665df0>, [0, 1, 2]),
 2: (<networkx.classes.graph.Graph at 0x177668580>, [3, 4]),
 3: (<networkx.classes.graph.Graph at 0x177668e20>, [5])}
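
To make the shape of the return value concrete, the sketch below reproduces the grouping step in plain Python. It is not phasik's implementation: the snapshots are a hypothetical list of {edge: weight} dictionaries standing in for a TemporalNetwork, plain dicts stand in for networkx.Graph objects, and reading "averaged" as the mean edge weight over the time points of a cluster is an assumption made here.

```python
from collections import defaultdict


def aggregate_by_cluster_sketch(snapshots, clusters):
    """Average edge weights over the time points of each cluster.

    snapshots: list of {edge: weight} dicts, one per time point
               (hypothetical stand-in for a temporal network).
    clusters:  list of cluster labels, one per time point.
    """
    if len(snapshots) != len(clusters):
        raise ValueError("need exactly one cluster label per time point")

    # Group the time indices by cluster label.
    times_by_label = defaultdict(list)
    for t, label in enumerate(clusters):
        times_by_label[label].append(t)

    aggregates = {}
    for label, times in times_by_label.items():
        totals = defaultdict(float)
        for t in times:
            for edge, weight in snapshots[t].items():
                totals[edge] += weight
        # Assumption: an edge absent at a time point contributes weight 0.
        averaged = {edge: total / len(times) for edge, total in totals.items()}
        aggregates[label] = (averaged, times)
    return aggregates


snapshots = [
    {("a", "b"): 1.0},
    {("a", "b"): 0.5},
    {("a", "b"): 0.0, ("b", "c"): 0.75},
    {("b", "c"): 1.0},
    {("b", "c"): 0.5},
    {("a", "c"): 0.25},
]
aggregates = aggregate_by_cluster_sketch(snapshots, [1, 1, 1, 2, 2, 3])
```

As in the example above, each value pairs an aggregated network with the time indices of its cluster.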

cluster_sort(clusters, final_labels=None)[source]

Sorts an array of cluster labels in order of appearance, and returns the sorted array while leaving the original clusters unchanged.

Parameters

clusters (numpy.ndarray) – An array of cluster labels.
final_labels (list or None, optional) – A list of final labels (as integers) to replace the original cluster labels, by default None.

Returns

An array of cluster labels sorted in order of appearance. If final_labels is not None, it will return a list of final labels with the same length as clusters.

Return type

numpy.ndarray or list

Examples

>>> import numpy as np
>>> clusters = np.array([2, 2, 2, 3, 3, 1, 1, 1])
>>> cluster_sort(clusters)
array([1, 1, 1, 2, 2, 3, 3, 3])
>>> final_labels = [4, 5, 6]
>>> cluster_sort(clusters, final_labels)
[4, 4, 4, 5, 5, 6, 6, 6]
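
The relabelling described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the name relabel_by_appearance is hypothetical, it returns a list rather than a numpy.ndarray, and it assumes that the default labels start at 1, as in the example.

```python
def relabel_by_appearance(clusters, final_labels=None):
    """Relabel clusters as 1, 2, 3, ... in order of first appearance.

    If final_labels is given, the k-th cluster to appear receives
    final_labels[k - 1] instead. The input sequence is not modified.
    """
    order = {}  # original label -> rank of first appearance (0-based)
    for label in clusters:
        if label not in order:
            order[label] = len(order)

    if final_labels is None:
        return [order[label] + 1 for label in clusters]

    if len(final_labels) < len(order):
        raise ValueError("need one final label per distinct cluster")
    return [final_labels[order[label]] for label in clusters]


original = [2, 2, 2, 3, 3, 1, 1, 1]
sorted_labels = relabel_by_appearance(original)
custom_labels = relabel_by_appearance(original, final_labels=[4, 5, 6])
```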

convert_cluster_labels_to_dict(clusters)[source]

Returns a dictionary in which each key is a cluster label and each value is the list of time indices composing the cluster.

Parameters: clusters (list of int) – List of cluster labels
Returns: cluster_times
Return type: dict

Examples

>>> import phasik as pk
>>> pk.convert_cluster_labels_to_dict([1, 1, 1, 2, 2, 3])
{1: [0, 1, 2], 2: [3, 4], 3: [5]}
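
The conversion amounts to grouping positions by label. A minimal pure-Python sketch of that idea follows; the name labels_to_time_indices is hypothetical and the code is not taken from phasik.

```python
def labels_to_time_indices(clusters):
    """Map each cluster label to the list of time indices carrying it."""
    cluster_times = {}
    for t, label in enumerate(clusters):
        # setdefault creates the list the first time a label is seen.
        cluster_times.setdefault(label, []).append(t)
    return cluster_times


cluster_times = labels_to_time_indices([1, 1, 1, 2, 2, 3])
```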

rand_index_over_methods_and_sizes(valid_cluster_sets, reference_method='ward')[source]

Compute the Rand Index to compare each clustering method to a reference method, for all combinations of methods and numbers of clusters.

Parameters

valid_cluster_sets (list) – List of tuples (cluster_object, method_name) representing the clustering object and the name of the clustering method used to obtain it.
reference_method (str, optional) – The name of the reference method to compare against. The default is “ward”.

Returns

rand_scores – Array of dimension (n_sizes, n_methods) with Rand Index scores.

Return type

ndarray

Notes

The Rand Index is a measure of the similarity between two clusterings. It is based on the number of pairs of samples that are assigned to the same or different clusters in the two clusterings. The adjusted Rand Index is a modification of the Rand Index that takes into account chance agreements.
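
As an illustration of the definition, the sketch below computes the plain (unadjusted) Rand Index of two labelings by counting the pairs of samples on which they agree, divided by the total number of pairs. It is a standalone example with a hypothetical function name; it does not show how phasik computes its scores, and it does not apply the chance correction of the adjusted variant.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    samples in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")

    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# 11 of the 15 pairs agree, so the score is 11/15.
score = rand_index([1, 1, 1, 2, 2, 3], [1, 1, 2, 2, 2, 3])
```

The score depends only on which samples are grouped together, so renaming the labels of either clustering leaves it unchanged.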

Examples

>>> import phasik as pk
>>> clustering_methods = ["k_means", "centroid", "average", "ward"]
>>> valid_cluster_sets = []
>>> for clustering_method in clustering_methods:
...     distance_matrix = pk.DistanceMatrix.from_temporal_network(
...         temporal_network, "euclidean"
...     )
...     cluster_sets = pk.ClusterSets.from_distance_matrix(
...         distance_matrix, "maxclust", range(2, 12), clustering_method
...     )
...     valid_cluster_sets.append((cluster_sets, clustering_method))
...
>>> pk.rand_index_over_methods_and_sizes(valid_cluster_sets, reference_method="ward")

Graphs

Utility functions for static graphs

graph_size_info(graph)[source]: Return basic size information about a graph.

weighted_edges_as_df(network, keep_static=True, temporal_edges=None)[source]

Returns a pandas.DataFrame of weighted edges sorted by weight, from a networkx.Graph.

Columns are [‘i’, ‘j’, ‘weight’], and each row represents a different edge.

Parameters

network (networkx.Graph) – A network from which to get weighted edges
keep_static (bool or np.nan, optional) – If True (default), keep all edges. If False, discard the static edges (those not in temporal_edges). If np.nan, keep the static edges, but set their weight to np.nan. If keep_static is not True, temporal_edges must be provided.
temporal_edges (list of tuples) – List of edges for which there is temporal information.

Return type

pandas.DataFrame

Notes

When the static network is derived from a temporal network, some edges might be static (no temporal information) and have a default constant edge weight. This is the case in which the arguments keep_static and temporal_edges are useful.
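
The three settings of keep_static can be illustrated with a small pure-Python sketch operating on (i, j, weight) tuples. It is an assumption-laden illustration rather than phasik's code: the function name is hypothetical, edges are treated as undirected, and the sorting by weight and the conversion to a pandas.DataFrame are left out.

```python
import math


def apply_keep_static(edges, temporal_edges, keep_static=True):
    """Filter or mask static edges in a list of (i, j, weight) tuples.

    keep_static=True     -> keep every edge unchanged
    keep_static=False    -> drop edges that are not in temporal_edges
    keep_static=math.nan -> keep static edges but set their weight to NaN
    """
    if keep_static is True:
        return list(edges)

    # Assumption: the graph is undirected, so (i, j) and (j, i) match.
    temporal = {frozenset(edge) for edge in temporal_edges}

    result = []
    for i, j, weight in edges:
        if frozenset((i, j)) in temporal:
            result.append((i, j, weight))
        elif keep_static is False:
            continue  # static edge, discarded
        else:
            result.append((i, j, math.nan))  # static edge, weight masked
    return result


edges = [("a", "b", 0.9), ("b", "c", 1.0), ("c", "d", 0.4)]
temporal_edges = [("b", "a"), ("c", "d")]  # ("b", "c") is static

kept = apply_keep_static(edges, temporal_edges, keep_static=True)
dropped = apply_keep_static(edges, temporal_edges, keep_static=False)
masked = apply_keep_static(edges, temporal_edges, keep_static=math.nan)
```

Masking with NaN keeps the static edges visible in the output while making it clear that their weights carry no temporal information.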

Paths

Functions to deal with system paths

slugify(text, keep_characters=None)[source]

Turn any text into a string that can be used in a filename

Parameters

text (str) – text to slugify
keep_characters (list of str, optional) – Characters in this iterable are kept in the final string. Defaults to [‘_’]. All other non-alphanumeric characters are removed.

Returns

slug

Return type

str
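
A minimal sketch of the behaviour described above, written only from the parameter description: alphanumeric characters and those listed in keep_characters are kept, everything else is removed. The name slugify_sketch is hypothetical, and the actual function may treat whitespace or letter case differently.

```python
def slugify_sketch(text, keep_characters=None):
    """Keep alphanumeric characters and those in keep_characters."""
    if keep_characters is None:
        keep_characters = ["_"]  # default stated in the parameter description
    return "".join(c for c in text if c.isalnum() or c in keep_characters)


slug = slugify_sketch("cell cycle: G1/S (v2)_final")
slug_with_dash = slugify_sketch("a-b c", keep_characters=["_", "-"])
```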