Utils

Clusters

Functions to manipulate and sort clusters

aggregate_network_by_cluster(temporal_network, clusters, sort_clusters=None, output='averaged')[source]

Aggregates the temporal network over each cluster in a cluster set.

Parameters
  • temporal_network (phasik.TemporalNetwork) – Temporal network to aggregate

  • clusters (array of int) – Cluster labels, one per time point in the temporal network.

  • sort_clusters (bool) – If True, sort cluster labels based on ascending times

  • output ({‘weighted’, ‘averaged’, ‘binary’, ‘normalised’}, optional) – Determines the type of output edge weights

Returns

aggregates – Dict where each key is a cluster label and each value is a tuple of the form (networkx.Graph, list of time indices of the cluster).

Return type

dict

Examples

>>> import phasik as pk
>>> clusters = [1, 1, 1, 2, 2, 3]
>>> pk.aggregate_network_by_cluster(temporal_network, clusters, output="averaged")
{1: (<networkx.classes.graph.Graph at 0x177665df0>, [0, 1, 2]),
 2: (<networkx.classes.graph.Graph at 0x177668580>, [3, 4]),
 3: (<networkx.classes.graph.Graph at 0x177668e20>, [5])}
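
The idea behind the 'averaged' output can be sketched in plain Python. This is only an illustration, not the phasik implementation: the function name average_snapshots_by_cluster and the representation of a temporal network as a list of edge-to-weight dicts are hypothetical, and treating an edge that is absent at a time point as weight 0 is an assumption.

```python
from collections import defaultdict


def average_snapshots_by_cluster(snapshots, clusters):
    """Average edge weights over the time points of each cluster.

    snapshots: list of dicts mapping an edge (i, j) to its weight at one
               time point (hypothetical stand-in for a temporal network).
    clusters:  list of cluster labels, one per time point.
    """
    if len(snapshots) != len(clusters):
        raise ValueError("need exactly one cluster label per time point")

    # Group time indices by cluster label.
    times_by_cluster = defaultdict(list)
    for t, label in enumerate(clusters):
        times_by_cluster[label].append(t)

    aggregates = {}
    for label, times in times_by_cluster.items():
        totals = defaultdict(float)
        for t in times:
            for edge, weight in snapshots[t].items():
                totals[edge] += weight
        # Assumption: an edge missing at a time point contributes weight 0.
        averaged = {edge: total / len(times) for edge, total in totals.items()}
        aggregates[label] = (averaged, times)
    return aggregates


snapshots = [
    {("a", "b"): 1.0},
    {("a", "b"): 3.0},
    {("a", "b"): 2.0, ("b", "c"): 4.0},
]
aggregates = average_snapshots_by_cluster(snapshots, [1, 1, 2])
print(aggregates[1])  # ({('a', 'b'): 2.0}, [0, 1])
```

The return value mirrors the documented shape (one entry per cluster label, each holding an aggregate and its time indices), with a plain dict standing in for the networkx.Graph.
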
cluster_sort(clusters, final_labels=None)[source]

Relabels an array of cluster labels so that the labels increase in order of first appearance, and returns the relabeled array while leaving the original clusters unchanged.

Parameters
  • clusters (numpy.ndarray) – An array of cluster labels.

  • final_labels (list or None, optional) – A list of final labels (as integers) to replace the original cluster labels, by default None.

Returns

An array of cluster labels sorted in order of appearance. If final_labels is not None, it will return a list of final labels with the same length as clusters.

Return type

numpy.ndarray or list

Examples

>>> import numpy as np
>>> clusters = np.array([2, 2, 2, 3, 3, 1, 1, 1])
>>> cluster_sort(clusters)
array([1, 1, 1, 2, 2, 3, 3, 3])
>>> final_labels = [4, 5, 6]
>>> cluster_sort(clusters, final_labels)
[4, 4, 4, 5, 5, 6, 6, 6]
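
A minimal pure-Python sketch of this relabeling, for illustration only. The helper name relabel_by_first_appearance is hypothetical, it works on lists rather than numpy arrays, and it assumes new labels start at 1, as in the example above.

```python
def relabel_by_first_appearance(clusters, final_labels=None):
    """Relabel clusters as 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    for label in clusters:
        if label not in mapping:
            # The next unseen label gets the next integer, starting at 1.
            mapping[label] = len(mapping) + 1

    # Build a new list, so the input is left unchanged.
    relabeled = [mapping[label] for label in clusters]

    if final_labels is not None:
        # The k-th cluster to appear receives final_labels[k - 1].
        return [final_labels[new - 1] for new in relabeled]
    return relabeled


clusters = [2, 2, 2, 3, 3, 1, 1, 1]
print(relabel_by_first_appearance(clusters))             # [1, 1, 1, 2, 2, 3, 3, 3]
print(relabel_by_first_appearance(clusters, [4, 5, 6]))  # [4, 4, 4, 5, 5, 6, 6, 6]
```
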
convert_cluster_labels_to_dict(clusters)[source]

Returns a dictionary where each key is a cluster label and each value is a list of the time indices composing the cluster.

Parameters

clusters (list of int) – List of cluster labels

Returns

cluster_times – Dict mapping each cluster label to the list of time indices composing that cluster.

Return type

dict

Examples

>>> import phasik as pk
>>> pk.convert_cluster_labels_to_dict([1, 1, 1, 2, 2, 3])
{1: [0, 1, 2], 2: [3, 4], 3: [5]}
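
The same mapping can be sketched in a few lines of plain Python. The function name labels_to_time_indices is hypothetical and this is not necessarily how phasik implements it.

```python
def labels_to_time_indices(clusters):
    """Map each cluster label to the list of time indices carrying it."""
    cluster_times = {}
    for t, label in enumerate(clusters):
        # The position in the list is the time index.
        cluster_times.setdefault(label, []).append(t)
    return cluster_times


print(labels_to_time_indices([1, 1, 1, 2, 2, 3]))
# {1: [0, 1, 2], 2: [3, 4], 3: [5]}
```
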
rand_index_over_methods_and_sizes(valid_cluster_sets, reference_method='ward')[source]

Compute the Rand Index to compare each clustering method to a reference method, for all combinations of methods and numbers of clusters.

Parameters
  • valid_cluster_sets (list) – List of tuples (cluster_object, method_name) representing the clustering object and the name of the clustering method used to obtain it.

  • reference_method (str, optional) – The name of the reference method to compare against. The default is “ward”.

Returns

rand_scores – Array of dimension (n_sizes, n_methods) with Rand Index scores.

Return type

ndarray

Notes

The Rand Index is a measure of the similarity between two clusterings. It is based on the number of pairs of samples that are assigned to the same or different clusters in the two clusterings. The adjusted Rand Index is a modification of the Rand Index that takes into account chance agreements.
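
As an illustration of the pair-counting definition above, here is a minimal sketch of the plain (unadjusted) Rand Index between two labelings. The function name rand_index is chosen for this example, and the sketch does not show which variant or implementation phasik uses internally.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two samples in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")

    n_pairs = 0
    agreements = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agreements += same_in_a == same_in_b
        n_pairs += 1

    if n_pairs == 0:
        raise ValueError("need at least two samples")
    return agreements / n_pairs


# Five of the six pairs agree; only the pair (2, 3) is grouped differently.
score = rand_index([0, 0, 1, 1], [0, 0, 1, 2])
print(score)
```

The score depends only on how samples are grouped, not on the label values, so two clusterings that differ only by a renaming of labels score 1.0.
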

Examples

>>> import phasik as pk
>>> clustering_methods = ["k_means", "centroid", "average", "ward"]
>>> valid_cluster_sets = []
>>> for clustering_method in clustering_methods:
...     distance_matrix = pk.DistanceMatrix.from_temporal_network(
...         temporal_network, "euclidean"
...     )
...     cluster_sets = pk.ClusterSets.from_distance_matrix(
...         distance_matrix, "maxclust", range(2, 12), clustering_method
...     )
...     valid_cluster_sets.append((cluster_sets, clustering_method))
>>> pk.rand_index_over_methods_and_sizes(valid_cluster_sets, reference_method="ward")

Graphs

Utility functions for static graphs

graph_size_info(graph)[source]

Return basic size info about the graph

weighted_edges_as_df(network, keep_static=True, temporal_edges=None)[source]

Returns a pandas.DataFrame of weighted edges sorted by weight, from a networkx.Graph.

Columns are [‘i’, ‘j’, ‘weight’] and each row represents a different edge

Parameters
  • network (networkx.Graph) – A network from which to get weighted edges

  • keep_static (bool or np.nan, optional) – If True (default), keep all edges. If False, discard the static edges (those not in temporal_edges). If np.nan, keep the static edges, but set their weight to np.nan. If keep_static is not True, temporal_edges must be provided.

  • temporal_edges (list of tuples) – List of edges for which there is temporal information.

Return type

pandas.DataFrame

Notes

When the static network is derived from a temporal network, some edges might be static (no temporal info) and have a default constant edge weight. That is when the arguments keep_static and temporal_edges are useful.
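
The three keep_static modes can be sketched on a plain list of (i, j, weight) rows. This is an illustration of the semantics described above, not the phasik code: the function name filter_weighted_edges is hypothetical, edges are assumed to be undirected, temporal_edges is assumed to be required only when static edges are dropped or masked, and sorting by weight is left out.

```python
import math


def filter_weighted_edges(edges, keep_static=True, temporal_edges=None):
    """Apply the keep_static rule to a list of (i, j, weight) rows."""
    if keep_static is True:
        return list(edges)
    if temporal_edges is None:
        raise ValueError("temporal_edges is needed to identify static edges")

    # Assumption: undirected edges, so (i, j) and (j, i) are the same edge.
    temporal = {frozenset(edge) for edge in temporal_edges}

    rows = []
    for i, j, weight in edges:
        if frozenset((i, j)) in temporal:
            rows.append((i, j, weight))
        elif keep_static is False:
            continue  # discard static edges
        else:
            rows.append((i, j, math.nan))  # keep static edges, mask weight
    return rows


edges = [("a", "b", 0.5), ("b", "c", 1.0)]
temporal_edges = [("b", "a")]

dropped = filter_weighted_edges(edges, keep_static=False, temporal_edges=temporal_edges)
masked = filter_weighted_edges(edges, keep_static=math.nan, temporal_edges=temporal_edges)
print(dropped)  # [('a', 'b', 0.5)]
print(masked)   # [('a', 'b', 0.5), ('b', 'c', nan)]
```
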

Paths

Functions to deal with system paths

slugify(text, keep_characters=None)[source]

Turn any text into a string that can be used in a filename

Parameters
  • text (str) – text to slugify

  • keep_characters (list of str) – characters in this iterable will be kept in the final string. Defaults to [‘_’]. Any other non-alphanumeric characters will be removed.

Returns

slug

Return type

str
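
A minimal sketch of the behaviour described above: keep alphanumeric characters and any character listed in keep_characters, and remove everything else. The name slugify_sketch is hypothetical, and the library function may differ in details such as how it treats whitespace or letter case.

```python
def slugify_sketch(text, keep_characters=None):
    """Drop every character that is neither alphanumeric nor explicitly kept."""
    if keep_characters is None:
        keep_characters = ["_"]  # documented default
    return "".join(c for c in text if c.isalnum() or c in keep_characters)


print(slugify_sketch("my file: v1.0_final"))                       # myfilev10_final
print(slugify_sketch("run-01.txt", keep_characters=["-", "."]))    # run-01.txt
```
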