Preprocessing
Graph Cuts
Constants
- graspologic.preprocessing.LARGER_THAN_INCLUSIVE
Cut any edge or node >= the cut_threshold
- graspologic.preprocessing.LARGER_THAN_EXCLUSIVE
Cut any edge or node > the cut_threshold
- graspologic.preprocessing.SMALLER_THAN_INCLUSIVE
Cut any edge or node <= the cut_threshold
- graspologic.preprocessing.SMALLER_THAN_EXCLUSIVE
Cut any edge or node < the cut_threshold
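As a rough illustration of how the four constants differ, the sketch below maps each cut process name to a comparison against cut_threshold. It is not graspologic's implementation: the names CUT_COMPARISONS and is_cut are hypothetical, and the sketch assumes that the inclusive variants also cut a value equal to the threshold, as the constant names suggest.

```python
import operator

# Hypothetical lookup table: cut process name -> comparison(value, cut_threshold).
# Assumption: "inclusive" means a value equal to the threshold is also cut.
CUT_COMPARISONS = {
    "larger_than_inclusive": operator.ge,
    "larger_than_exclusive": operator.gt,
    "smaller_than_inclusive": operator.le,
    "smaller_than_exclusive": operator.lt,
}


def is_cut(value, cut_threshold, cut_process):
    """Return True if value would be cut under the given cut process."""
    if cut_process not in CUT_COMPARISONS:
        raise ValueError(f"unknown cut_process: {cut_process!r}")
    return CUT_COMPARISONS[cut_process](value, cut_threshold)


weights = [1, 5, 9]
# With a threshold of 5, only the inclusive variant cuts the value 5 itself.
cut_inclusive = [w for w in weights if is_cut(w, 5, "larger_than_inclusive")]
cut_exclusive = [w for w in weights if is_cut(w, 5, "larger_than_exclusive")]
```

The two lists differ only in whether the value that equals the threshold is cut, which is the whole distinction between the inclusive and exclusive variants.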
Classes
- class graspologic.preprocessing.DefinedHistogram
Contains the histogram and the edges of the bins in the histogram. bin_edges has one more element than histogram, because it holds the minimal and maximal edges as well as every edge in between.
- histogram: ndarray
Alias for field number 0
- bin_edges: ndarray
Alias for field number 1
- static __new__(_cls, histogram, bin_edges)
Create new instance of DefinedHistogram(histogram, bin_edges)
- Parameters:
histogram (ndarray)
bin_edges (ndarray)
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=sys.maxsize, /)
Return first index of value.
Raises ValueError if the value is not present.
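The relationship between the two fields can be sketched with a stand-in named tuple. HistogramSketch is a hypothetical class, not part of graspologic, and it uses plain lists where DefinedHistogram holds ndarrays so that the example has no dependencies.

```python
from typing import List, NamedTuple


class HistogramSketch(NamedTuple):
    """Hypothetical stand-in with the same two fields as DefinedHistogram."""

    histogram: List[int]    # count per bin
    bin_edges: List[float]  # boundaries of the bins, lowest to highest


# Four bins need five edges: the outer two plus the three in between.
example = HistogramSketch(
    histogram=[1, 2, 1, 1],
    bin_edges=[1.0, 2.0, 3.0, 4.0, 5.0],
)

# Named tuples support both attribute access and positional unpacking.
histogram, bin_edges = example

# Pair consecutive edges to recover the (left, right) range of each bin.
bin_ranges = list(zip(example.bin_edges[:-1], example.bin_edges[1:]))
```

Because the result is a tuple, code that expects the (histogram, bin_edges) pair returned by numpy.histogram() can unpack it in the same way.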
Functions
- graspologic.preprocessing.cut_edges_by_weight(graph, cut_threshold, cut_process, weight_attribute='weight', prune_isolates=False)
Thresholds edges by weight: returns a copy of the graph with the edges selected by cut_threshold and cut_process removed.
- Parameters:
- graph (Union[nx.Graph, nx.DiGraph])
The graph that will be copied and pruned.
- cut_threshold (Union[int, float])
The threshold for making cuts based on weight.
- cut_process (str)
Describes how we should make the cut: cut all edges larger or smaller than the cut_threshold, and whether exclusive or inclusive. Allowed values are
larger_than_inclusive
larger_than_exclusive
smaller_than_inclusive
smaller_than_exclusive
- weight_attribute (str)
The weight attribute name in the edge's data dictionary. Default is weight.
- prune_isolates (bool)
If True, remove any vertex that no longer has an edge after the cut. This only prunes vertices that lose their edges to the cut; any vertex that was already isolated before the cut is retained.
- Returns:
- Union[nx.Graph, nx.DiGraph]
Pruned copy of the same type of graph provided
Notes
Edges without a weight_attribute field will be excluded from these cuts. Enable logging to view any messages about edges without weights.
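The behaviour described above (a copy is returned, unweighted edges are left alone, and prune_isolates only removes vertices that lost edges to the cut) can be sketched without networkx. The function cut_edges_sketch and its dictionary-based graph are hypothetical stand-ins, and the sketch hard-codes the larger_than_exclusive comparison.

```python
def cut_edges_sketch(vertices, edges, cut_threshold, prune_isolates=False):
    """Drop edges whose weight is strictly larger than cut_threshold.

    vertices: set of vertex names
    edges: dict mapping (source, target) -> weight, or None if unweighted
    Returns new (vertices, edges); the inputs are not modified.
    """
    kept_edges = {}
    touched = set()  # endpoints of edges removed by the cut
    for (source, target), weight in edges.items():
        if weight is not None and weight > cut_threshold:
            touched.update((source, target))
        else:
            # Edges without a weight are excluded from the cut, so they stay.
            kept_edges[(source, target)] = weight
    kept_vertices = set(vertices)
    if prune_isolates:
        still_connected = {v for edge in kept_edges for v in edge}
        # Only vertices that lost an edge to the cut are candidates for pruning.
        kept_vertices -= touched - still_connected
    return kept_vertices, kept_edges


vertices = {"a", "b", "c", "d", "lonely"}
edges = {("a", "b"): 1.0, ("b", "c"): None, ("c", "d"): 7.0}
pruned_vertices, pruned_edges = cut_edges_sketch(
    vertices, edges, cut_threshold=5, prune_isolates=True
)
```

In this example the edge (c, d) is cut, d is pruned because the cut left it without edges, the unweighted edge (b, c) survives, and the vertex named lonely is retained because it was already isolated before the cut.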
- graspologic.preprocessing.cut_vertices_by_betweenness_centrality(graph, cut_threshold, cut_process, num_random_samples=None, normalized=True, weight_attribute='weight', include_endpoints=False, random_seed=None)
Given a graph, a cut_threshold, and a cut_process, returns a copy of the graph without the vertices whose betweenness centrality is cut by cut_threshold.
The betweenness centrality calculation can take advantage of networkx's implementation of randomized sampling by providing num_random_samples (or k, in networkx betweenness_centrality nomenclature).
- Parameters:
- graph (Union[nx.Graph, nx.DiGraph])
The graph that will be copied and pruned.
- cut_threshold (Union[int, float])
The threshold for making cuts based on betweenness centrality.
- cut_process (str)
Describes how we should make the cut: cut all vertices whose betweenness centrality is larger or smaller than the cut_threshold, and whether exclusive or inclusive. Allowed values are
larger_than_inclusive
larger_than_exclusive
smaller_than_inclusive
smaller_than_exclusive
- num_random_samples (Optional[int])
Use num_random_samples vertex samples to estimate betweenness. num_random_samples should be <= len(graph.nodes). The larger num_random_samples is, the better the approximation. Default is None.
- normalized (bool)
If True, the betweenness values are normalized by \(2/((n-1)(n-2))\) for undirected graphs, and \(1/((n-1)(n-2))\) for directed graphs, where n is the number of vertices in the graph. Default is True.
- weight_attribute (Optional[str])
If None, all edge weights are considered equal. Otherwise holds the name of the edge attribute used as weight. Default is weight.
- include_endpoints (bool)
If True, include the endpoints in the shortest path counts. Default is False.
- random_seed (Optional[Union[int, random.Random, np.random.RandomState]])
Random seed or preconfigured random instance to be used for selecting random samples. Only used if num_random_samples is set. None will generate a new random state. Specifying a random state will provide consistent results between runs.
- Returns:
- Union[nx.Graph, nx.DiGraph]
Pruned copy of the same type of graph provided
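Because cut_threshold is compared against normalized values by default, it helps to see what the normalization does to a raw score. The following worked example applies the factor from the normalized parameter to a small star graph; normalize_betweenness is a hypothetical helper, and the raw value is counted by hand rather than computed by networkx.

```python
def normalize_betweenness(raw, n, directed=False):
    """Scale a raw betweenness value as described for normalized=True."""
    if n <= 2:
        raise ValueError("the factor is undefined for fewer than 3 vertices")
    numerator = 1 if directed else 2
    return raw * numerator / ((n - 1) * (n - 2))


# Undirected star graph: one hub joined to 4 leaves, 5 vertices in total.
# Every shortest path between two leaves passes through the hub, so the
# hub's raw betweenness is the number of leaf pairs: 4 * 3 / 2 = 6.
n = 5
raw_hub = 6
raw_leaf = 0  # no shortest path between other vertices passes through a leaf

normalized_hub = normalize_betweenness(raw_hub, n)
normalized_leaf = normalize_betweenness(raw_leaf, n)
```

The hub ends up at the maximum normalized value and the leaves at the minimum, so with normalized=True a cut_threshold between 0 and 1 is the natural range to work in.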
- graspologic.preprocessing.cut_vertices_by_degree_centrality(graph, cut_threshold, cut_process)
Given a graph, a cut_threshold, and a cut_process, returns a copy of the graph without the vertices whose degree centrality is cut by cut_threshold.
- Parameters:
- graph (Union[nx.Graph, nx.DiGraph])
The graph that will be copied and pruned.
- cut_threshold (Union[int, float])
The threshold for making cuts based on degree centrality.
- cut_process (str)
Describes how we should make the cut: cut all vertices whose degree centrality is larger or smaller than the cut_threshold, and whether exclusive or inclusive. Allowed values are
larger_than_inclusive
larger_than_exclusive
smaller_than_inclusive
smaller_than_exclusive
- Returns:
- Union[nx.Graph, nx.DiGraph]
Pruned copy of the same type of graph provided
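Note that the threshold applies to degree centrality, not to the raw degree. A minimal sketch, assuming the networkx definition of degree centrality (degree divided by n - 1) and a graph with more than one vertex; the adjacency dictionary and the helper degree_centrality_sketch are hypothetical stand-ins for a networkx graph, and the comparison hard-codes smaller_than_exclusive.

```python
def degree_centrality_sketch(adjacency):
    """Degree of each vertex divided by (n - 1); assumes more than one vertex."""
    n = len(adjacency)
    return {
        vertex: len(neighbors) / (n - 1)
        for vertex, neighbors in adjacency.items()
    }


# Undirected path a - b - c - d, stored as vertex -> set of neighbors.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
centrality = degree_centrality_sketch(adjacency)

# Cut every vertex whose centrality is strictly smaller than the threshold.
cut_threshold = 0.5
kept = {v for v, score in centrality.items() if not score < cut_threshold}

# Build the pruned copy: surviving vertices keep only surviving neighbors.
pruned = {v: adjacency[v] & kept for v in adjacency if v in kept}
```

The end vertices a and d have degree 1 and centrality 1/3, so they are cut; b and c have centrality 2/3 and remain, still connected to each other.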
- graspologic.preprocessing.histogram_betweenness_centrality(graph, bin_directive=10, num_random_samples=None, normalized=True, weight_attribute='weight', include_endpoints=False, random_seed=None)
Generates a histogram of the vertex betweenness centrality of the provided graph. The histogram function is fundamentally proxied through to numpy's histogram function, and bin selection follows numpy.histogram() processes.
The betweenness centrality calculation can take advantage of networkx's implementation of randomized sampling by providing num_random_samples (or k, in networkx betweenness_centrality nomenclature).
- Parameters:
- graph (Union[nx.Graph, nx.DiGraph])
The graph. No changes will be made to it.
- bin_directive (Union[int, List[Union[float, int]], numpy.ndarray, str])
Is passed directly through to numpy's "histogram" (and thus, "histogram_bin_edges") functions.
See: numpy.histogram_bin_edges()
In short: if an int is provided, we use bin_directive number of equal range bins.
If a sequence is provided, these bin edges will be used and can be sized to whatever size you prefer.
Note that the numpy.ndarray should be ndim=1 and the values should be float or int.
- num_random_samples (Optional[int])
Use num_random_samples vertex samples to estimate betweenness. num_random_samples should be <= len(graph.nodes). The larger num_random_samples is, the better the approximation. Default is None.
- normalized (bool)
If True, the betweenness values are normalized by \(2/((n-1)(n-2))\) for undirected graphs, and \(1/((n-1)(n-2))\) for directed graphs, where n is the number of vertices in the graph. Default is True.
- weight_attribute (Optional[str])
If None, all edge weights are considered equal. Otherwise holds the name of the edge attribute used as weight. Default is weight.
- include_endpoints (bool)
If True, include the endpoints in the shortest path counts. Default is False.
- random_seed (Optional[Union[int, random.Random, np.random.RandomState]])
Random seed or preconfigured random instance to be used for selecting random samples. Only used if num_random_samples is set. None will generate a new random state. Specifying a random state will provide consistent results between runs.
- Returns:
DefinedHistogram
A named tuple that contains the histogram and the bin_edges used in the histogram
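The interaction between num_random_samples and random_seed can be illustrated with the standard library alone. This sketch only shows why a fixed seed gives repeatable vertex samples; sample_vertices is a hypothetical helper and does not reproduce how networkx selects or uses its samples.

```python
import random


def sample_vertices(nodes, num_random_samples, random_seed=None):
    """Pick num_random_samples vertices, reproducibly when a seed is given."""
    if num_random_samples > len(nodes):
        raise ValueError("num_random_samples should be <= the number of vertices")
    if isinstance(random_seed, random.Random):
        rng = random_seed  # preconfigured random instance
    else:
        rng = random.Random(random_seed)  # None seeds from system entropy
    # Sort first so the result does not depend on the input ordering.
    return rng.sample(sorted(nodes), num_random_samples)


nodes = ["a", "b", "c", "d", "e", "f"]
first = sample_vertices(nodes, 3, random_seed=42)
second = sample_vertices(nodes, 3, random_seed=42)
```

Both calls return the same three vertices because they start from the same seed; leaving random_seed as None would generally give a different sample on each run, and therefore a slightly different histogram.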
- graspologic.preprocessing.histogram_degree_centrality(graph, bin_directive=10)
Generates a histogram of the vertex degree centrality of the provided graph. The histogram function is fundamentally proxied through to numpy's histogram function, and bin selection follows numpy.histogram() processes.
- Parameters:
- graph (Union[nx.Graph, nx.DiGraph])
The graph. No changes will be made to it.
- bin_directive (Union[int, List[Union[float, int]], numpy.ndarray, str])
Is passed directly through to numpy's "histogram" (and thus, "histogram_bin_edges") functions.
See: numpy.histogram_bin_edges()
In short: if an int is provided, we use bin_directive number of equal range bins.
If a sequence is provided, these bin edges will be used and can be sized to whatever size you prefer.
Note that the numpy.ndarray should be ndim=1 and the values should be float or int.
- Returns:
DefinedHistogram
A named tuple that contains the histogram and the bin_edges used in the histogram
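What an integer bin_directive means can be sketched in plain Python. The helper equal_width_histogram is hypothetical and only approximates numpy.histogram() for the simple case of an int bin count; it does not handle numpy's treatment of a zero-width value range.

```python
def equal_width_histogram(values, bins=10):
    """Count values into `bins` equal-width bins spanning min..max of values."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("this sketch needs at least two distinct values")
    width = (high - low) / bins
    bin_edges = [low + i * width for i in range(bins)] + [high]
    histogram = [0] * bins
    for value in values:
        # The last bin is closed on the right, so max(values) falls inside it.
        index = min(int((value - low) / width), bins - 1)
        histogram[index] += 1
    return histogram, bin_edges


# Degree centralities of a 5-vertex star: the hub scores 1.0, each leaf 0.25.
centralities = [1.0, 0.25, 0.25, 0.25, 0.25]
histogram, bin_edges = equal_width_histogram(centralities, bins=3)
```

Three bins produce four edges, matching the rule that bin_edges is one element longer than the histogram. A histogram like this one, with most vertices in the lowest bin, is a typical starting point for choosing a cut_threshold for cut_vertices_by_degree_centrality.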
- graspologic.preprocessing.histogram_edge_weight(graph, bin_directive=10, weight_attribute='weight')
Generates a histogram of the edge weights of the provided graph. The histogram function is fundamentally proxied through to numpy's histogram function, and bin selection follows numpy.histogram() processes.
- Parameters:
- graph (nx.Graph)
The graph. No changes will be made to it.
- bin_directive (Union[int, List[Union[float, int]], numpy.ndarray, str])
Is passed directly through to numpy's "histogram" (and thus, "histogram_bin_edges") functions.
See: numpy.histogram_bin_edges()
In short: if an int is provided, we use bin_directive number of equal range bins.
If a sequence is provided, these bin edges will be used and can be sized to whatever size you prefer.
Note that the numpy.ndarray should be ndim=1 and the values should be float or int.
- weight_attribute (str)
The weight attribute name in the data dictionary. Default is weight.
- Returns:
DefinedHistogram
A named tuple that contains the histogram and the bin_edges used in the histogram
Notes
Edges without a weight_attribute field will be excluded from this histogram. Enable logging to view any messages about edges without weights.
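Two points from this entry, passing a sequence of bin edges and skipping edges that have no weight, can be sketched together. The edge list and the helper histogram_with_edges are hypothetical; the helper follows the numpy.histogram() convention that every bin is half-open except the last, which includes its right edge.

```python
from bisect import bisect_right


def histogram_with_edges(values, bin_edges):
    """Count values into bins delimited by an increasing sequence of edges."""
    histogram = [0] * (len(bin_edges) - 1)
    for value in values:
        if value < bin_edges[0] or value > bin_edges[-1]:
            continue  # values outside the outer edges are not counted
        # Bins are [left, right), except the last, which includes its right edge.
        index = min(bisect_right(bin_edges, value) - 1, len(histogram) - 1)
        histogram[index] += 1
    return histogram


# Hypothetical edge list: (source, target, data dictionary).
edge_list = [
    ("a", "b", {"weight": 0.5}),
    ("b", "c", {"weight": 2.0}),
    ("c", "d", {}),  # no weight attribute, so it is left out
    ("d", "e", {"weight": 10.0}),
]
weights = [data["weight"] for _, _, data in edge_list if "weight" in data]
histogram = histogram_with_edges(weights, bin_edges=[0, 1, 5, 10])
```

Explicit edges allow bins of unequal width, which is often useful when edge weights are heavily skewed. The unweighted edge (c, d) contributes to no bin, so the histogram counts three edges rather than four.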