BallMapper
- class pyballmapper.ballmapper.BallMapper(X: ndarray, eps, coloring_df=None, orbits=None, metric='euclidean', order=None, verbose=False)
Bases:
object
Create a BallMapper graph from vector array or distance matrix.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples)) – Data vectors, where n_samples is the number of samples and n_features is the number of features. For metric=’precomputed’, the expected shape of X is (n_samples, n_samples).
eps (float) – The radius of the balls.
orbits (list of length n_samples, default=None) – For each data point, contains a list of the points in its orbit. Use it to create an Equivariant BallMapper.
coloring_df (pandas dataframe of shape (n_samples, n_coloring_function), default=None) – If defined, uses the add_coloring method to compute the average value of each column for the points covered by each ball.
metric (str, or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is ‘precomputed’, X is assumed to be a distance matrix and must be square.
order (array-like of shape (n_samples, ), default=None) – The order in which to consider the data points in the greedy search for landmarks. Different orderings may lead to different BallMapper graphs. By default, the order of X is used.
verbose (bool or string, default=False) – Enable verbose output. Set it to ‘tqdm’ to show a tqdm progressbar.
- Graph
The BallMapper graph. Each node corresponds to a covering ball and has two attributes: ‘landmark’, the id of the corresponding landmark point, and ‘points covered’, the ids of the points covered by the corresponding ball.
- Type:
NetworkX Graph object
- eps
The input radius of the balls.
- Type:
float
- points_covered_by_landmarks
A dictionary whose keys are the landmark ids and whose values are lists of the ids of the points covered by the corresponding ball.
- Type:
dict
Notes
https://arxiv.org/abs/1901.07410
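The construction described by the parameters and attributes above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: it uses the Euclidean metric only, and the function name `ball_mapper_sketch` and its return layout are hypothetical. Points are visited in the given order, a point becomes a landmark if no earlier landmark lies within eps of it, and two balls are joined when they share a point.

```python
import math


def ball_mapper_sketch(points, eps, order=None):
    """Return (landmarks, points_covered, edges) for a list of coordinate tuples.

    Hypothetical helper for illustration; not part of pyballmapper.
    """
    order = range(len(points)) if order is None else order

    landmarks = []  # ids of the points chosen as ball centers
    for i in order:
        # A point becomes a landmark only if no existing ball already covers it.
        if all(math.dist(points[i], points[l]) > eps for l in landmarks):
            landmarks.append(i)

    # Each ball covers every point within eps of its landmark.
    points_covered = {
        l: [i for i in range(len(points)) if math.dist(points[i], points[l]) <= eps]
        for l in landmarks
    }

    # Two balls are joined by an edge when they share at least one point.
    edges = [
        (a, b)
        for k, a in enumerate(landmarks)
        for b in landmarks[k + 1:]
        if set(points_covered[a]) & set(points_covered[b])
    ]
    return landmarks, points_covered, edges


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
landmarks, points_covered, edges = ball_mapper_sketch(points, eps=1.5)
# landmarks      -> [0, 2, 4]
# points_covered -> {0: [0, 1], 2: [1, 2, 3], 4: [4]}
# edges          -> [(0, 2)]
```

Passing `order=[4, 3, 2, 1, 0]` to the same sketch selects the landmarks `[4, 3, 1]` instead, which illustrates why different orderings can produce different graphs.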
- add_coloring(coloring_df, custom_function=<function mean>, custom_name=None, add_std=False)
Takes a pandas dataframe and computes the average (and, if add_std is True, the standard deviation) of each column for the subset of points covered by each ball. Adds these values as attributes to each node in the BallMapper graph.
- Parameters:
coloring_df (pandas dataframe of shape (n_samples, n_coloring_function)) – the coloring values, with one row per data point and one column per coloring function
custom_function (callable, optional) – a function to compute on the coloring_df columns, by default numpy.mean
custom_name (string, optional) – sets the attributes naming scheme, by default None, the attribute names will be the column names
add_std (bool, default=False) – Whether to also compute the standard deviation on each ball
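The per-ball aggregation can be pictured with a short standard-library sketch. It assumes a single coloring column stored as a plain list indexed by point id; the helper `color_balls` and the keys `"value"` and `"std"` are hypothetical names, and the use of the population standard deviation is an assumption, since the text above does not say which variant is computed.

```python
from statistics import mean, pstdev


def color_balls(points_covered, column, custom_function=mean, add_std=False):
    """Aggregate one coloring column over the points covered by each ball.

    Hypothetical helper for illustration; not part of pyballmapper.
    """
    summary = {}
    for landmark, covered in points_covered.items():
        values = [column[i] for i in covered]
        stats = {"value": custom_function(values)}
        if add_std:
            # Population standard deviation (an assumption of this sketch).
            stats["std"] = pstdev(values)
        summary[landmark] = stats
    return summary


points_covered = {0: [0, 1], 2: [1, 2, 3], 4: [4]}
height = [1.0, 3.0, 5.0, 7.0, 9.0]  # one value per data point
summary = color_balls(points_covered, height, add_std=True)
# summary[0] -> {"value": 2.0, "std": 1.0}
```

Any callable that reduces a sequence to one number, such as `max` or `statistics.median`, can stand in for `custom_function` in this sketch.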
- color_by_variable(my_variable, my_palette, MIN_VALUE=inf, MAX_VALUE=-inf)
Colors the BallMapper graph using a specified variable. The add_coloring method needs to be called first. Automatically computes the min and max value for the colormap.
- Parameters:
my_variable (string) – the variable to color by
my_palette (matplotlib.colors.Colormap) – a valid colormap
MIN_VALUE (float, optional) – the value to be assigned to the lowest color in the cmap, by default np.inf
MAX_VALUE (float, optional) – the value to be assigned to the highest color in the cmap, by default -np.inf
- Returns:
the computed min and max values of my_variable on the BM nodes, useful to set the limits for a colorbar
- Return type:
MIN_VALUE, MAX_VALUE
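The defaults MIN_VALUE=inf and MAX_VALUE=-inf read as sentinels meaning "compute the limits from the data". The sketch below shows one way such a normalization could work; the exact rule used by the library is an assumption here, and `normalize_for_colormap` is a hypothetical helper. Each node value is mapped to [0, 1], the range a matplotlib colormap accepts when called with a float.

```python
import math


def normalize_for_colormap(node_values, min_value=math.inf, max_value=-math.inf):
    """Map each node value to [0, 1] and return the limits that were used.

    Hypothetical helper for illustration; not part of pyballmapper.
    """
    if min_value > max_value:
        # Sentinel defaults (inf, -inf): derive the limits from the data.
        min_value = min(node_values.values())
        max_value = max(node_values.values())
    span = max_value - min_value
    scaled = {
        # With a degenerate range, fall back to the middle of the colormap.
        node: 0.5 if span == 0 else (value - min_value) / span
        for node, value in node_values.items()
    }
    return scaled, min_value, max_value


node_values = {0: 2.0, 1: 5.0, 2: 9.0}
scaled, vmin, vmax = normalize_for_colormap(node_values)
# vmin, vmax -> 2.0, 9.0; scaled[0] -> 0.0; scaled[2] -> 1.0
```

Returning the limits alongside the scaled values mirrors the documented return value, which is meant to be reused when setting the limits of a colorbar.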
- draw_networkx(coloring_variable=None, color_palette=matplotlib.colormaps.get_cmap, colorbar=False, colorbar_label=None, ax=None, MIN_VALUE=inf, MAX_VALUE=-inf, MIN_SCALE=100, MAX_SCALE=600, pos=None, **kwargs)
Wrapper around the networkx.draw_networkx method with colorbar support.
- Parameters:
coloring_variable (string, optional) – the variable to use for coloring the BM graph, by default None
color_palette (matplotlib.colors.Colormap, optional) – the coloring palette to use, by default cm.get_cmap(“Reds”)
colorbar (bool, optional) – whether to add a colorbar to the plot, by default False
colorbar_label (str, optional) – the label on the colorbar’s long axis.
ax (matplotlib.axes.Axes, optional) – the matplotlib ax where to plot the graph. If None, the current ax is used. By default None
MIN_VALUE (float, optional) – the value to be assigned to the lowest color in the cmap, by default np.inf
MAX_VALUE (float, optional) – the value to be assigned to the highest color in the cmap, by default -np.inf
MIN_SCALE (int, optional) – the minimum radius for the nodes, by default 100
MAX_SCALE (int, optional) – the maximum radius for the nodes, by default 600
pos (dictionary, optional) – A dictionary with nodes as keys and positions as values. If not specified a spring layout positioning will be computed. See networkx.drawing.layout for functions that compute node positions. By default None
- Returns:
the matplotlib ax
- Return type:
ax
- filter_by(list_of_points)
Returns a copy of the BallMapper object with only the nodes covering a subset of points.
- Parameters:
list_of_points (list) – list of the subset of points to keep
- Returns:
the filtered BallMapper graph
- Return type:
BallMapper
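The filtering rule can be sketched on the points_covered_by_landmarks dictionary alone. This sketch assumes that a ball is kept when it covers at least one of the listed points and that the kept balls retain all their points; both are assumptions, and `filter_balls` is a hypothetical helper rather than the library method.

```python
def filter_balls(points_covered, list_of_points):
    """Keep only the balls that cover at least one of the requested points.

    Hypothetical helper for illustration; not part of pyballmapper.
    """
    keep = set(list_of_points)
    return {
        landmark: covered
        for landmark, covered in points_covered.items()
        if keep & set(covered)
    }


points_covered = {0: [0, 1], 2: [1, 2, 3], 4: [4]}
filtered = filter_balls(points_covered, [3, 4])
# filtered -> {2: [1, 2, 3], 4: [4]}
```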
- points_and_balls()
Returns a DataFrame with the points_covered_by_landmarks information.
- Return type:
pandas.DataFrame
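One plausible reading of this table is a long format with one row per (point, ball) pair. The sketch below builds such rows from the dictionary using plain tuples; the row layout and the choice to identify each ball by its landmark id are assumptions, and `points_and_balls_rows` is a hypothetical helper.

```python
def points_and_balls_rows(points_covered):
    """Flatten {landmark: [point ids]} into (point, ball) rows.

    Hypothetical helper for illustration; not part of pyballmapper.
    """
    return [
        (point, landmark)
        for landmark, covered in points_covered.items()
        for point in covered
    ]


points_covered = {0: [0, 1], 2: [1, 2, 3], 4: [4]}
rows = points_and_balls_rows(points_covered)
# rows -> [(0, 0), (1, 0), (1, 2), (2, 2), (3, 2), (4, 4)]
```

Point 1 appears twice because it lies in two overlapping balls, which is exactly the overlap that creates an edge in the graph.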