BallMapper

class pyballmapper.ballmapper.BallMapper(X: ndarray, eps, coloring_df=None, orbits=None, metric='euclidean', order=None, verbose=False)

Bases: object

Create a BallMapper graph from a vector array or a distance matrix.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples)) – Data vectors, where n_samples is the number of samples and n_features is the number of features. For metric=’precomputed’, the expected shape of X is (n_samples, n_samples).

  • eps (float) – The radius of the balls.

  • orbits (list of length n_samples, default=None) – For each data point, a list of the points in its orbit. Use it to create an Equivariant BallMapper.

  • coloring_df (pandas dataframe of shape (n_samples, n_coloring_function), default=None) – If defined, uses the add_coloring method to compute the average value of each column for the points covered by each ball.

  • metric (str, or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is ‘precomputed’, X is assumed to be a distance matrix and must be square.

  • order (array-like of shape (n_samples, ), default=None) – The order in which to consider the data points in the greedy search for landmarks. Different orderings might lead to different BallMapper graphs. By default, uses the order of X.

  • verbose (bool or string, default=False) – Enable verbose output. Set it to ‘tqdm’ to show a tqdm progressbar.
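The greedy landmark search that eps and order control can be illustrated with a small standalone sketch. It is not the library's implementation: the function name greedy_ball_cover is hypothetical, the metric is fixed to Euclidean, balls are keyed by the landmark's point index, and the rule that two balls are joined by an edge when they share a point is an assumption taken from the usual Ball Mapper construction.

```python
import math


def greedy_ball_cover(points, eps, order=None):
    """Hypothetical helper: greedy eps-cover of a point cloud (Euclidean metric)."""
    order = range(len(points)) if order is None else order
    landmarks = []
    for idx in order:
        # A point becomes a landmark only if no existing ball already covers it.
        if all(math.dist(points[idx], points[lm]) > eps for lm in landmarks):
            landmarks.append(idx)
    # Each ball covers every point within eps of its landmark.
    covered = {
        lm: [i for i in range(len(points)) if math.dist(points[i], points[lm]) <= eps]
        for lm in landmarks
    }
    # Assumption: two balls are connected when they share at least one point.
    edges = {
        (a, b)
        for a in landmarks
        for b in landmarks
        if a < b and set(covered[a]) & set(covered[b])
    }
    return covered, edges


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
covered, edges = greedy_ball_cover(points, eps=1.0)
# Visiting the points in a different order selects different landmarks.
covered_alt, edges_alt = greedy_ball_cover(points, eps=1.0, order=[1, 0, 2, 3])
```

With the default order the sketch picks points 0, 2 and 3 as landmarks; starting from point 1 instead yields only two balls, which is why the order parameter can change the resulting graph.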

Graph

The BallMapper graph. Each node corresponds to a covering ball and has the attributes ‘landmark’ (the id of the corresponding landmark point) and ‘points covered’ (the ids of the points covered by the corresponding ball).

Type:

NetworkX Graph object

eps

The input radius of the balls.

Type:

float

points_covered_by_landmarks

Keys are landmark ids; values are lists of the ids of the points covered by the corresponding ball.

Type:

dict

Notes

https://arxiv.org/abs/1901.07410

add_coloring(coloring_df, custom_function=<function mean>, custom_name=None, add_std=False)

Takes a pandas dataframe and computes the average (and, if add_std is True, the standard deviation) of each column over the subset of points covered by each ball. These values are added as attributes to each node in the BallMapper graph.

Parameters:
  • coloring_df (pandas dataframe of shape (n_samples, n_coloring_function)) – the coloring values, with one row per data point and one column per coloring function

  • custom_function (callable, optional) – a function to compute on the coloring_df columns, by default numpy.mean

  • custom_name (string, optional) – sets the attribute naming scheme; by default None, in which case the attribute names are the column names

  • add_std (bool, default=False) – Whether to also compute the standard deviation on each ball
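The per-ball aggregation described above can be sketched without pandas. This is an illustration only: color_balls and the attribute names "value" and "std" are hypothetical, a single coloring column is represented as a plain list, and the population standard deviation is used on the assumption that it matches numpy.std.

```python
from statistics import mean, pstdev


def color_balls(points_covered, values, func=mean, add_std=False):
    """Hypothetical helper: aggregate one coloring column over each ball."""
    attributes = {}
    for ball, ids in points_covered.items():
        ball_values = [values[i] for i in ids]
        attrs = {"value": func(ball_values)}
        if add_std:
            # Population standard deviation (assumed equivalent to numpy.std).
            attrs["std"] = pstdev(ball_values)
        attributes[ball] = attrs
    return attributes


# Toy cover: ball id -> ids of the points it covers.
points_covered = {0: [0, 1], 2: [1, 2], 3: [3]}
values = [1.0, 3.0, 5.0, 10.0]
coloring = color_balls(points_covered, values, add_std=True)
```

A point that lies in several balls (point 1 here) contributes to the aggregate of each of them.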

color_by_variable(my_variable, my_palette, MIN_VALUE=inf, MAX_VALUE=-inf)

Colors the BallMapper graph using a specified variable. The add_coloring method needs to be called first. Automatically computes the min and max value for the colormap.

Parameters:
  • my_variable (string) – the variable to color by

  • my_palette (matplotlib.colors.Colormap) – a valid colormap

  • MIN_VALUE (float, optional) – the value to be assigned to the lowest color in the cmap, by default np.inf

  • MAX_VALUE (float, optional) – the value to be assigned to the highest color in the cmap, by default -np.inf

Returns:

the computed min and max values of my_variable on the BM nodes, useful to set the limits for a colorbar

Return type:

MIN_VALUE, MAX_VALUE
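The defaults of inf and -inf suggest that the range is found by a running min/max over the node values. The sketch below shows that idea under this assumption; value_range and normalize are hypothetical helpers, not part of pyballmapper, and the actual method may treat user-supplied limits differently.

```python
import math


def value_range(node_values, min_value=math.inf, max_value=-math.inf):
    """Hypothetical helper: widen (min_value, max_value) to include all node values."""
    lo = min(min_value, min(node_values))
    hi = max(max_value, max(node_values))
    return lo, hi


def normalize(value, lo, hi):
    """Map a value to [0, 1], the interval a colormap is evaluated on."""
    return 0.5 if hi == lo else (value - lo) / (hi - lo)


node_values = [2.0, 4.0, 10.0]
lo, hi = value_range(node_values)
positions = [normalize(v, lo, hi) for v in node_values]
```

Under this assumption, passing an explicit limit such as min_value=0.0 can only widen the range, which is useful when several plots should share one colorbar.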

draw_networkx(coloring_variable=None, color_palette=matplotlib.colormaps.get_cmap, colorbar=False, colorbar_label=None, ax=None, MIN_VALUE=inf, MAX_VALUE=-inf, MIN_SCALE=100, MAX_SCALE=600, pos=None, **kwargs)

Wrapper around the networkx.draw_networkx method with colorbar support.

Parameters:
  • coloring_variable (string, optional) – the variable to use for coloring the BM graph, by default None

  • color_palette (matplotlib.colors.Colormap, optional) – the coloring palette to use, by default cm.get_cmap(“Reds”)

  • colorbar (bool, optional) – whether to add a colorbar to the plot, by default False

  • colorbar_label (str, optional) – the label on the colorbar’s long axis, by default None

  • ax (matplotlib.axes.Axes, optional) – the matplotlib ax where to plot the graph. If None, the current ax is used. By default None

  • MIN_VALUE (float, optional) – the value to be assigned to the lowest color in the cmap, by default np.inf

  • MAX_VALUE (float, optional) – the value to be assigned to the highest color in the cmap, by default -np.inf

  • MIN_SCALE (int, optional) – the minimum radius for the nodes, by default 100

  • MAX_SCALE (int, optional) – the maximum radius for the nodes, by default 600

  • pos (dictionary, optional) – A dictionary with nodes as keys and positions as values. If not specified, a spring layout positioning will be computed. See networkx.drawing.layout for functions that compute node positions. By default None

Returns:

the matplotlib ax

Return type:

ax

filter_by(list_of_points)

Returns a copy of the BallMapper object with only the nodes covering a subset of points.

Parameters:

list_of_points (list) – list of the subset of points to keep

Returns:

the filtered BallMapper graph

Return type:

BallMapper
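One plausible reading of this filter is that a ball is kept when it covers at least one of the requested points. The sketch below works on a plain dictionary of covered points rather than a BallMapper object; filter_balls is a hypothetical name, and whether the real method also trims each ball's covered points is not stated in the description above.

```python
def filter_balls(points_covered, list_of_points):
    """Hypothetical helper: keep balls covering at least one requested point."""
    keep = set(list_of_points)
    return {
        ball: ids
        for ball, ids in points_covered.items()
        if keep & set(ids)  # non-empty intersection with the requested points
    }


# Toy cover: ball id -> ids of the points it covers.
points_covered = {0: [0, 1], 2: [1, 2], 3: [3]}
filtered = filter_balls(points_covered, [1])
```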

points_and_balls()

Returns a DataFrame with the points_covered_by_landmarks information.

Return type:

pandas.DataFrame
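A natural tabular form of this information is one row per (point, ball) pair. The sketch below builds such rows as plain tuples; the helper name and the column order are assumptions, and the layout of the DataFrame actually returned may differ.

```python
def points_and_balls_rows(points_covered):
    """Hypothetical helper: flatten the cover into (point id, ball id) rows."""
    return [
        (point, ball)
        for ball, ids in points_covered.items()
        for point in ids
    ]


# Toy cover: ball id -> ids of the points it covers.
points_covered = {0: [0, 1], 2: [1, 2], 3: [3]}
rows = points_and_balls_rows(points_covered)
```

A point covered by several balls appears once per ball, so the number of rows can exceed the number of data points.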