pegasus.dendrogram

pegasus.dendrogram(data, groupby, rep='pca', genes=None, correlation_method='pearson', n_clusters=None, affinity='euclidean', linkage='complete', compute_full_tree='auto', distance_threshold=0, panel_size=(6, 6), orientation='top', color_threshold=None, return_fig=False, dpi=300.0, **kwargs)

Generate a dendrogram from a hierarchical clustering of the categories in groupby.

The metrics used here are consistent with SCANPY’s dendrogram implementation.
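
As a rough illustration of the per-category correlation that correlation_method controls, the sketch below averages an embedding within each category and correlates the categories with pandas. It assumes a Scanpy-style computation (category means followed by DataFrame.corr); the toy embedding and labels are made-up stand-ins for data.obsm["X_pca"] and data.obs[groupby], and Pegasus's internal code may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for data.obsm["X_pca"] and data.obs[groupby]
embedding = pd.DataFrame(rng.normal(size=(12, 4)))
labels = pd.Series(["a", "b", "c"] * 4, name="cluster")

# One row per category: the mean coordinates of its cells
group_means = embedding.groupby(labels.values).mean()

# DataFrame.corr correlates columns, so transpose to put categories in columns.
# method may be "pearson", "kendall", or "spearman".
corr = group_means.T.corr(method="pearson")
print(corr.round(2))
```

The result is a square, symmetric matrix with one row and one column per category, which is the kind of input a hierarchical clustering step can then consume.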

Hierarchical clustering is performed with scikit-learn's AgglomerativeClustering implementation.
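
To show how a scikit-learn clustering result can feed a SciPy dendrogram, here is a minimal sketch that converts a fitted AgglomerativeClustering model into a linkage matrix. The helper name linkage_from_model and the random toy matrix are illustrative assumptions, not part of Pegasus; the conversion follows the pattern commonly used with scikit-learn's children_ and distances_ attributes.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def linkage_from_model(model):
    """Convert a fitted AgglomerativeClustering model into a SciPy linkage matrix.

    Hypothetical helper for illustration only.
    """
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        # Indices below n_samples are original observations (leaves);
        # larger indices refer to clusters formed by earlier merges.
        counts[i] = sum(
            1 if child < n_samples else counts[child - n_samples]
            for child in merge
        )
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))  # toy stand-in: 8 groups x 5 features

# n_clusters=None with distance_threshold=0 makes scikit-learn build the
# full tree and populate model.distances_.
model = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0, linkage="complete"
).fit(X)

Z = linkage_from_model(model)
tree = dendrogram(Z, no_plot=True)  # no_plot skips drawing
print(tree["ivl"])  # leaf labels in plotting order
```

Passing no_plot=True returns only the tree layout, which is convenient for inspecting leaf order without opening a figure.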

Parameters

data (MultimodalData, UnimodalData, or AnnData object) – Single cell expression data.
genes (List[str], optional, default: None) – List of genes to use. Gene names must exist in data.var. If set, use the counts in data.X for plotting; if None, use the embedding specified in rep for plotting.
rep (str, optional, default: pca) – Cell embedding to use. It only takes effect when genes is None, and its key "X_"+rep must exist in data.obsm. By default, use PCA coordinates.
groupby (str) – Categorical cell attribute to plot, which must exist in data.obs.
correlation_method (str, optional, default: pearson) – Method of correlation between categories specified in data.obs. Available options are: pearson, kendall, spearman. See pandas corr documentation for details.
n_clusters (int, optional, default: None) – The number of clusters to find, used by hierarchical clustering. It must be None if distance_threshold is not None.
affinity (str, optional, default: euclidean) –
Metric used to compute the linkage, used by hierarchical clustering. Valid values are:
- From scikit-learn: cityblock, cosine, euclidean, l1, l2, manhattan.
- From scipy.spatial.distance: braycurtis, canberra, chebyshev, correlation, dice, hamming, jaccard, kulsinski, mahalanobis, minkowski, rogerstanimoto, russellrao, seuclidean, sokalmichener, sokalsneath, sqeuclidean, yule.
Default is the Euclidean distance, as in the function signature. See scikit-learn distance documentation for details.
linkage (str, optional, default: complete) –
Which linkage criterion to use, used by hierarchical clustering. Below are available options:
- ward minimizes the variance of the clusters being merged.
- average uses the average of the distances of each observation of the two sets.
- complete uses the maximum distance between all observations of the two sets. (Default)
- single uses the minimum of the distances between all observations of the two sets.
See scikit-learn documentation for details.
compute_full_tree (str or bool, optional, default: auto) – Whether to build the full tree instead of stopping construction early at n_clusters; used by hierarchical clustering. It must be True if distance_threshold is not None. By default, this option is auto, which is equivalent to True if and only if distance_threshold is not None, or n_clusters is less than max(100, 0.02 * n_groups), where n_groups is the number of categories in data.obs[groupby].
distance_threshold (float, optional, default: 0) – The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.
panel_size (Tuple[float, float], optional, default: (6, 6)) – The size (width, height) in inches of figure.
orientation (str, optional, default: top) – The direction to plot the dendrogram. Available options are: top, bottom, left, right. See scipy dendrogram documentation for explanation.
color_threshold (float, optional, default: None) – Threshold for coloring clusters. See scipy dendrogram documentation for explanation.
return_fig (bool, optional, default: False) – Return a Figure object if True; return None otherwise.
dpi (float, optional, default: 300.0) – The resolution in dots per inch.
**kwargs – Additional keyword arguments passed to scipy.cluster.hierarchy.dendrogram.
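
The constraints linking n_clusters, distance_threshold, and compute_full_tree come from scikit-learn's AgglomerativeClustering. The sketch below probes them directly on a toy matrix; try_fit is a hypothetical helper and the random data is made up for illustration, so treat this as a quick check rather than a description of Pegasus internals.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # toy stand-in: 6 groups x 3 features


def try_fit(**params):
    """Fit on the toy data; return 'ok' or the name of the exception raised.

    Hypothetical helper for illustration only.
    """
    try:
        AgglomerativeClustering(**params).fit(X)
        return "ok"
    except ValueError as err:
        return type(err).__name__


results = {
    # Exactly one of n_clusters / distance_threshold is set: accepted
    "threshold only": try_fit(n_clusters=None, distance_threshold=0),
    "n_clusters only": try_fit(n_clusters=2, distance_threshold=None),
    # Both set at once: rejected by scikit-learn
    "both set": try_fit(n_clusters=2, distance_threshold=0),
    # A threshold requires the full tree: rejected when compute_full_tree=False
    "threshold, no full tree": try_fit(
        n_clusters=None, distance_threshold=0, compute_full_tree=False
    ),
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

With the defaults of this function (n_clusters=None, distance_threshold=0, compute_full_tree='auto'), the first configuration applies, so the full tree is built.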

Returns

A matplotlib.figure.Figure object containing the dendrogram if return_fig == True; otherwise None.

Return type

Figure object

Examples

>>> import pegasus as pg
>>> pg.dendrogram(data, genes=data.var_names, groupby='louvain_labels')
>>> pg.dendrogram(data, rep='pca', groupby='louvain_labels')