pegasus.dendrogram

pegasus.dendrogram(data, groupby, rep='pca', genes=None, correlation_method='pearson', n_clusters=None, affinity='euclidean', linkage='complete', compute_full_tree='auto', distance_threshold=0, panel_size=(6, 6), orientation='top', color_threshold=None, return_fig=False, dpi=300.0, **kwargs)

Generate a dendrogram from the hierarchical clustering result.

The metrics used here are consistent with SCANPY’s dendrogram implementation.

Hierarchical clustering uses scikit-learn's Agglomerative Clustering implementation.
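The two statements above can be illustrated with a minimal sketch. It assumes, by analogy with SCANPY, that one mean vector per category is correlated against the others and that the resulting matrix is clustered; whether pegasus does exactly this internally is an assumption. The names `group_means` and `linkage_from_model` are hypothetical, and the random data is only a placeholder.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def linkage_from_model(model):
    """Build a scipy-style linkage matrix from a fitted AgglomerativeClustering."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        # A child index below n_samples is an original point; otherwise it
        # refers to the cluster formed at merge step (index - n_samples).
        counts[i] = sum(
            1 if child < n_samples else counts[child - n_samples]
            for child in merge
        )
    return np.column_stack([model.children_, model.distances_, counts]).astype(float)


# Placeholder input: one mean vector per category (5 categories, 10 PCs).
rng = np.random.default_rng(0)
group_means = rng.normal(size=(5, 10))
corr = np.corrcoef(group_means)  # Pearson correlation between categories, 5 x 5

# distance_threshold=0 with n_clusters=None builds the full tree and
# makes scikit-learn record the merge distances in model.distances_.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0,
    linkage="complete",
    compute_full_tree=True,
)
model.fit(corr)

Z = linkage_from_model(model)
# no_plot=True returns the tree layout without requiring matplotlib.
tree = dendrogram(Z, no_plot=True, labels=[f"group{i}" for i in range(5)])
```

The sketch relies on scikit-learn's default Euclidean metric, so it does not depend on whether the installed version names that argument `affinity` or `metric`.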

Parameters
  • data (MultimodalData, UnimodalData, or AnnData object) – Single cell expression data.

  • genes (List[str], optional, default: None) – List of genes to use. Gene names must exist in data.var. If set, use the counts in data.X for plotting; if None, use the embedding specified in rep.

  • rep (str, optional, default: pca) – Cell embedding to use. It is used only when genes is None, and its key "X_"+rep must exist in data.obsm. By default, use PCA coordinates.

  • groupby (str) – Categorical cell attribute to plot, which must exist in data.obs.

  • correlation_method (str, optional, default: pearson) – Method of correlation between categories specified in data.obs. Available options are: pearson, kendall, spearman. See pandas corr documentation for details.

  • n_clusters (int, optional, default: None) – The number of clusters to find, used by hierarchical clustering. It must be None if distance_threshold is not None.

  • affinity (str, optional, default: euclidean) –

    Metric used to compute the linkage, used by hierarchical clustering. Valid values are:
    • From scikit-learn: cityblock, cosine, euclidean, l1, l2, manhattan.

    • From scipy.spatial.distance: braycurtis, canberra, chebyshev, correlation, dice, hamming, jaccard, kulsinski, mahalanobis, minkowski, rogerstanimoto, russellrao, seuclidean, sokalmichener, sokalsneath, sqeuclidean, yule.

    Default is the Euclidean distance, as given in the function signature. See scikit-learn distance documentation for details.

  • linkage (str, optional, default: complete) –

    Which linkage criterion to use, used by hierarchical clustering. Below are available options:
    • ward minimizes the variance of the clusters being merged.

    • average uses the average of the distances of each observation of the two sets.

    • complete uses the maximum of the distances between all observations of the two sets. (Default)

    • single uses the minimum of the distances between all observations of the two sets.

    See scikit-learn documentation for details.

  • compute_full_tree (str or bool, optional, default: auto) – Whether to build the full tree or stop its construction early at n_clusters, used by hierarchical clustering. It must be True if distance_threshold is not None. By default, this option is auto, which is equivalent to True when distance_threshold is not None, or when n_clusters is less than max(100, 0.02 * n_groups), where n_groups is the number of categories in data.obs[groupby].

  • distance_threshold (float, optional, default: 0) – The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.

  • panel_size (Tuple[float, float], optional, default: (6, 6)) – The size (width, height) of the figure in inches.

  • orientation (str, optional, default: top) – The direction to plot the dendrogram. Available options are: top, bottom, left, right. See scipy dendrogram documentation for explanation.

  • color_threshold (float, optional, default: None) – Threshold for coloring clusters. See scipy dendrogram documentation for explanation.

  • return_fig (bool, optional, default: False) – Return a Figure object if True; return None otherwise.

  • dpi (float, optional, default: 300.0) – The resolution in dots per inch.

  • **kwargs – Additional keyword arguments passed to scipy.cluster.hierarchy.dendrogram.
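The constraints linking n_clusters, distance_threshold, and compute_full_tree can be summarized in a short sketch. The helper `check_clustering_args` is hypothetical and is not part of pegasus. It encodes the rules stated in the parameter descriptions, plus scikit-learn's requirement that exactly one of n_clusters and distance_threshold is set.

```python
def check_clustering_args(n_clusters=None, distance_threshold=0, compute_full_tree="auto"):
    """Hypothetical helper: validate the argument combination described above."""
    # Exactly one of the two stopping criteria may be given.
    if (n_clusters is None) == (distance_threshold is None):
        raise ValueError("Set exactly one of n_clusters and distance_threshold.")
    # A distance threshold requires the full tree; 'auto' resolves to True here.
    if distance_threshold is not None and compute_full_tree is False:
        raise ValueError(
            "compute_full_tree must be True or 'auto' when distance_threshold is set."
        )
    return True


# The documented defaults (n_clusters=None, distance_threshold=0) are consistent.
check_clustering_args()
# Because distance_threshold defaults to 0, requesting a fixed number of
# clusters means resetting it to None explicitly.
check_clustering_args(n_clusters=3, distance_threshold=None)
```

Under these rules, passing n_clusters alone while leaving distance_threshold at its default of 0 would be rejected.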

Return type

Optional[Figure]

Returns

  • Figure object – A matplotlib.figure.Figure object containing the dendrogram, returned only if return_fig is True.

Examples

>>> pg.dendrogram(data, genes=data.var_names, groupby='louvain_labels')
>>> pg.dendrogram(data, rep='pca', groupby='louvain_labels')
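To keep the figure for later use, return_fig=True can be combined with matplotlib's Figure.savefig. The following is a minimal sketch: `save_dendrogram` is a hypothetical convenience wrapper, not a pegasus function, and it assumes pegasus is installed and that `data` already carries the groupby attribute.

```python
def save_dendrogram(data, groupby, path, **kwargs):
    """Hypothetical helper: draw the dendrogram and write it to `path`."""
    import pegasus as pg  # imported lazily; assumes pegasus is installed

    # return_fig=True makes pg.dendrogram return the matplotlib Figure
    # instead of None.
    fig = pg.dendrogram(data, groupby=groupby, return_fig=True, **kwargs)
    fig.savefig(path, bbox_inches="tight")
    return fig
```

A call might look like `save_dendrogram(data, 'louvain_labels', 'dendrogram.pdf', rep='pca', orientation='left')`, where the extra keyword arguments are forwarded unchanged to pg.dendrogram.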