pegasus.calc_dendrogram

pegasus.calc_dendrogram(data, groupby='obs', rep='pca', genes=None, on_average=True, linkage_method='ward', res_key='dendrogram')

Cluster data using a hierarchical clustering algorithm.

The clustering is performed on a Connection Specificity Index (CSI) matrix ([Suo18], [Bass13]), which is computed from the correlations between the levels of the groupby attribute in the rep embedding.
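The CSI construction can be sketched as follows, assuming the usual CSI definition: for two levels A and B with Pearson correlation PCC_AB, CSI is the fraction of levels whose correlation with both A and B is lower than PCC_AB - 0.05. This is a minimal NumPy/SciPy sketch, not pegasus's own implementation; the 0.05 tolerance, the helper name `csi_matrix`, and passing the CSI rows to SciPy's `linkage` as observation vectors are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def csi_matrix(X, tol=0.05):
    """Connection Specificity Index between the rows of X (levels x features).

    Assumed definition: CSI[a, b] is the fraction of levels c whose Pearson
    correlation with both a and b is lower than corr[a, b] - tol.
    """
    corr = np.corrcoef(X)                          # (n, n) Pearson correlations between rows
    thr = corr - tol                               # thr[a, b] = PCC_ab - tol
    below_a = corr[:, None, :] < thr[:, :, None]   # [a, b, c]: PCC_ac < thr[a, b]
    below_b = corr[None, :, :] < thr[:, :, None]   # [a, b, c]: PCC_bc < thr[a, b]
    return (below_a & below_b).mean(axis=2)        # fraction over all levels c

# Hypothetical data: 5 groupby levels described by 10 embedding dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
csi = csi_matrix(X)

# Assumption: each CSI row is treated as an observation vector for Ward linkage
Z = linkage(csi, method="ward")
print(Z.shape)  # (4, 4)
```

The resulting `Z` has one row per merge (n - 1 rows for n levels), which is the same linkage-matrix format SciPy uses for all of the `linkage_method` options listed below.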

Parameters
  • data (MultimodalData, UnimodalData, or AnnData object) – Single cell expression data.

  • groupby (str, optional, default: "obs") – Labels to cluster. If "obs", use cell names (i.e. data.obs_names); if "var", use feature names (i.e. data.var_names). Otherwise, specify a categorical cell or feature attribute, which must exist in data.obs or data.var.

  • rep (str, optional, default: pca) – Cell embedding to use. It is only used when genes is None, and the key "X_"+rep must exist in data.obsm. By default, the PCA embedding is used. If None, use the current count matrix data.X.

  • genes (List[str], optional, default: None) – List of genes to use. Gene names must exist in data.var_names. If set, use the counts in data.X for clustering; if None, use the embedding specified by rep.

  • on_average (bool, optional, default: True) – If True, cluster the groupby levels based on their mean values. Only works when groupby is not None.

  • linkage_method (str, optional, default: ward) – Linkage criterion used by the hierarchical clustering. Available options: ward (default), single, complete, average, weighted, centroid, median. See the scipy linkage documentation for details.

  • res_key (str, optional, default: dendrogram) – Key in data.uns under which to store the calculated dendrogram information, which is used by the plot_dendrogram function for plotting.
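When on_average is True, each groupby level is represented by its mean profile before correlations are computed. The sketch below shows one way to build that level-by-feature matrix with pandas; the helper name `level_means` and the toy input are hypothetical, and pegasus's own averaging code may differ.

```python
import numpy as np
import pandas as pd

def level_means(X, labels):
    """Average the rows of X (cells x features) within each level of `labels`.

    Returns the level names and a (levels x features) matrix of means.
    """
    df = pd.DataFrame(np.asarray(X, dtype=float))
    # Group rows by their label and take the column-wise mean of each group
    means = df.groupby(np.asarray(labels)).mean()
    return means.index.astype(str).tolist(), means.to_numpy()

# Hypothetical input: 3 cells in a 2-D embedding with 2 cluster labels
names, M = level_means([[1, 2], [3, 4], [10, 20]], ["a", "a", "b"])
print(names)  # ['a', 'b']
print(M)
```

Averaging first reduces the problem from one row per cell to one row per level, so the correlation and CSI matrices stay small even for large datasets.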

Return type

None

Returns

  • None

  • Update data.uns

    • data.uns[res_key]: A tuple of the calculated linkage matrix and its corresponding labels.
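Because data.uns[res_key] holds a linkage matrix together with its labels, the result can also be inspected directly with SciPy. The sketch assumes the tuple order (linkage matrix, labels) stated above; the helper `leaf_order` and the toy input are hypothetical and are built with SciPy's `linkage` so the example runs without pegasus.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

def leaf_order(res):
    """Return the labels in dendrogram leaf order from a (linkage matrix, labels) tuple."""
    Z, labels = res
    # no_plot=True computes the layout without drawing; "ivl" holds the leaf labels
    info = dendrogram(Z, labels=list(labels), no_plot=True)
    return info["ivl"]

# Hypothetical stand-in for data.uns["dendrogram"]: 4 groups on a line
points = np.array([[0.0], [0.1], [5.0], [5.1]])
res = (linkage(points, method="ward"), ["a", "b", "c", "d"])
order = leaf_order(res)
print(order)
```

Groups that merge early in the hierarchy (here a with b, and c with d) end up adjacent in the leaf order.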

Examples

>>> pg.calc_dendrogram(data, groupby='leiden_labels')
>>> pg.calc_dendrogram(data, genes=['CD4', 'CD8A', 'CD8B'], on_average=False)
>>> pg.calc_dendrogram(data, groupby="var", rep=None, on_average=False)