pegasus.cluster

pegasus.cluster(data, algo='louvain', rep='pca', resolution=1.3, n_jobs=-1, random_state=0, class_label=None, n_iter=-1, rep_kmeans='diffmap', n_clusters=30, n_clusters2=50, n_init=10)

Cluster the data using the chosen algorithm.

Available algorithms are louvain, leiden, spectral_louvain and spectral_leiden. If the data contain fewer than 1000 cells and any cluster has size 1, the resolution is automatically reduced until no cluster of size 1 remains.
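The automatic resolution reduction described above can be pictured with a minimal sketch. This is not the Pegasus implementation: the helper names, the decrement `step`, the floor `min_resolution`, and the stand-in `toy_cluster` function are all assumptions made for illustration. Only the 1000-cell cutoff and the "no cluster of size 1" stopping rule come from the description.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def has_singleton(labels: Sequence[int]) -> bool:
    """Return True if any cluster contains exactly one cell."""
    return 1 in Counter(labels).values()


def reduce_resolution(
    cluster_fn: Callable[[float], Sequence[int]],
    resolution: float = 1.3,
    step: float = 0.1,            # assumed decrement, not taken from Pegasus
    min_resolution: float = 0.1,  # assumed floor, not taken from Pegasus
    small_data_cutoff: int = 1000,
) -> Tuple[float, List[int]]:
    """Lower the resolution until no cluster of size 1 remains."""
    labels = list(cluster_fn(resolution))
    # Data sets at or above the cutoff keep the requested resolution.
    if len(labels) >= small_data_cutoff:
        return resolution, labels
    while has_singleton(labels) and resolution - step >= min_resolution:
        # Round to avoid accumulating floating-point drift.
        resolution = round(resolution - step, 10)
        labels = list(cluster_fn(resolution))
    return resolution, labels


def toy_cluster(resolution: float) -> List[int]:
    """Hypothetical stand-in for a clustering call on 7 cells."""
    if resolution > 0.95:
        return [0, 0, 0, 1, 1, 1, 2]  # cluster 2 holds a single cell
    return [0, 0, 0, 0, 1, 1, 1]


final_resolution, final_labels = reduce_resolution(toy_cluster, resolution=1.3)
print(final_resolution, final_labels)
```

In this toy setup the loop stops at the first resolution for which `toy_cluster` no longer produces a one-cell cluster. With real data, each call to `cluster_fn` would be a full clustering run.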

Parameters
  • data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.

  • algo (str, optional, default: "louvain") – Which clustering algorithm to use. Choices are louvain, leiden, spectral_louvain and spectral_leiden.

  • rep (str, optional, default: "pca") – The embedding representation used for clustering. Keyword 'X_' + rep must exist in data.obsm. By default, use PCA coordinates.

  • resolution (float, optional, default: 1.3) – Resolution factor. Higher values tend to produce more clusters.

  • n_jobs (int, optional, default: -1) – Number of threads to use for the KMeans step in ‘spectral_louvain’ and ‘spectral_leiden’. -1 means all physical CPU cores are used.

  • random_state (int, optional, default: 0) – Random seed for reproducing results.

  • class_label (str, optional, default: None) – Key name for storing cluster labels in data.obs. If None, use algo + '_labels' (for example, 'louvain_labels').

  • n_iter (int, optional, default: -1) – Number of iterations the Leiden algorithm runs. If -1, run the algorithm until it reaches its optimal clustering.

  • rep_kmeans (str, optional, default: "diffmap") – The embedding representation on which the KMeans runs. Keyword must exist in data.obsm. By default, use Diffusion Map coordinates. If diffmap is not calculated, use PCA coordinates instead.

  • n_clusters (int, optional, default: 30) – The number of first level clusters.

  • n_clusters2 (int, optional, default: 50) – The number of second level clusters.

  • n_init (int, optional, default: 10) – Number of KMeans tries for the first level clustering. The default was chosen to match the scikit-learn KMeans function.
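As a rough illustration of how the rep and class_label parameters map to keys, the sketch below resolves the data.obsm key ('X_' + rep) and the data.obs key for the labels. The helper `resolve_keys` is hypothetical and not part of Pegasus. It assumes the default label key is the algorithm name followed by '_labels', and the exceptions raised here are illustrative choices rather than the library's actual behavior.

```python
from typing import Mapping, Optional, Tuple

ALGOS = ("louvain", "leiden", "spectral_louvain", "spectral_leiden")


def resolve_keys(
    obsm: Mapping[str, object],
    algo: str = "louvain",
    rep: str = "pca",
    class_label: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (obsm key for the embedding, obs key for the cluster labels)."""
    if algo not in ALGOS:
        raise ValueError(f"Unknown clustering algorithm: {algo!r}")
    # The embedding must be stored under 'X_' + rep.
    rep_key = "X_" + rep
    if rep_key not in obsm:
        raise KeyError(f"{rep_key!r} not found; compute the embedding first")
    # Assumed default: algorithm name followed by '_labels'.
    label_key = class_label if class_label is not None else algo + "_labels"
    return rep_key, label_key


# A plain dict stands in for data.obsm in this sketch.
fake_obsm = {"X_pca": None}
print(resolve_keys(fake_obsm, algo="leiden"))
print(resolve_keys(fake_obsm, algo="louvain", class_label="my_clusters"))
```

A check like this makes it clear up front which keys a call would read from and write to before any clustering is run.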

Return type

None

Returns

  • None

  • Update data.obs

    • data.obs[class_label]: Cluster labels of cells as categorical data.

Examples

>>> import pegasus as pg
>>> pg.cluster(data, algo='leiden')