pegasus.cluster

pegasus.cluster(data, algo='louvain', rep='pca', resolution=1.3, n_jobs=-1, random_state=0, class_label=None, n_iter=-1, rep_kmeans='diffmap', n_clusters=30, n_clusters2=50, n_init=10)

Cluster the data using the chosen algorithm.

Candidates are louvain, leiden, spectral_louvain and spectral_leiden. If the data have fewer than 1000 cells and any cluster contains only one cell, the resolution is automatically reduced until no single-cell cluster remains.
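The automatic reduction for small datasets can be pictured as a retry loop. The sketch below is not the Pegasus implementation: `run_clustering` is a hypothetical callable standing in for the chosen algorithm, and the shrink factor of 0.95 and the `min_resolution` floor are assumptions made only for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def cluster_without_singletons(
    run_clustering: Callable[[float], List[int]],  # hypothetical: resolution -> one label per cell
    n_cells: int,
    resolution: float = 1.3,
    shrink: float = 0.95,          # assumed factor, not taken from Pegasus
    min_resolution: float = 0.05,  # assumed floor so the loop always terminates
) -> Tuple[List[int], float]:
    """Lower the resolution until no cluster has exactly one cell (small data only)."""
    labels = run_clustering(resolution)
    if n_cells >= 1000:
        # Large datasets keep the requested resolution unchanged.
        return labels, resolution
    # Retry with a smaller resolution while a single-cell cluster is present.
    while min(Counter(labels).values()) == 1 and resolution * shrink >= min_resolution:
        resolution *= shrink
        labels = run_clustering(resolution)
    return labels, resolution
```

The function returns both the labels and the resolution that was finally used, so a caller can see whether the requested value was lowered.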

Parameters

data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
algo (str, optional, default: "louvain") – Which clustering algorithm to use. Choices are louvain, leiden, spectral_louvain and spectral_leiden.
rep (str, optional, default: "pca") – The embedding representation used for clustering. Keyword 'X_' + rep must exist in data.obsm. By default, use PCA coordinates.
resolution (float, optional, default: 1.3) – Resolution factor. Higher resolution tends to find more clusters.
n_jobs (int, optional, default: -1) – Number of threads to use for the KMeans step in 'spectral_louvain' and 'spectral_leiden'. -1 refers to using all physical CPU cores.
random_state (int, optional, default: 0) – Random seed for reproducing results.
class_label (str, optional, default: None) – Key name for storing cluster labels in data.obs. If None, use algo + '_labels' (for example, 'louvain_labels' when algo is 'louvain').
n_iter (int, optional, default: -1) – Number of iterations that Leiden algorithm runs. If -1, run the algorithm until reaching its optimal clustering.
rep_kmeans (str, optional, default: "diffmap") – The embedding representation on which KMeans runs. Keyword 'X_' + rep_kmeans must exist in data.obsm. By default, use Diffusion Map coordinates. If the diffusion map has not been calculated, use PCA coordinates instead.
n_clusters (int, optional, default: 30) – The number of first-level clusters.
n_clusters2 (int, optional, default: 50) – The number of second-level clusters.
n_init (int, optional, default: 10) – Number of KMeans tries for the first-level clustering. The default follows that of the scikit-learn KMeans function.
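The key conventions in the parameter list can be checked before calling the function. The helper below is a hypothetical pre-flight check, not part of Pegasus. It takes the keys of data.obsm as a plain set of strings, and it assumes that rep_kmeans follows the same 'X_' prefix convention as rep and that a None class_label resolves to the algorithm name followed by '_labels'.

```python
ALGOS = ("louvain", "leiden", "spectral_louvain", "spectral_leiden")


def resolve_cluster_args(obsm_keys, algo="louvain", rep="pca",
                         class_label=None, rep_kmeans="diffmap"):
    """Return (class_label, rep_kmeans) after applying the documented defaults.

    Hypothetical helper for illustration; `obsm_keys` stands in for data.obsm.keys().
    """
    if algo not in ALGOS:
        raise ValueError(f"Unknown algo {algo!r}; choose one of {ALGOS}")
    if "X_" + rep not in obsm_keys:
        raise KeyError(f"'X_{rep}' not found; compute the '{rep}' embedding first")
    if class_label is None:
        # Assumed default: algorithm name followed by '_labels'.
        class_label = algo + "_labels"
    if algo.startswith("spectral_") and "X_" + rep_kmeans not in obsm_keys:
        # Documented fallback: use PCA coordinates when diffmap is missing.
        rep_kmeans = "pca"
    return class_label, rep_kmeans
```

For the non-spectral algorithms the sketch leaves rep_kmeans untouched, since the KMeans step is described only for 'spectral_louvain' and 'spectral_leiden'.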

Return type

None

Returns

None
Updates data.obs:
- data.obs[class_label]: Cluster labels of cells as categorical data.

Examples

>>> import pegasus as pg
>>> pg.cluster(data, algo='leiden')