pegasus.cluster
- pegasus.cluster(data, algo='louvain', rep='pca', resolution=1.3, n_jobs=-1, random_state=0, class_label=None, n_iter=-1, rep_kmeans='diffmap', n_clusters=30, n_clusters2=50, n_init=10)
Cluster the data using the chosen algorithm.
Candidates are louvain, leiden, spectral_louvain and spectral_leiden. If the data have fewer than 1000 cells and any cluster has size 1, the resolution is automatically reduced until no cluster of size 1 appears.
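The automatic resolution reduction can be pictured with the following minimal sketch. It is not the Pegasus implementation: the helper names, the multiplicative shrink factor of 0.9, the lower bound of 0.05 and the toy clustering function are all assumptions made for illustration. Only the 1000-cell threshold and the "no cluster of size 1" stopping rule come from the description above.

```python
from collections import Counter


def has_singleton(labels):
    """Return True if any cluster contains exactly one cell."""
    return 1 in Counter(labels).values()


def cluster_without_singletons(run_clustering, n_cells, resolution=1.3,
                               shrink=0.9, min_resolution=0.05,
                               small_data_threshold=1000):
    """Hypothetical helper: lower the resolution until no size-1 cluster remains.

    run_clustering is any callable mapping a resolution to a list of labels.
    shrink and min_resolution are illustrative assumptions, not Pegasus values.
    """
    labels = run_clustering(resolution)
    # The adjustment only applies to small data sets (fewer than 1000 cells).
    if n_cells >= small_data_threshold:
        return labels, resolution
    while has_singleton(labels) and resolution * shrink >= min_resolution:
        resolution *= shrink
        labels = run_clustering(resolution)
    return labels, resolution


def toy_clustering(resolution):
    """Stand-in for a real community-detection call on six cells."""
    if resolution > 1.0:
        return [0, 0, 0, 1, 1, 2]  # cluster 2 has a single cell
    return [0, 0, 0, 1, 1, 1]


labels, final_resolution = cluster_without_singletons(toy_clustering, n_cells=6)
print(labels, final_resolution)
```

With the toy function, the loop shrinks the resolution three times before the singleton cluster disappears. A real run would replace toy_clustering with an actual Louvain or Leiden call.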
- Parameters
data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
algo (str, optional, default: "louvain") – Which clustering algorithm to use. Choices are louvain, leiden, spectral_louvain and spectral_leiden.
rep (str, optional, default: "pca") – The embedding representation used for clustering. The key 'X_' + rep must exist in data.obsm. By default, use PCA coordinates.
resolution (float, optional, default: 1.3) – Resolution factor. Higher resolution tends to find more clusters.
n_jobs (int, optional, default: -1) – Number of threads to use for the KMeans step in 'spectral_louvain' and 'spectral_leiden'. -1 uses all physical CPU cores.
random_state (int, optional, default: 0) – Random seed for reproducing results.
class_label (str, optional, default: None) – Key name for storing cluster labels in data.obs. If None, use algo + '_labels' (for example, 'louvain_labels').
n_iter (int, optional, default: -1) – Number of iterations that the Leiden algorithm runs. If -1, run the algorithm until it reaches its optimal clustering.
rep_kmeans (str, optional, default: "diffmap") – The embedding representation on which KMeans runs. The key must exist in data.obsm. By default, use Diffusion Map coordinates. If diffmap has not been calculated, use PCA coordinates instead.
n_clusters (int, optional, default: 30) – The number of first-level clusters.
n_clusters2 (int, optional, default: 50) – The number of second-level clusters.
n_init (int, optional, default: 10) – Number of KMeans runs for the first-level clustering. The default is set to match the scikit-learn KMeans function.
- Return type
None
- Returns
None
Update data.obs:
- data.obs[class_label]: Cluster labels of cells as categorical data.
Examples
>>> import pegasus as pg
>>> pg.cluster(data, algo='leiden')
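The key lookups described in the parameter list can be sketched in plain Python, without Pegasus installed. The helper resolve_cluster_keys below is hypothetical and not part of the Pegasus API. A set of strings stands in for the keys of data.obsm. The sketch assumes that rep_kmeans takes the same 'X_' prefix as rep and that the default label key is algo + '_labels'; both are readings of the documentation rather than confirmed behavior.

```python
VALID_ALGOS = ("louvain", "leiden", "spectral_louvain", "spectral_leiden")


def resolve_cluster_keys(obsm_keys, algo="louvain", rep="pca",
                         rep_kmeans="diffmap", class_label=None):
    """Hypothetical helper: return (rep_key, kmeans_key, label_key).

    obsm_keys is any container of strings standing in for data.obsm keys.
    """
    if algo not in VALID_ALGOS:
        raise ValueError(f"Unknown algo {algo!r}; choose one of {VALID_ALGOS}")

    # The clustering representation must already be present in data.obsm.
    rep_key = "X_" + rep
    if rep_key not in obsm_keys:
        raise KeyError(f"{rep_key!r} not found in data.obsm")

    # Only the spectral variants run a KMeans step.
    kmeans_key = None
    if algo.startswith("spectral_"):
        kmeans_key = "X_" + rep_kmeans  # assumption: same 'X_' prefix as rep
        if kmeans_key not in obsm_keys:
            if rep_kmeans != "diffmap":
                raise KeyError(f"{kmeans_key!r} not found in data.obsm")
            # Documented fallback: use PCA when diffmap has not been computed.
            kmeans_key = "X_pca"

    # Assumed default: 'louvain_labels', 'leiden_labels', and so on.
    label_key = class_label if class_label is not None else algo + "_labels"
    return rep_key, kmeans_key, label_key


print(resolve_cluster_keys({"X_pca"}, algo="leiden"))
print(resolve_cluster_keys({"X_pca"}, algo="spectral_leiden"))
```

The first call needs only the PCA embedding, while the second shows the fallback from Diffusion Map to PCA coordinates for the KMeans step.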