pegasus.neighbors

pegasus.neighbors(data, K=100, rep='pca', n_jobs=-1, random_state=0, full_speed=False, use_cache=True, dist='l2')

Compute the k nearest neighbors and the affinity matrix, which are used by diffmap and graph-based community detection algorithms.

The kNN calculation uses hnswlib introduced by [Malkov16].

Parameters
  • data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.

  • K (int, optional, default: 100) – Number of neighbors, including the data point itself.

  • rep (str, optional, default: "pca") – Embedding representation used to calculate kNN. If None, use data.X; otherwise, the key 'X_' + rep must exist in data.obsm.

  • n_jobs (int, optional, default: -1) – Number of threads to use. If -1, use all physical CPU cores.

  • random_state (int, optional, default: 0) – Random seed set for reproducing results.

  • full_speed (bool, optional, default: False) –

    • If True, use multiple threads to construct the hnsw index. In this case, the kNN results are not reproducible.

    • Otherwise, use only one thread to make sure results are reproducible.

  • use_cache (bool, optional, default: True) –

    • If True and cached kNN results are found, Pegasus uses the cached results and does not recompute.

    • Otherwise, compute kNN irrespective of caching status.

  • dist (str, optional, default: "l2") – Distance metric to use. The default, "l2", is squared L2 distance. The other available options are inner product ("ip") and cosine similarity ("cosine").
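
The three dist options can be illustrated with plain NumPy. This is a minimal sketch that assumes the definitions given in hnswlib's documentation (squared L2, one minus inner product, one minus cosine similarity). The helper hnsw_style_distance is hypothetical and not part of Pegasus, and the sketch makes no claim about any post-processing Pegasus applies to stored distances.

```python
import numpy as np

def hnsw_style_distance(a, b, dist="l2"):
    """Hypothetical helper: distance between two vectors, following the
    space definitions documented by hnswlib (an assumption, not Pegasus code)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if dist == "l2":
        # Squared L2 distance: no square root is taken.
        return float(np.sum((a - b) ** 2))
    if dist == "ip":
        # Inner product turned into a distance.
        return float(1.0 - np.dot(a, b))
    if dist == "cosine":
        # One minus cosine similarity.
        return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    raise ValueError(f"unknown dist: {dist!r}")

a = [1.0, 0.0]
b = [0.0, 2.0]
d_l2 = hnsw_style_distance(a, b, "l2")       # (1 - 0)^2 + (0 - 2)^2
d_ip = hnsw_style_distance(a, b, "ip")       # 1 - (1*0 + 0*2)
d_cos = hnsw_style_distance(a, b, "cosine")  # orthogonal vectors
```

Under these definitions, smaller values always mean closer points, which is why "ip" and "cosine" are expressed as one minus a similarity.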

Return type

None

Returns

  • None

  • Update data.obsm

    • data.obsm[rep + "_knn_indices"]: kNN index matrix. Row i is the index list of kNN of cell i (excluding itself), sorted from nearest to farthest.

    • data.obsm[rep + "_knn_distances"]: kNN distance matrix. Row i is the distance list of kNN of cell i (excluding itself), sorted from smallest to largest.

  • Update data.obsp

    • data.obsp["W_" + rep]: kNN graph of the data in terms of affinity matrix.
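
The layout of the two kNN matrices described above can be sketched with an exact brute-force search in NumPy. This is only an illustration under stated assumptions: it uses squared L2 distance and exact search rather than the approximate hnswlib index, and brute_force_knn is a hypothetical helper, not a Pegasus function. Because K counts the data point itself and the stored rows exclude it, each row has K - 1 entries.

```python
import numpy as np

def brute_force_knn(X, K):
    """Hypothetical exact kNN. Returns (indices, distances), each of shape
    (n_cells, K - 1), with the cell itself excluded from its own row."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared L2 distances between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Exclude each cell from its own neighbor list.
    np.fill_diagonal(sq_dist, np.inf)
    # Sort each row from nearest to farthest and keep K - 1 neighbors.
    indices = np.argsort(sq_dist, axis=1, kind="stable")[:, : K - 1]
    distances = np.take_along_axis(sq_dist, indices, axis=1)
    return indices, distances

# Four cells on a line at positions 0, 1, 3 and 7.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
indices, distances = brute_force_knn(X, K=3)
```

With K=3, each row holds two neighbors: row i of indices lists the neighbors of cell i from nearest to farthest, and row i of distances holds the matching distances in ascending order.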

Examples

>>> import pegasus as pg
>>> pg.neighbors(data)