pegasus.neighbors
- pegasus.neighbors(data, K=100, rep='pca', n_comps=None, n_jobs=-1, random_state=0, full_speed=False, use_cache=False, dist='l2', method='hnsw', exact_k=False)
Compute the k nearest neighbors and the affinity matrix, which are used by diffmap and graph-based community detection algorithms.
With the default method="hnsw", the kNN calculation uses hnswlib, introduced by [Malkov16].
Unless exact_k=True, the K actually used is min(K, sqrt(data.shape[0])).
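The rule for the effective K can be sketched as follows. The helper name effective_k is hypothetical and not part of Pegasus, and truncating the square root to an integer is an assumption; the documentation only states min(K, sqrt(data.shape[0])).

```python
import math


def effective_k(K: int, n_cells: int, exact_k: bool = False) -> int:
    """Hypothetical helper illustrating the documented rule for K."""
    if exact_k:
        # exact_k=True: use the requested K unchanged.
        return K
    # Assumption: sqrt(n_cells) is truncated to an integer before comparison.
    return min(K, math.isqrt(n_cells))


print(effective_k(100, 2500))                # 50: sqrt(2500) caps K
print(effective_k(100, 40000))               # 100: sqrt(40000) = 200 > K
print(effective_k(100, 2500, exact_k=True))  # 100: cap disabled
```

For small datasets, the cap means the default K=100 is rarely the value actually used; pass exact_k=True if a fixed neighborhood size matters.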
- Parameters
data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
K (int, optional, default: 100) – Number of neighbors, including the data point itself.
rep (str, optional, default: "pca") – Embedding representation used to calculate kNN. If None, use data.X; otherwise, the keyword 'X_' + rep must exist in data.obsm.
n_comps (int, optional, default: None) – Number of components of rep to use. If n_comps is None, use all components; otherwise, use the minimum of n_comps and the number of dimensions of rep.
n_jobs (int, optional, default: -1) – Number of threads to use. If -1, use all physical CPU cores.
random_state (int, optional, default: 0) – Random seed set for reproducing results.
full_speed (bool, optional, default: False) – If True, use multiple threads when constructing the hnsw index; the kNN results are then not reproducible. Otherwise, use only one thread so that results are reproducible.
use_cache (bool, optional, default: False) – If True and cached kNN results are found, Pegasus uses them and does not recompute. Otherwise, compute kNN irrespective of caching status.
dist (str, optional, default: "l2") – Distance metric to use. Available options are "l2" (squared L2 distance, the default), "ip" (inner product), and "cosine" (cosine similarity).
method (str, optional, default: "hnsw") – Choose from "hnsw" or "sklearn". "hnsw" uses the HNSW algorithm for approximate nearest neighbor search; "sklearn" uses the sklearn package for exact nearest neighbor search.
exact_k (bool, optional, default: False) – If True, use exactly the K passed to the function; otherwise, K is determined as min(K, sqrt(X.shape[0])).
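To make the three dist options concrete, here is a minimal NumPy sketch of the distance definitions used by hnswlib for its "l2", "ip", and "cosine" spaces. The function name hnsw_style_distance is hypothetical, and whether Pegasus reports exactly these values for every method is an assumption, not something this page states.

```python
import numpy as np


def hnsw_style_distance(a, b, dist="l2"):
    """Hypothetical helper: distance between two vectors, hnswlib-style."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if dist == "l2":
        # Squared L2 distance (no square root is taken).
        return float(np.sum((a - b) ** 2))
    if dist == "ip":
        # Inner-product "distance": 1 - <a, b>.
        return float(1.0 - np.dot(a, b))
    if dist == "cosine":
        # Cosine distance: 1 - cosine similarity.
        return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    raise ValueError(f"unknown dist: {dist!r}")


a, b = [1.0, 2.0], [2.0, 4.0]  # parallel vectors of different length
print(hnsw_style_distance(a, b, "l2"))      # 5.0
print(hnsw_style_distance(a, b, "ip"))      # -9.0
print(hnsw_style_distance(a, b, "cosine"))  # approximately 0
```

The example shows why the choice matters: two parallel vectors are far apart under "l2" but essentially identical under "cosine".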
- Return type
None
- Returns
None
Update data.obsm:
data.obsm[rep + "_knn_indices"]: kNN index matrix. Row i is the index list of kNN of cell i (excluding itself), sorted from nearest to farthest.
data.obsm[rep + "_knn_distances"]: kNN distance matrix. Row i is the distance list of kNN of cell i (excluding itself), sorted from smallest to largest.
Update data.obsp:
data.obsp["W_" + rep]: kNN graph of the data in terms of affinity matrix.
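The layout of the two kNN matrices can be illustrated with a small brute-force search in NumPy. This is only a sketch of the output format under stated assumptions, not the hnswlib or sklearn code path Pegasus runs: brute_force_knn is a hypothetical name, squared L2 distance is used, and because K counts the point itself while the rows exclude it, each row is assumed to hold K - 1 entries.

```python
import numpy as np


def brute_force_knn(X, K):
    """Hypothetical helper: exact kNN with squared L2 distance, self excluded."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared L2 distances, shape (n_cells, n_cells).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Exclude each cell from its own neighbor list.
    np.fill_diagonal(d2, np.inf)
    # Assumption: K includes the cell itself, so keep K - 1 neighbors,
    # ordered from nearest to farthest.
    indices = np.argsort(d2, axis=1, kind="stable")[:, : K - 1]
    distances = np.take_along_axis(d2, indices, axis=1)
    return indices, distances


# Four cells on a line at positions 0, 1, 3, and 7.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]]
indices, distances = brute_force_knn(X, K=3)
print(indices)    # row i: neighbors of cell i, nearest first
print(distances)  # row i: matching squared distances, smallest first
```

Row 0 of indices is [1, 2] with distances [1, 9]: cell 0 itself does not appear, and the entries are ordered from nearest to farthest, matching the description of the two obsm entries.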
Examples
>>> pg.neighbors(data)