pegasus.get_neighbors

pegasus.get_neighbors(data, K=100, rep='pca', n_comps=None, n_jobs=-1, random_state=0, full_speed=False, use_cache=False, dist='l2', method='hnsw', exact_k=False)

Find the K nearest neighbors of each data point and return the indices array, the distances array, and the adjusted K.

If exact_k is False, K is adjusted to min(K, int(sqrt(data.shape[0]))), where data.shape[0] is the number of data points.
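A minimal sketch of this rule in plain Python, assuming n_points stands for data.shape[0]. `adjust_k` is a hypothetical helper written for illustration; it is not part of the pegasus API:

```python
import math


def adjust_k(K: int, n_points: int, exact_k: bool = False) -> int:
    """Hypothetical helper mirroring the documented rule for the effective K."""
    if exact_k:
        # exact_k=True: keep the caller's K unchanged.
        return K
    # exact_k=False: cap K at the integer part of sqrt(number of data points).
    return min(K, int(math.sqrt(n_points)))


print(adjust_k(100, 2500))                # 50, since sqrt(2500) = 50 < 100
print(adjust_k(100, 1_000_000))           # 100, since sqrt(1e6) = 1000 > 100
print(adjust_k(100, 2500, exact_k=True))  # 100, no cap applied
```

With 2,500 data points the default K of 100 drops to 50, while with a million points it stays at 100.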

Parameters
  • data (pegasusio.MultimodalData) – A MultimodalData object whose rows are the data points.

  • K (int, optional (default: 100)) – Number of neighbors, including the data point itself.

  • rep (str, optional (default: ‘pca’)) – Representation used to calculate kNN. If None, use data.X.

  • n_comps (int, optional (default: None)) – Number of components of rep to use. If n_comps is None, use all components; otherwise, use min(n_comps, number of dimensions of rep) components.

  • n_jobs (int, optional (default: -1)) – Number of threads to use. -1 refers to using all physical CPU cores.

  • random_state (int, optional (default: 0)) – Random seed for random number generator.

  • full_speed (bool, optional (default: False)) – If True, use multiple threads to construct the HNSW index; this is faster, but the kNN results are not reproducible. If False, use a single thread so that the results are reproducible.

  • use_cache (bool, optional (default: False)) – If True and cached kNN results are found, reuse them instead of recomputing.

  • dist (str, optional (default: ‘l2’)) – Distance metric to use. Available options: ‘l2’ (squared L2 distance, the default), ‘ip’ (inner product), or ‘cosine’ (cosine similarity).

  • method (str, optional (default: ‘hnsw’)) – Either ‘hnsw’ for approximate nearest neighbor search or ‘sklearn’ for exact nearest neighbor search.

  • exact_k (bool, optional (default: False)) – If True, use exactly the K passed to the function; otherwise, K is adjusted to min(K, int(sqrt(data.shape[0]))).
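The effect of n_comps and dist can be illustrated with an exact, brute-force search in the spirit of method=‘sklearn’. This is a pure-Python sketch, not the pegasus implementation. `brute_force_knn` is hypothetical, it assumes n_comps keeps the leading components (as with PCA), and it assumes the ‘ip’ and ‘cosine’ distances follow the hnswlib conventions (1 minus the inner product, 1 minus the cosine similarity); how pegasus reports these values may differ. K is used as given, and each point is scored against every row, including itself:

```python
import math
from typing import List, Optional, Sequence, Tuple


def _distance(a: Sequence[float], b: Sequence[float], dist: str) -> float:
    """Distance between two vectors under the given metric name."""
    if dist == "l2":
        # Squared Euclidean distance.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    if dist == "ip":
        # Inner-product "distance" (hnswlib-style): smaller means more similar.
        return 1.0 - dot
    if dist == "cosine":
        # Assumes neither vector is all zeros.
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return 1.0 - dot / norm
    raise ValueError(f"unknown dist: {dist!r}")


def brute_force_knn(
    X: Sequence[Sequence[float]],
    K: int,
    n_comps: Optional[int] = None,
    dist: str = "l2",
) -> Tuple[List[List[int]], List[List[float]]]:
    """Exact kNN over the rows of X (hypothetical helper, not pegasus API)."""
    n_dims = len(X[0])
    # Keep the first min(n_comps, n_dims) components, or all if n_comps is None.
    d = n_dims if n_comps is None else min(n_comps, n_dims)
    rows = [list(row[:d]) for row in X]

    indices, distances = [], []
    for a in rows:
        # Score every row (including a itself), sort by distance, keep K.
        scored = sorted((_distance(a, b, dist), j) for j, b in enumerate(rows))[:K]
        indices.append([j for _, j in scored])
        distances.append([s for s, _ in scored])
    return indices, distances


X = [[0.0, 0.0, 5.0], [1.0, 0.0, -5.0], [0.0, 3.0, 0.0]]
idx, dst = brute_force_knn(X, K=2, n_comps=2)
print(idx)  # [[0, 1], [1, 0], [2, 0]]
print(dst)  # [[0.0, 1.0], [0.0, 1.0], [0.0, 9.0]]
print(brute_force_knn(X, K=2)[0][0])  # [0, 2] when all 3 components are used
```

Using only the first two components changes the nearest non-self neighbor of row 0 from row 2 to row 1, which shows why n_comps can affect the kNN result. With ‘ip’, a point is not guaranteed to be its own nearest neighbor, since its self-distance is 1 minus its squared norm.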

Return type

Tuple[List[int], List[float], int]

Returns

  • kNN indices array, distances array, and adjusted K.

Examples

>>> import pegasus as pg
>>> indices, distances, K = pg.get_neighbors(data)