pegasus.net_umap

pegasus.net_umap(data, rep='pca', n_jobs=-1, n_components=2, n_neighbors=15, min_dist=0.5, spread=1.0, random_state=0, select_frac=0.1, select_K=25, select_alpha=1.0, full_speed=False, net_alpha=0.1, polish_learning_rate=10.0, polish_n_epochs=30, out_basis='net_umap')

Calculate an approximate UMAP embedding, using a deep learning model to speed up the computation.

Specifically, the deep model used is MLPRegressor, the scikit-learn implementation of a multi-layer perceptron regressor.
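The general idea can be sketched as follows: embed only a subsample of cells, fit an MLPRegressor that maps the input representation to the embedding coordinates, then predict coordinates for the remaining cells. This is a minimal illustration under stated assumptions, not Pegasus's implementation: the toy data, the uniform subsample, the hidden layer sizes, and the placeholder "embedding" (the first two input components instead of a real UMAP run) are all made up for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-in for a PCA representation: 500 cells x 10 components.
X_full = rng.normal(size=(500, 10))

# Choose 10% of the cells (uniformly here, purely for illustration).
selected = np.zeros(X_full.shape[0], dtype=bool)
selected[rng.choice(X_full.shape[0], size=50, replace=False)] = True

# Placeholder for the UMAP coordinates of the subsample. A real run would
# compute these with UMAP; the first two components keep the sketch
# self-contained.
Y_sub = X_full[selected, :2]

# Fit a regressor from the representation to the 2-D coordinates.
# alpha is the L2 penalty and plays the role of net_alpha;
# the hidden layer sizes are arbitrary.
regressor = MLPRegressor(
    hidden_layer_sizes=(64, 32), alpha=0.1, random_state=0, max_iter=500
)
regressor.fit(X_full[selected], Y_sub)

# Keep the subsample's coordinates and predict the rest.
Y_init = np.zeros((X_full.shape[0], 2))
Y_init[selected] = Y_sub
Y_init[~selected] = regressor.predict(X_full[~selected])
```

In the approach described here, predicted coordinates like `Y_init` would then be refined by a short polishing UMAP run rather than used as the final embedding.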

Parameters

data (anndata.AnnData) – Annotated data matrix with rows for cells and columns for genes.
rep (str, optional, default: "pca") – Representation of data used for the calculation. By default, use PCA coordinates. If None, use the count matrix data.X.
n_components (int, optional, default: 2) – Dimension of calculated UMAP coordinates. By default, generate 2-dimensional data for 2D visualization.
n_neighbors (int, optional, default: 15) – Number of nearest neighbors considered during the computation.
min_dist (float, optional, default: 0.5) – The effective minimum distance between embedded data points.
spread (float, optional, default: 1.0) – The effective scale of embedded data points.
random_state (int, optional, default: 0) – Random seed set for reproducing results.
select_frac (float, optional, default: 0.1) – Fraction of cells to keep when downsampling.
select_K (int, optional, default: 25) – Number of neighbors used to estimate the local density of each data point for downsampling.
select_alpha (float, optional, default: 1.0) – Weight the downsampling so that a cell's selection probability is proportional to radius ** select_alpha.
full_speed (bool, optional, default: False) –
- If True, use multiple threads to construct the HNSW index. However, the kNN results are then not reproducible.
- Otherwise, use only one thread to make sure results are reproducible.
net_alpha (float, optional, default: 0.1) – L2 penalty (regularization term) parameter of the deep regressor.
polish_learning_rate (float, optional, default: 10.0) – After running the deep regressor to predict new coordinates, use polish_learning_rate * n_obs as the learning rate to polish the coordinates.
polish_n_epochs (int, optional, default: 30) – Number of epochs for the polishing UMAP run.
out_basis (str, optional, default: "net_umap") – Key name under which to store the calculated UMAP coordinates.
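One way to read the select_frac, select_K and select_alpha descriptions is sketched below: take each cell's radius to be the distance to its select_K-th nearest neighbor (a large radius means a sparse region), then draw select_frac of the cells without replacement with probability proportional to radius ** select_alpha. The function name `density_weighted_subsample` is hypothetical, the brute-force distance computation is only suitable for small inputs, and the exact procedure Pegasus uses may differ.

```python
import numpy as np

def density_weighted_subsample(X, select_frac=0.1, select_K=25,
                               select_alpha=1.0, random_state=0):
    """Return a boolean mask of selected rows (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]

    # Brute-force pairwise Euclidean distances (fine for small n only).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.maximum(d2, 0.0))

    # After sorting each row, column 0 is the point itself, so column
    # select_K is the distance to the select_K-th nearest neighbor.
    radius = np.sort(dist, axis=1)[:, select_K]

    # Sparse regions (large radius) get a higher selection probability.
    weights = radius ** select_alpha
    probs = weights / weights.sum()

    n_select = max(1, int(round(select_frac * n)))
    chosen = rng.choice(n, size=n_select, replace=False, p=probs)

    selected = np.zeros(n, dtype=bool)
    selected[chosen] = True
    return selected

# Usage on toy data: 200 cells x 5 components.
X_toy = np.random.default_rng(1).normal(size=(200, 5))
mask = density_weighted_subsample(X_toy)
```

With select_alpha = 0 every cell is equally likely to be chosen; larger values favor cells in sparse regions, which helps rare populations survive the downsampling.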

Return type

None

Returns

None
Update data.obsm –
- data.obsm['X_' + out_basis]: Net UMAP coordinates of the data.
Update data.obs –
- data.obs['ds_selected']: Boolean array indicating which cells were selected during the downsampling phase.

Examples

>>> import pegasus as pg
>>> pg.net_umap(adata)