pegasus.net_umap
- pegasus.net_umap(data, rep='pca', n_jobs=-1, n_components=2, n_neighbors=15, min_dist=0.5, spread=1.0, densmap=False, dens_lambda=2.0, dens_frac=0.3, dens_var_shift=0.1, random_state=0, select_frac=0.1, select_K=25, select_alpha=1.0, full_speed=False, use_cache=True, net_alpha=0.1, polish_learning_rate=10.0, polish_n_epochs=30, out_basis='net_umap')
Calculate Net-UMAP embedding of cells.
Net-UMAP approximates the UMAP embedding with a deep learning model to speed up the computation.
Specifically, the model is MLPRegressor, the scikit-learn implementation of a multi-layer perceptron regressor.
See [Li20] for details.
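The regression idea can be sketched as follows. This is a minimal sketch of the regression step only, not pegasus's implementation: it assumes the embedding of the down-sampled cells has already been computed (in Net-UMAP that would be an exact UMAP run on the subset), and the helper name `net_predict_embedding` and the `max_iter` setting are hypothetical. `net_alpha` is passed as MLPRegressor's `alpha`, the L2 penalty described in the parameter list.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def net_predict_embedding(X, selected_idx, Y_selected, net_alpha=0.1, random_state=0):
    """Hypothetical helper: regress embedding coordinates from the representation.

    X            -- (n_obs, n_features) representation, e.g. PCA coordinates.
    selected_idx -- indices of the down-sampled cells.
    Y_selected   -- (n_selected, n_components) coordinates of those cells; in
                    Net-UMAP these would come from running UMAP on the subset
                    (assumed here, not computed).
    """
    # alpha is MLPRegressor's L2 penalty, the role net_alpha plays in the docs.
    model = MLPRegressor(alpha=net_alpha, max_iter=2000, random_state=random_state)
    model.fit(X[selected_idx], Y_selected)
    # Predict coordinates for every cell, including those not seen in training.
    return model.predict(X)


# Usage with synthetic data: a random linear map stands in for a UMAP embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
selected_idx = rng.choice(500, size=50, replace=False)
Y_selected = X[selected_idx] @ rng.normal(size=(10, 2))
Y_all = net_predict_embedding(X, selected_idx, Y_selected)
print(Y_all.shape)  # (500, 2)
```

In the documented workflow, the predicted coordinates are then refined by a short UMAP run (the polishing step controlled by `polish_learning_rate` and `polish_n_epochs`), which this sketch does not include.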
- Parameters
data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
rep (str, optional, default: "pca") – Representation of data used for the calculation. By default, use PCA coordinates. If None, use the count matrix data.X.
n_jobs (int, optional, default: -1) – Number of threads to use. If -1, use all physical CPU cores.
n_components (int, optional, default: 2) – Dimension of the calculated UMAP coordinates. By default, generate 2-dimensional data for 2D visualization.
n_neighbors (int, optional, default: 15) – Number of nearest neighbors considered during the computation.
min_dist (float, optional, default: 0.5) – The effective minimum distance between embedded data points.
spread (float, optional, default: 1.0) – The effective scale of embedded data points.
densmap (bool, optional, default: False) – Whether to optimize with the density-augmented objective of densMAP, which produces an embedding where local densities are encouraged to correlate with those in the original space.
dens_lambda (float, optional, default: 2.0) – Regularization weight of the density correlation term in densMAP. Only takes effect when densmap is True. Larger values prioritize density preservation over the UMAP objective, while values closer to 0 do the opposite. Setting this parameter to 0 is equivalent to running the original UMAP algorithm.
dens_frac (float, optional, default: 0.3) – Fraction of epochs (between 0 and 1) during which the density-augmented objective is used in densMAP. Only takes effect when densmap is True. The first (1 - dens_frac) fraction of epochs optimizes the original UMAP objective before the density correlation term is introduced.
dens_var_shift (float, optional, default: 0.1) – A small constant added to the variance of local radii in the embedding when calculating the density correlation objective, preventing numerical instability from dividing by a small number. Only takes effect when densmap is True.
random_state (int, optional, default: 0) – Random seed for reproducing results.
select_frac (float, optional, default: 0.1) – Fraction of cells kept during down sampling.
select_K (int, optional, default: 25) – Number of neighbors used to estimate the local density of each data point for down sampling.
select_alpha (float, optional, default: 1.0) – Weight the down sampling so that each cell's selection weight is proportional to radius ** select_alpha.
full_speed (bool, optional, default: False) – If True, use multiple threads to construct the hnsw index; however, the kNN results are then not reproducible. Otherwise, use only one thread so that results are reproducible.
use_cache (bool, optional, default: True) – If True and cached kNN results are found, reuse them instead of recomputing.
net_alpha (float, optional, default: 0.1) – L2 penalty (regularization term) parameter of the deep regressor.
polish_learning_rate (float, optional, default: 10.0) – After the deep regressor predicts new coordinates, the learning rate used to polish them with UMAP.
polish_n_epochs (int, optional, default: 30) – Number of epochs for the polishing UMAP run.
out_basis (str, optional, default: "net_umap") – Key name under which the calculated UMAP coordinates are stored.
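The down-sampling parameters (select_frac, select_K, select_alpha) can be illustrated with a short sketch. It assumes, as a labeled guess rather than pegasus's actual code, that a cell's "radius" is its distance to the select_K-th nearest neighbor and that cells are drawn without replacement with probability proportional to radius ** select_alpha. The helper `density_downsample` is hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def density_downsample(X, select_frac=0.1, select_K=25, select_alpha=1.0, random_state=0):
    """Hypothetical helper: return a boolean mask of down-sampled cells."""
    n_obs = X.shape[0]
    # kneighbors() with no query excludes each point from its own neighbor list,
    # so the last column is the distance to the select_K-th true neighbor.
    nn = NearestNeighbors(n_neighbors=select_K).fit(X)
    dists, _ = nn.kneighbors()
    radius = dists[:, -1]
    # Larger radius = sparser neighborhood; with select_alpha > 0, sparse
    # regions get a higher chance of being kept than under uniform sampling.
    weights = radius ** select_alpha
    probs = weights / weights.sum()
    n_select = max(1, int(round(n_obs * select_frac)))
    rng = np.random.default_rng(random_state)
    chosen = rng.choice(n_obs, size=n_select, replace=False, p=probs)
    mask = np.zeros(n_obs, dtype=bool)
    mask[chosen] = True
    return mask


# Usage: a dense cluster of 900 cells and a sparse cluster of 100 cells.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(900, 5)),
               rng.normal(5.0, 1.0, size=(100, 5))])
mask = density_downsample(X, select_frac=0.1)
print(mask.sum())        # 100
print(mask[900:].sum())  # compare with 10 expected under uniform sampling
```

Setting select_alpha to 0 makes every weight equal to 1, which reduces this sketch to uniform random sampling; larger values shift more of the sample toward sparse regions so that rare populations are represented in the training set of the regressor.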
- Return type
None
- Returns
None
Update data.obsm – data.obsm['X_' + out_basis]: Net-UMAP coordinates of the data.
Update data.obs – data.obs['ds_selected']: Boolean array indicating which cells were selected during the down-sampling phase.
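To make the output contract concrete, the sketch below uses plain dicts as hypothetical stand-ins for data.obsm and data.obs, filled with synthetic values. It shows how the key name follows 'X_' + out_basis and how the ds_selected mask separates the down-sampled cells from the rest. With a real MultimodalData object the same boolean indexing should apply to data.obsm['X_net_umap'] and data.obs['ds_selected'].values, assuming the mask is stored as a boolean column.

```python
import numpy as np

# Synthetic stand-ins for the documented outputs; real results live in
# data.obsm and data.obs of a pegasusio.MultimodalData object.
out_basis = "net_umap"
obsm = {"X_" + out_basis: np.arange(12, dtype=float).reshape(6, 2)}
obs = {"ds_selected": np.array([True, False, False, True, False, False])}

coords = obsm["X_" + out_basis]  # one row per cell, one column per component
selected = obs["ds_selected"]    # one boolean per cell

# Split the embedding into cells picked during down sampling and the rest.
selected_coords = coords[selected]
other_coords = coords[~selected]
print("X_" + out_basis, selected_coords.shape, other_coords.shape)
# X_net_umap (2, 2) (4, 2)
```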
Examples
>>> pg.net_umap(data)