pegasus.nmf

pegasus.nmf(data, n_components=20, features='highly_variable_features', space='log', init='nndsvdar', algo='halsvar', mode='batch', tol=0.0001, use_gpu=False, alpha_W=0.0, l1_ratio_W=0.0, alpha_H=0.0, l1_ratio_H=0.0, fp_precision='float', online_chunk_size=5000, n_jobs=-1, random_state=0)

Perform Nonnegative Matrix Factorization (NMF) on the data using the Frobenius norm. The steps are: select features, L2-normalize the data, run NMF, and L2-normalize the resulting coordinates.

The calculation uses the nmf-torch package.
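
To make the pipeline above concrete, here is a minimal, self-contained sketch of Frobenius-norm NMF with multiplicative updates on row-L2-normalized data. This is purely illustrative: pegasus.nmf delegates the actual computation to nmf-torch, and the function `nmf_mu` below is a hypothetical stand-in, not the library's implementation.

```python
import numpy as np

def nmf_mu(X, n_components, n_iter=200, seed=0):
    """Illustrative NMF with Frobenius-norm multiplicative updates.
    Factorizes X (n_cells x n_features) as X ~ H @ W, with H, W >= 0.
    This is a sketch only; pegasus.nmf uses the nmf-torch package."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    H = rng.random((n, n_components))   # cell coordinate factor
    W = rng.random((n_components, m))   # feature loading factor
    eps = 1e-10                         # guard against division by zero
    for _ in range(n_iter):
        # Standard multiplicative updates for the Frobenius loss.
        H *= (X @ W.T) / (H @ W @ W.T + eps)
        W *= (H.T @ X) / (H.T @ H @ W + eps)
    return H, W

# Toy nonnegative data: rows ~ cells, columns ~ (selected) features.
rng = np.random.default_rng(0)
X = rng.random((50, 30))
# Row-wise L2 normalization, mirroring the preprocessing step described above.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
H, W = nmf_mu(Xn, n_components=5)
err = np.linalg.norm(Xn - H @ W)   # Frobenius reconstruction error
```

The multiplicative-update rules keep both factors nonnegative as long as the data and the initialization are nonnegative, which is why the `eps` guard is the only safeguard needed.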

Parameters
  • data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.

  • n_components (int, optional, default: 20) – Number of components to compute.

  • features (str, optional, default: "highly_variable_features") – Keyword in data.var specifying the features used for NMF.

  • space (str, optional, default: "log") – Choose from "log" and "expression". "log" works in log-transformed expression space; "expression" works in the original expression space (normalized by total UMIs).

  • init (str, optional, default: nndsvdar.) – Method to initialize NMF. Options are ‘random’, ‘nndsvd’, ‘nndsvda’ and ‘nndsvdar’.

  • algo (str, optional, default: "halsvar") – Choose from "mu" (Multiplicative Update), "hals" (Hierarchical Alternating Least Squares), "halsvar" (a HALS variant that mimics "bpp" and can sometimes achieve better convergence), and "bpp" (alternating non-negative least squares with the Block Principal Pivoting method).

  • mode (str, optional, default: "batch") – Learning mode. Choose from "batch" and "online". Note that "online" only works for the Frobenius loss (beta=2.0); for other beta losses it falls back to the batch method.

  • tol (float, optional, default: 1e-4) – The tolerance used for the convergence check.

  • use_gpu (bool, optional, default: False) – If True, use GPU if available. Otherwise, use CPU only.

  • alpha_W (float, optional, default: 0.0) – A numeric scale factor which multiplies the regularization terms related to W. If zero or negative, no regularization regarding W is considered.

  • l1_ratio_W (float, optional, default: 0.0) – The ratio of the L1 penalty on W; must be between 0 and 1. The ratio of the L2 penalty on W is thus (1 - l1_ratio_W).

  • alpha_H (float, optional, default: 0.0) – A numeric scale factor which multiplies the regularization terms related to H. If zero or negative, no regularization regarding H is considered.

  • l1_ratio_H (float, optional, default: 0.0) – The ratio of the L1 penalty on H; must be between 0 and 1. The ratio of the L2 penalty on H is thus (1 - l1_ratio_H).

  • fp_precision (str, optional, default: "float") – The numeric precision of the results. Choose from "float" and "double".

  • online_chunk_size (int, optional, default: 5000) – The chunk / mini-batch size for online learning. Only used when mode='online'.

  • n_jobs (int, optional (default: -1)) – Number of threads to use. -1 refers to using all physical CPU cores.

  • random_state (int, optional, default: 0) – Random seed for reproducible results.
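
The alpha_W/l1_ratio_W and alpha_H/l1_ratio_H pairs combine L1 and L2 penalties in elastic-net fashion. The sketch below shows one common convention for how such a penalty is computed (the one used by scikit-learn); the exact scaling inside nmf-torch may differ, so treat this only as an illustration of how alpha and l1_ratio interact.

```python
import numpy as np

def elastic_net_penalty(M, alpha, l1_ratio):
    """Illustrative elastic-net penalty controlled by (alpha, l1_ratio).
    Assumed convention (scikit-learn style; nmf-torch's scaling may differ):
        alpha * (l1_ratio * ||M||_1 + 0.5 * (1 - l1_ratio) * ||M||_F^2)
    """
    if alpha <= 0:
        return 0.0  # zero or negative alpha disables regularization
    l1 = np.abs(M).sum()        # L1 norm: sum of absolute entries
    l2 = (M ** 2).sum()         # squared Frobenius norm
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)

W = np.array([[1.0, 2.0], [0.0, 3.0]])
p_l1 = elastic_net_penalty(W, alpha=0.5, l1_ratio=1.0)  # pure L1 penalty
p_l2 = elastic_net_penalty(W, alpha=0.5, l1_ratio=0.0)  # pure L2 penalty
p_off = elastic_net_penalty(W, alpha=0.0, l1_ratio=0.5) # regularization off
```

With l1_ratio=1.0 only the sparsity-inducing L1 term remains, which is the setting to use when sparse factors are desired.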

Return type

None

Returns

  • None.

  • Update data.obsm

    • data.obsm["X_nmf"]: Scaled NMF coordinates of shape (n_cells, n_components). Each column has a unit variance.

    • data.obsm["H"]: The coordinate factor matrix of shape (n_cells, n_components).

  • Update data.uns

    • data.uns["W"]: The feature factor matrix of shape (n_HVFs, n_components).

    • data.uns["nmf_err"]: The NMF loss.

    • data.uns["nmf_features"]: Records the features used for the NMF analysis.
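
The relationship stated above between data.obsm["H"] and data.obsm["X_nmf"] can be sketched as follows: X_nmf is H rescaled so that each component (column) has unit variance. The snippet uses a random stand-in for H; the exact internals belong to pegasus/nmf-torch.

```python
import numpy as np

# Stand-in for data.obsm["H"], the coordinate factor matrix of shape
# (n_cells, n_components). Real values come from pegasus.nmf.
rng = np.random.default_rng(0)
H = rng.random((100, 20))

# Scale each column to unit variance, matching the description of
# data.obsm["X_nmf"] as "scaled NMF coordinates" with unit-variance columns.
X_nmf = H / H.std(axis=0)
```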

Examples

>>> import pegasus as pg
>>> pg.nmf(data)