pegasus.integrative_nmf

pegasus.integrative_nmf(data, batch='Channel', n_components=20, features='highly_variable_features', space='log', algo='halsvar', mode='online', tol=0.0001, use_gpu=False, lam=5.0, fp_precision='float', online_chunk_size=5000, n_jobs=-1, random_state=0, quantile_norm=True)

Perform Integrative Nonnegative Matrix Factorization (iNMF) [Yang16] for data integration.
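For reference, the objective that iNMF minimizes, as formulated in [Yang16] and [Welch19], is shown below. It is written with cells as rows to match the stored matrix shapes. X_i is the scaled n_i × g matrix of batch i, H_i ≥ 0 is n_i × k, W ≥ 0 and V_i ≥ 0 are k × g, k is n_components, and λ is lam. The constant factors used internally by nmf-torch may differ from this sketch.

```latex
\min_{H_i \ge 0,\; W \ge 0,\; V_i \ge 0}
\sum_{i=1}^{B} \left\lVert X_i - H_i \left(W + V_i\right) \right\rVert_F^2
+ \lambda \sum_{i=1}^{B} \left\lVert H_i V_i \right\rVert_F^2
```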

The calculation uses nmf-torch.

This function assumes that cells in each batch are adjacent to each other, i.e., the cells of each batch occupy consecutive rows of data. In addition, it scales each batch separately with the L2 norm, and the resulting Hs (the per-batch coordinate factor matrices) are also scaled with the L2 norm. If quantile_norm=True, quantile normalization is performed additionally.
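Because of the adjacency assumption, it can help to check the batch labels before calling the function. The sketch below uses NumPy only. The helper names batches_are_contiguous and contiguous_order are hypothetical. The labels would presumably come from data.obs[batch] (data.obs["Channel"] by default). Applying the resulting order to data itself is not shown.

```python
import numpy as np


def batches_are_contiguous(labels) -> bool:
    """Return True if every batch label occupies exactly one contiguous run of rows."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return True
    # Boolean mask of positions where the label differs from the previous row.
    change = labels[1:] != labels[:-1]
    # Number of contiguous runs = number of label changes + 1.
    n_runs = int(change.sum()) + 1
    # Labels are contiguous iff each distinct label forms exactly one run.
    return n_runs == np.unique(labels).size


def contiguous_order(labels) -> np.ndarray:
    """Row order grouping cells by batch, keeping the original order within each batch."""
    labels = np.asarray(labels)
    # A stable sort preserves the relative order of cells from the same batch.
    return np.argsort(labels, kind="stable")


labels = np.array(["b1", "b1", "b2", "b1", "b2"])
print(batches_are_contiguous(labels))  # False
order = contiguous_order(labels)
print(batches_are_contiguous(labels[order]))  # True
```

A stable sort is used so that reordering only groups batches together and does not shuffle cells within a batch.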

See [Welch19] and [Gao21] for preprocessing and normalization details.
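The quantile normalization step can be illustrated with a simplified NumPy sketch in the spirit of [Welch19] and [Gao21]. Cells are assigned to their largest factor, and within each such cluster every factor of a non-reference batch is mapped onto the reference batch's quantiles. This is not Pegasus's implementation. The function name quantile_align is hypothetical, the input Hs are assumed to be already L2-scaled, and the KNN-based cluster refinement is omitted.

```python
import numpy as np


def quantile_align(Hs, ref=0, n_quantiles=50, min_cells=20):
    """Simplified quantile alignment of per-batch factor loadings.

    Hs: list of (n_cells_i, k) nonnegative arrays, one per batch, assumed to be
    already L2-scaled. The KNN-based cluster refinement is deliberately omitted.
    """
    # Assign each cell to the factor with its largest loading.
    clusters = [H.argmax(axis=1) for H in Hs]
    out = [H.astype(float, copy=True) for H in Hs]
    k = Hs[ref].shape[1]
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    for i, H in enumerate(Hs):
        if i == ref:
            continue  # the reference batch is left unchanged
        for c in range(k):
            in_ref = clusters[ref] == c
            in_cur = clusters[i] == c
            # Skip clusters with too few cells in either batch to estimate quantiles.
            if in_ref.sum() < min_cells or in_cur.sum() < min_cells:
                continue
            for j in range(k):
                q_cur = np.quantile(H[in_cur, j], probs)
                q_ref = np.quantile(Hs[ref][in_ref, j], probs)
                # Piecewise-linear map from this batch's quantiles onto the reference's.
                out[i][in_cur, j] = np.interp(H[in_cur, j], q_cur, q_ref)
    return out


rng = np.random.default_rng(0)
H0 = rng.random((300, 3))
H1 = 2.0 * rng.random((400, 3)) + 0.5  # a shifted, stretched toy batch
aligned = quantile_align([H0, H1])
```

Keeping a single reference batch fixed means the other batches are pulled onto its distribution, cluster by cluster.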

Parameters

data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
batch (str, optional, default: "Channel".) – Which attribute in data.obs represents batches.
n_components (int, optional, default: 20.) – Number of iNMF components (factors) to compute.
features (str, optional, default: "highly_variable_features".) – Keyword in data.var to specify features used for integrative_nmf.
space (str, optional, default: log.) – Choose from log and expression. log works on log-transformed expression space; expression works on the original expression space (normalized by total UMIs).
algo (str, optional, default: halsvar) – Choose from mu (Multiplicative Update), halsvar (a HALS variant that mimics bpp but is faster) and bpp (alternating non-negative least squares with the Block Principal Pivoting method).
mode (str, optional, default: online) – Learning mode. Choose from batch and online. Note that online only works when beta=2.0; for other beta losses, it falls back to the batch method.
tol (float, optional, default: 1e-4) – The tolerance used for the convergence check.
use_gpu (bool, optional, default: False) – If True, use GPU if available. Otherwise, use CPU only.
lam (float, optional, default: 5.0) – The coefficient for regularization terms. If 0, then no regularization will be performed.
fp_precision (str, optional, default: float) – The numeric precision of the results. Choose from float and double.
online_chunk_size (int, optional, default: 5000) – The chunk / mini-batch size for online learning. Only works when mode='online'.
n_jobs (int, optional, default: -1) – Number of threads to use. -1 refers to using all physical CPU cores.
random_state (int, optional, default: 0.) – Random seed for reproducing results.
quantile_norm (bool, optional, default: True.) – Perform quantile normalization as described in Gao et al. Nature Biotech 2021, with cluster refinement K=20, min_cells=20, and quantiles=50.
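To illustrate what algo='mu' refers to, the following is a minimal NumPy sketch of multiplicative updates for the Frobenius iNMF objective. In this sketch, lam weights the penalty λ Σ_i ||H_i V_i||_F^2. It is not the nmf-torch implementation. It runs full-batch on the CPU and ignores mode, online_chunk_size, tol, and use_gpu. The function names inmf_mu and inmf_loss are hypothetical. A small eps guards against division by zero.

```python
import numpy as np


def inmf_loss(Xs, Hs, W, Vs, lam):
    """Frobenius iNMF objective: sum_i ||X_i - H_i(W + V_i)||^2 + lam * ||H_i V_i||^2."""
    return sum(
        np.linalg.norm(X - H @ (W + V)) ** 2 + lam * np.linalg.norm(H @ V) ** 2
        for X, H, V in zip(Xs, Hs, Vs)
    )


def inmf_mu(Xs, k, lam=5.0, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative-update iNMF on a list of nonnegative (n_cells_i, n_features) arrays."""
    rng = np.random.default_rng(seed)
    g = Xs[0].shape[1]
    W = rng.random((k, g))                       # shared factors
    Vs = [rng.random((k, g)) for _ in Xs]        # batch-specific factors
    Hs = [rng.random((X.shape[0], k)) for X in Xs]  # per-batch cell loadings
    for _ in range(n_iter):
        # H_i update: the penalty lam*||H_i V_i||^2 adds lam * V_i V_i^T to the denominator.
        for i, X in enumerate(Xs):
            WV = W + Vs[i]
            num = X @ WV.T
            den = Hs[i] @ (WV @ WV.T + lam * Vs[i] @ Vs[i].T) + eps
            Hs[i] *= num / den
        # V_i update: the penalty adds lam * H_i^T H_i V_i to the denominator.
        for i, X in enumerate(Xs):
            HtH = Hs[i].T @ Hs[i]
            num = Hs[i].T @ X
            den = HtH @ (W + (1.0 + lam) * Vs[i]) + eps
            Vs[i] *= num / den
        # W update: W is shared, so numerator and denominator sum over batches.
        num = sum(H.T @ X for H, X in zip(Hs, Xs))
        den = sum(H.T @ H @ (W + V) for H, V in zip(Hs, Vs)) + eps
        W *= num / den
    return Hs, W, Vs


rng = np.random.default_rng(1)
Xs = [rng.random((30, 15)), rng.random((40, 15))]  # two toy "batches"
Hs, W, Vs = inmf_mu(Xs, k=4, lam=5.0, n_iter=200)
print(inmf_loss(Xs, Hs, W, Vs, lam=5.0))
```

Each update divides the negative part of the gradient by its positive part. This keeps all factors nonnegative without an explicit projection step.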

Returns

out_rep – The keyword referring to the embedding calculated by the integrative NMF algorithm; the embedding itself is stored in data.obsm["X_inmf"]. out_rep is always equal to “inmf”.

Return type

str

Update data.obsm:

data.obsm["X_inmf"]: Scaled and possibly quantile normalized iNMF coordinates.

data.obsm["H"]: The concatenation of the per-batch coordinate factor matrices (Hs), of shape (n_cells, n_components).

Update data.uns:

data.uns["W"]: The feature factor matrix of shape (n_HVFs, n_components).

data.uns["V"]: The batch-specific feature factor matrices as one tensor of shape (n_batches, n_components, n_HVFs).

data.uns["inmf_err"]: The iNMF loss.

data.uns["inmf_features"]: The features used to perform the iNMF analysis.
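The stored matrices fit together as X_i ≈ H_i (W^T + V_i) for batch i. The sketch below uses random stand-ins with the documented shapes. It rests on three assumptions:

- data.obsm["H"] holds the factors before L2 and quantile scaling.
- V[i] follows the order in which batches appear in the rows.
- The reconstruction lives in the scaled feature space, not in raw counts.

reconstruct_batch is a hypothetical helper.

```python
import numpy as np

# Hypothetical stand-ins with the documented shapes; in practice these would be
# data.obsm["H"], data.uns["W"], data.uns["V"], and per-batch cell counts.
rng = np.random.default_rng(0)
n_hvf, k = 100, 20
batch_sizes = [30, 50]
H = rng.random((sum(batch_sizes), k))          # (n_cells, n_components)
W = rng.random((n_hvf, k))                     # (n_HVFs, n_components)
V = rng.random((len(batch_sizes), k, n_hvf))   # (n_batches, n_components, n_HVFs)


def reconstruct_batch(H, W, V, batch_sizes, i):
    """Approximate batch i's scaled feature matrix as H_i (W^T + V_i)."""
    # Batches are contiguous, so batch i occupies a single slice of rows in H.
    start = sum(batch_sizes[:i])
    H_i = H[start:start + batch_sizes[i]]
    # W is stored feature-by-factor, so transpose it to factor-by-feature.
    return H_i @ (W.T + V[i])


X1_hat = reconstruct_batch(H, W, V, batch_sizes, 1)
print(X1_hat.shape)  # (50, 100)
```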

Examples

>>> pg.integrative_nmf(data)