pegasus.integrative_nmf
- pegasus.integrative_nmf(data, batch='Channel', n_components=20, features='highly_variable_features', space='log', algo='halsvar', mode='online', tol=0.0001, use_gpu=False, lam=5.0, fp_precision='float', online_chunk_size=5000, n_jobs=-1, random_state=0, quantile_norm=True)
Perform Integrative Nonnegative Matrix Factorization (iNMF) [Yang16] for data integration.
The calculation uses nmf-torch.
This function assumes that cells in each batch are adjacent to each other. It scales each batch by its L2 norm separately, and the resulting H matrices are also scaled by L2 norm. If quantile_norm=True, quantile normalization is additionally performed. See [Welch19] and [Gao21] for preprocessing and normalization details.
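Because the function assumes that cells from the same batch occupy adjacent rows, it can be worth checking the batch labels before calling it. The following is a minimal sketch in plain Python. `batches_are_contiguous` is a hypothetical helper, not part of Pegasus, and the label lists stand in for the values of the batch attribute in data.obs.

```python
def batches_are_contiguous(labels):
    """Return True if every batch label forms one unbroken run of rows.

    Hypothetical helper for illustration; Pegasus does not provide it.
    """
    seen = set()          # labels whose run has already started
    previous = object()   # sentinel that never equals a real label
    for label in labels:
        if label != previous:
            # A new run starts here; the label must not have appeared before.
            if label in seen:
                return False
            seen.add(label)
            previous = label
    return True


print(batches_are_contiguous(["A", "A", "B", "B", "C"]))  # True
print(batches_are_contiguous(["A", "B", "A", "C"]))       # False
```

If the check returns False, the cells would presumably need to be reordered by batch before running the integration.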
- Parameters
data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
batch (str, optional, default: "Channel") – The attribute in data.obs that identifies batches.
n_components (int, optional, default: 20) – Number of components (factors) to compute.
features (str, optional, default: "highly_variable_features") – Keyword in data.var that specifies the features used for integrative NMF.
space (str, optional, default: "log") – Choose from log and expression. log works on the log-transformed expression space; expression works on the original expression space (normalized by total UMIs).
algo (str, optional, default: "halsvar") – Choose from mu (Multiplicative Update), halsvar (a HALS variant that mimics bpp but is faster) and bpp (alternating non-negative least squares with the Block Principal Pivoting method).
mode (str, optional, default: "online") – Learning mode. Choose from batch and online. Note that online only works when beta=2.0; for other beta losses, it switches back to the batch method.
tol (float, optional, default: 1e-4) – The tolerance used for the convergence check.
use_gpu (bool, optional, default: False) – If True, use the GPU if available. Otherwise, use the CPU only.
lam (float, optional, default: 5.0) – The coefficient for the regularization terms. If 0, no regularization is performed.
fp_precision (str, optional, default: "float") – The numeric precision of the results. Choose from float and double.
online_chunk_size (int, optional, default: 5000) – The chunk / mini-batch size for online learning. Only takes effect when mode='online'.
n_jobs (int, optional, default: -1) – Number of threads to use. -1 means using all physical CPU cores.
random_state (int, optional, default: 0) – Random seed to set for reproducible results.
quantile_norm (bool, optional, default: True) – Perform quantile normalization as described in Gao et al., Nature Biotech 2021, with cluster refinement K=20, min_cells=20 and quantiles=50.
- Returns
out_rep – The keyword in data.obsm referring to the embedding calculated by the integrative NMF algorithm. out_rep is always equal to “inmf”.
- Return type
str
Update data.obsm:
data.obsm["X_inmf"]: Scaled and possibly quantile normalized iNMF coordinates.
data.obsm["H"]: The concatenation of coordinate factor matrices, of shape (n_cells, n_components).
Update data.uns:
data.uns["W"]: The feature factor matrix of shape (n_HVFs, n_components).
data.uns["V"]: The batch-specific feature factor matrices as one tensor of shape (n_batches, n_components, n_HVFs).
data.uns["inmf_err"]: The iNMF loss.
data.uns["inmf_features"]: Record of the features used to perform the iNMF analysis.
Examples
>>> import pegasus as pg
>>> pg.integrative_nmf(data)
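Several parameters accept only a fixed set of strings, so a small pre-flight check can catch typos before a long run starts. The sketch below is plain Python and does not import Pegasus. `ALLOWED` and `check_inmf_options` are hypothetical names introduced for illustration, and the allowed values are copied from the parameter descriptions on this page.

```python
# Allowed values, copied from the parameter descriptions on this page.
ALLOWED = {
    "space": {"log", "expression"},
    "algo": {"mu", "halsvar", "bpp"},
    "mode": {"batch", "online"},
    "fp_precision": {"float", "double"},
}


def check_inmf_options(**options):
    """Validate string-valued options and return them unchanged as a dict.

    Hypothetical helper for illustration; Pegasus does not provide it.
    Options without an entry in ALLOWED are passed through unchecked.
    """
    for name, value in options.items():
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(
                f"{name}={value!r} is not one of {sorted(allowed)}"
            )
    return options


options = check_inmf_options(
    n_components=20, space="log", algo="halsvar", mode="online", lam=5.0
)
print(options["algo"])  # halsvar

# With Pegasus installed and `data` loaded, the call would then be:
# import pegasus as pg
# rep_key = pg.integrative_nmf(data, **options)
```

Keeping the options in a dictionary also makes it straightforward to record them alongside the results for reproducibility.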