pegasus.train_scarches_scanvi

pegasus.train_scarches_scanvi(data, dir_path, label, unlabeled_category='Unknown', features='highly_variable_features', matkey='counts', n_jobs=-1, random_state=0, max_epochs=None, batch=None, categorical_covariate_keys=None, continuous_covariate_keys=None, semisupervised_max_epochs=None, n_samples_per_label=None, use_gpu=None, arches_params={'dropout_rate': 0.2, 'encode_covariates': True, 'n_layers': 2, 'use_batch_norm': 'none', 'use_layer_norm': 'both'})

Run scArches training.

This is a wrapper around the scvi-tools package.

Parameters

data (MultimodalData.) – Annotated data matrix with rows for cells and columns for genes.
dir_path (str.) – Save the model to this directory.
label (str.) – The obs key representing labels.
unlabeled_category (str, default: "Unknown") – Value used for unlabeled cells in label.
features (str, optional, default: "highly_variable_features") – Keyword in data.var, which refers to a boolean array. If None, all features will be selected.
matkey (str, optional, default: "counts") – Key of the raw count matrix.
n_jobs (int, optional, default: -1.) – Number of threads to use. -1 refers to using all physical CPU cores.
random_state (int, optional, default: 0.) – Seed for random number generator
max_epochs (int | None, optional, default: None.) – Maximum number of unsupervised training epochs. Defaults to np.min([round((20000 / n_cells) * 400), 400])
batch (str, optional, default: None.) – The obs key representing the batches to correct for. Use this when there is only one categorical covariate.
categorical_covariate_keys (List[str]) – A list of obs keys for the categorical covariates to correct for. Use this when there are multiple categorical covariates. Default is None.
continuous_covariate_keys (List[str]) – A list of obs keys for the continuous covariates to correct for. Default is None.
semisupervised_max_epochs (int | None, optional, default: None.) – Maximum number of semisupervised training epochs. Defaults to np.min([round(np.sqrt(max_epochs)), 20])
n_samples_per_label (int, optional, default: None.) – Number of subsamples for each label class to sample per epoch. By default, there is no label subsampling.
use_gpu (str | int | bool | None) – Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., ‘cuda:0’), or use CPU (if False).
arches_params (dict.) – Hyperparameters for the VAE. See https://docs.scvi-tools.org/en/stable/api/reference/scvi.module.VAE.html#scvi.module.VAE for more details.
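
The default epoch counts quoted for max_epochs and semisupervised_max_epochs depend on the number of cells, so it can help to work them out before training. The sketch below restates the two formulas in plain Python. The helper names are hypothetical and not part of Pegasus, and n_cells is assumed to be the number of cells in data.

```python
import math


def default_max_epochs(n_cells: int) -> int:
    # Hypothetical helper mirroring the documented default:
    # np.min([round((20000 / n_cells) * 400), 400])
    return min(round((20000 / n_cells) * 400), 400)


def default_semisupervised_max_epochs(max_epochs: int) -> int:
    # Hypothetical helper mirroring the documented default:
    # np.min([round(np.sqrt(max_epochs)), 20])
    return min(round(math.sqrt(max_epochs)), 20)


for n_cells in (10_000, 50_000, 200_000):
    unsupervised = default_max_epochs(n_cells)
    semisupervised = default_semisupervised_max_epochs(unsupervised)
    print(n_cells, unsupervised, semisupervised)
# 10000 400 20
# 50000 160 13
# 200000 40 6
```

Under these formulas, datasets with 20,000 cells or fewer hit the cap of 400 unsupervised epochs, and larger datasets get proportionally fewer.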

Returns

data.obsm['X_scVI']: The embedding calculated by scVI.
data.obsm['X_scanVI']: The embedding calculated by scanVI.

Return type

Update data.obsm

Examples

>>> import pegasus as pg
>>> pg.train_scarches_scanvi(data, dir_path="scanvi_model/", label="celltype", matkey="counts", batch="tech", n_samples_per_label=100)