pegasus.train_scarches_scanvi(data, dir_path, label, unlabeled_category='Unknown', features='highly_variable_features', matkey='raw.X', n_jobs=-1, random_state=0, max_epochs=None, batch=None, categorical_covariate_keys=None, continuous_covariate_keys=None, semisupervised_max_epochs=None, n_samples_per_label=None, use_gpu=None, arches_params={'dropout_rate': 0.2, 'encode_covariates': True, 'n_layers': 2, 'use_batch_norm': 'none', 'use_layer_norm': 'both'})

Run scArches training.

This is a wrapper around the scvi-tools package.

Parameters

  • data (MultimodalData) – Annotated data matrix with rows for cells and columns for genes.

  • dir_path (str) – Directory in which to save the trained model.

  • label (str) – The obs key representing labels.

  • unlabeled_category (str, default: "Unknown") – Value used for unlabeled cells in label.

  • features (str, optional, default: "highly_variable_features") – Keyword in data.var, which refers to a boolean array. If None, all features will be selected.

  • matkey (str, optional, default: "raw.X") – Matrix key for the raw count matrix.

  • n_jobs (int, optional, default: -1) – Number of threads to use. -1 means use all physical CPU cores.

  • random_state (int, optional, default: 0) – Seed for the random number generator.

  • max_epochs (int | None, optional, default: None) – Maximum number of unsupervised training epochs. If None, defaults to np.min([round((20000 / n_cells) * 400), 400]).

  • batch (str, optional, default: None) – The obs key representing the batches to correct for. Use this when there is only one categorical covariate.

  • categorical_covariate_keys (List[str], optional, default: None) – A list of obs keys for the categorical covariates to correct for. Use this when there are multiple categorical covariates.

  • continuous_covariate_keys (List[str], optional, default: None) – A list of obs keys for the continuous covariates to correct for.

  • semisupervised_max_epochs (int | None, optional, default: None) – Maximum number of semisupervised training epochs. If None, defaults to np.min([round(np.sqrt(max_epochs)), 20]).

  • n_samples_per_label (int, optional, default: None) – Number of subsamples for each label class to sample per epoch. By default, there is no label subsampling.

  • use_gpu (str | int | bool | None, optional, default: None) – Device to train on: the default GPU if available (None or True), the GPU with the given index (int), the GPU with the given name (str, e.g., 'cuda:0'), or the CPU (False).

  • arches_params (dict) – Hyperparameters for the VAE. See the scvi-tools documentation for more details.
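
The default values given for max_epochs and semisupervised_max_epochs can be worked through numerically. The sketch below restates the two documented formulas in plain Python. The helper names default_max_epochs and default_semisupervised_max_epochs are hypothetical (they are not part of pegasus or scvi-tools), and Python's built-in min and math.sqrt stand in for np.min and np.sqrt.

```python
import math


def default_max_epochs(n_cells: int) -> int:
    """Documented default: np.min([round((20000 / n_cells) * 400), 400])."""
    return min(round((20000 / n_cells) * 400), 400)


def default_semisupervised_max_epochs(max_epochs: int) -> int:
    """Documented default: np.min([round(np.sqrt(max_epochs)), 20])."""
    return min(round(math.sqrt(max_epochs)), 20)


# Print the defaults for a few illustrative dataset sizes.
for n_cells in (10_000, 50_000, 100_000):
    epochs = default_max_epochs(n_cells)
    semi = default_semisupervised_max_epochs(epochs)
    print(f"{n_cells} cells: max_epochs={epochs}, semisupervised_max_epochs={semi}")
```

Under these formulas, datasets of up to 20,000 cells hit the 400-epoch cap, and larger datasets get proportionally fewer epochs.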


Returns

Updates data.obsm:

  • data.obsm['X_scVI']: The embedding calculated by scVI.

  • data.obsm['X_scanVI']: The embedding calculated by scANVI.
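
After training, the two embeddings can be read back from data.obsm using the keys listed above. The following is a minimal sketch: the helper name get_scarches_embeddings is hypothetical, and it assumes only that data.obsm supports dictionary-style membership tests and lookup.

```python
def get_scarches_embeddings(data):
    """Return the (scVI, scANVI) embeddings stored in data.obsm.

    Hypothetical helper: raises a KeyError naming any missing key,
    which would indicate that training has not been run on `data`.
    """
    keys = ("X_scVI", "X_scanVI")
    missing = [key for key in keys if key not in data.obsm]
    if missing:
        raise KeyError(f"missing embeddings in data.obsm: {missing}")
    return data.obsm["X_scVI"], data.obsm["X_scanVI"]
```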


Examples

>>> import pegasus as pg
>>> pg.train_scarches_scanvi(data, dir_path="scanvi_model/", label="celltype", matkey="counts", batch="tech", n_samples_per_label=100)
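
According to the parameter descriptions, batch is meant for a single categorical covariate and categorical_covariate_keys for several. The sketch below assembles keyword arguments on that basis and overrides one entry of arches_params while keeping the other defaults from the signature. The helper build_scarches_kwargs and the obs keys "tech" and "donor" are hypothetical. Passing a complete arches_params dictionary is a cautious assumption, since this page does not say whether a partial dictionary is merged with the defaults.

```python
# Defaults copied from the function signature.
DEFAULT_ARCHES_PARAMS = {
    "dropout_rate": 0.2,
    "encode_covariates": True,
    "n_layers": 2,
    "use_batch_norm": "none",
    "use_layer_norm": "both",
}


def build_scarches_kwargs(categorical_covariates, arches_overrides=None):
    """Build keyword arguments for pg.train_scarches_scanvi (hypothetical helper)."""
    kwargs = {}
    if len(categorical_covariates) == 1:
        # One categorical covariate: pass it as `batch`.
        kwargs["batch"] = categorical_covariates[0]
    elif len(categorical_covariates) > 1:
        # Several categorical covariates: pass them as a list.
        kwargs["categorical_covariate_keys"] = list(categorical_covariates)
    # Start from the signature defaults so unspecified entries are kept.
    kwargs["arches_params"] = {**DEFAULT_ARCHES_PARAMS, **(arches_overrides or {})}
    return kwargs


kwargs = build_scarches_kwargs(["tech", "donor"], {"n_layers": 3})
# With pegasus installed and `data` loaded, the call would look like:
# pg.train_scarches_scanvi(data, dir_path="scanvi_model/", label="celltype",
#                          matkey="counts", **kwargs)
```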