pegasus.train_scarches_scanvi
- pegasus.train_scarches_scanvi(data, dir_path, label, unlabeled_category='Unknown', features='highly_variable_features', matkey='counts', n_jobs=-1, random_state=0, max_epochs=None, batch=None, categorical_covariate_keys=None, continuous_covariate_keys=None, semisupervised_max_epochs=None, n_samples_per_label=None, use_gpu=None, arches_params={'dropout_rate': 0.2, 'encode_covariates': True, 'n_layers': 2, 'use_batch_norm': 'none', 'use_layer_norm': 'both'})
Run scArches training.
This is a wrapper around the scvi-tools package.
- Parameters
data (MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
dir_path (str) – Save the model to this directory.
label (str) – The obs key representing labels.
unlabeled_category (str, default: "Unknown") – Value used for unlabeled cells in label.
features (str, optional, default: "highly_variable_features") – Keyword in data.var, which refers to a boolean array. If None, all features will be selected.
matkey (str, optional, default: "counts") – Matrix key for the raw counts.
n_jobs (int, optional, default: -1) – Number of threads to use. -1 refers to using all physical CPU cores.
random_state (int, optional, default: 0) – Seed for the random number generator.
max_epochs (int | None, optional, default: None) – Maximum number of unsupervised training epochs. Defaults to np.min([round((20000 / n_cells) * 400), 400]).
batch (str, optional, default: None) – If there is only one categorical covariate, the obs key representing batches that should be corrected for. Default is None.
categorical_covariate_keys (List[str]) – If there are multiple categorical covariates, a list of obs keys listing the categorical covariates that should be corrected for. Default is None.
continuous_covariate_keys (List[str]) – A list of obs keys listing continuous covariates that should be corrected for. Default is None.
semisupervised_max_epochs (int | None, optional, default: None) – Maximum number of semisupervised training epochs. Defaults to np.min([round(np.sqrt(max_epochs)), 20]).
n_samples_per_label (int, optional, default: None) – Number of subsamples for each label class to sample per epoch. By default, there is no label subsampling.
use_gpu (str | int | bool | None) – Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., 'cuda:0'), or use CPU (if False).
arches_params (dict) – Hyperparameters for VAE. See https://docs.scvi-tools.org/en/stable/api/reference/scvi.module.VAE.html#scvi.module.VAE for more details.
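The default epoch counts quoted for max_epochs and semisupervised_max_epochs can be worked through by hand. The sketch below only re-expresses those two formulas in plain Python; the helper names are hypothetical and not part of pegasus or scvi-tools, and the library's own code may differ in details.

```python
import math


def default_max_epochs(n_cells: int) -> int:
    """Hypothetical helper: min(round((20000 / n_cells) * 400), 400)."""
    return min(round((20000 / n_cells) * 400), 400)


def default_semisupervised_max_epochs(max_epochs: int) -> int:
    """Hypothetical helper: min(round(sqrt(max_epochs)), 20)."""
    return min(round(math.sqrt(max_epochs)), 20)


# Small datasets hit the 400-epoch cap; larger ones train for fewer epochs.
for n_cells in (5_000, 20_000, 100_000):
    unsupervised = default_max_epochs(n_cells)
    semisupervised = default_semisupervised_max_epochs(unsupervised)
    print(n_cells, unsupervised, semisupervised)
```

Under these formulas, any dataset with 20,000 cells or fewer uses the full 400 unsupervised epochs, and the semisupervised stage never exceeds 20 epochs.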
- Returns
data.obsm['X_scVI']: The embedding calculated by scVI.
data.obsm['X_scanVI']: The embedding calculated by scanVI.
- Return type
Update data.obsm
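Because the function updates data.obsm rather than returning a value, the embeddings are read back by key after training. A minimal sketch, using a stand-in object with an obsm dictionary in place of a real MultimodalData; the stand-in, its toy values, and the helper name are assumptions for illustration only.

```python
from types import SimpleNamespace

EMBEDDING_KEYS = ("X_scVI", "X_scanVI")


def get_scarches_embeddings(data):
    """Hypothetical helper: fetch both embeddings, failing clearly if one is absent."""
    missing = [key for key in EMBEDDING_KEYS if key not in data.obsm]
    if missing:
        raise KeyError(f"Missing embeddings {missing}; was training run on this object?")
    return tuple(data.obsm[key] for key in EMBEDDING_KEYS)


# Stand-in for a trained object: 2 cells, 3 latent dimensions (toy values).
fake_data = SimpleNamespace(
    obsm={
        "X_scVI": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
        "X_scanVI": [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
    }
)
scvi_embedding, scanvi_embedding = get_scarches_embeddings(fake_data)
```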
Examples
>>> pg.train_scarches_scanvi(data, dir_path="scanvi_model/", label="celltype", matkey="counts", batch="tech", n_samples_per_label=100)
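The parameter descriptions distinguish a single categorical covariate (passed as batch) from several (passed as categorical_covariate_keys). One way to encode that choice is sketched below; the helper name and the obs keys "donor" and "percent_mito" are hypothetical, and the pegasus call is left as a comment because it needs pegasus, scvi-tools, and a loaded data object.

```python
from typing import Any, Dict, List, Optional


def covariate_kwargs(
    categorical: Optional[List[str]] = None,
    continuous: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: choose `batch` or `categorical_covariate_keys`."""
    categorical = list(categorical or [])
    kwargs: Dict[str, Any] = {}
    if len(categorical) == 1:
        # One categorical covariate: pass it as `batch`.
        kwargs["batch"] = categorical[0]
    elif len(categorical) > 1:
        # Several categorical covariates: pass them as a list.
        kwargs["categorical_covariate_keys"] = categorical
    if continuous:
        kwargs["continuous_covariate_keys"] = list(continuous)
    return kwargs


kwargs = covariate_kwargs(["tech", "donor"], ["percent_mito"])
# With pegasus imported as pg and `data` loaded, the call would then be:
# pg.train_scarches_scanvi(data, dir_path="scanvi_model/", label="celltype", **kwargs)
```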