pegasus.run_harmony

pegasus.run_harmony(data, batch='Channel', rep='pca', n_comps=None, n_jobs=-1, n_clusters=None, random_state=0, use_gpu=False, max_iter_harmony=10)

Batch correction on PCs using Harmony.

This is a wrapper around the harmony-pytorch package, a PyTorch implementation of the Harmony algorithm [Korsunsky19].

Parameters
  • data (MultimodalData.) – Annotated data matrix with rows for cells and columns for genes.

  • batch (str or List[str], optional, default: "Channel".) – Attribute in data.obs that identifies batches. To use multiple attributes, pass their names as a list.

  • rep (str, optional, default: "pca".) – Representation to use as input to Harmony.

  • n_comps (int, optional, default: None.) – Number of components of rep to use. If n_comps is None, all components are used; otherwise, the minimum of n_comps and the number of dimensions in rep is used.

  • n_jobs (int, optional, default: -1.) – Number of threads to use in Harmony. -1 refers to using all physical CPU cores.

  • n_clusters (int, optional, default: None.) – Number of Harmony clusters. Default is None, which asks Harmony to estimate this number from the data.

  • random_state (int, optional, default: 0.) – Seed for the random number generator.

  • use_gpu (bool, optional, default: False.) – If True, use GPU if available. Otherwise, use CPU only.

  • max_iter_harmony (int, optional, default: 10.) – Maximum number of Harmony iterations to run if the algorithm has not converged.
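
The n_comps rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the code Pegasus actually runs; the helper name effective_n_comps and its rep_dims argument are hypothetical and do not exist in Pegasus.

```python
from typing import Optional


def effective_n_comps(n_comps: Optional[int], rep_dims: int) -> int:
    """Hypothetical helper: how many components of `rep` would be used.

    Mirrors the documented rule: None means "use all components";
    otherwise use the smaller of n_comps and the representation's dimensions.
    """
    if n_comps is None:
        return rep_dims
    return min(n_comps, rep_dims)


# Assume a 50-dimensional representation, purely for illustration.
print(effective_n_comps(None, 50))  # all 50 components
print(effective_n_comps(20, 50))    # capped by n_comps
print(effective_n_comps(100, 50))   # capped by the representation's size
```

Requesting more components than the representation holds is therefore not an error under this rule; the value is capped at the representation's dimensionality.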

Return type

str

Returns

  • out_rep (str) – The representation key that refers to the embedding calculated by the Harmony algorithm.

    This keyword is rep + '_harmony', where rep is the input parameter above.

  • Updates data.obsm:

    • data.obsm['X_' + out_rep]: The embedding calculated by the Harmony algorithm.
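
The naming convention stated above can be made concrete with a short sketch. The helper harmony_keys is hypothetical and not part of Pegasus; it only reproduces the string rules given in this section (out_rep is rep + '_harmony', and the embedding is stored under 'X_' + out_rep).

```python
from typing import Tuple


def harmony_keys(rep: str = "pca") -> Tuple[str, str]:
    """Hypothetical helper: derive the documented key names for a given rep.

    Returns (out_rep, obsm_key), where out_rep is the value run_harmony is
    documented to return and obsm_key is where the embedding is stored.
    """
    out_rep = rep + "_harmony"   # documented return value
    obsm_key = "X_" + out_rep    # documented key in data.obsm
    return out_rep, obsm_key


out_rep, obsm_key = harmony_keys("pca")
print(out_rep)   # pca_harmony
print(obsm_key)  # X_pca_harmony
```

With the default rep="pca", the returned key would therefore be "pca_harmony", and the corrected embedding would be found at data.obsm["X_pca_harmony"].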

Examples

>>> import pegasus as pg
>>> pg.run_harmony(data, rep="pca", n_jobs=10, random_state=25)