pegasus.highly_variable_features

pegasus.highly_variable_features(data, batch=None, flavor='pegasus', n_top=2000, span=0.02, min_disp=0.5, max_disp=inf, min_mean=0.0125, max_mean=7, n_jobs=-1)

Select highly variable features (HVF). The input data should be log-transformed.
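The method expects log-transformed input, so raw counts are usually normalized per cell and passed through log1p first. Pegasus has its own functions for this step. The NumPy sketch below only illustrates the transformation on a small dense count matrix. The helper name log_normalize and the target_sum of 1e5 are illustrative assumptions, not documented pegasus settings.

```python
import numpy as np


def log_normalize(counts, target_sum=1e5):
    """Hypothetical helper: scale each cell (row) to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts get a scale factor of 0, so their rows stay at zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)


# Two cells (rows) by three genes (columns).
counts = np.array([[1, 3, 0], [2, 2, 4]])
logged = log_normalize(counts)
```

Undoing the log with np.expm1 recovers rows that each sum to target_sum, which is a quick sanity check on the normalization.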

Parameters
  • data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.

  • batch (str, optional, default: None) – A key in data.obs specifying batch information. If None, batch effects are not considered when selecting highly variable features. Otherwise, if data.obs[batch] is not categorical, it is automatically converted to categorical before highly variable feature selection.

  • flavor (str, optional, default: "pegasus") – The HVF selection method to use. Available choices are "pegasus" and "Seurat".

  • n_top (int, optional, default: 2000) – Number of genes to select as HVF. If None, no genes are selected.

  • span (float, optional, default: 0.02) – Only applicable when flavor is "pegasus". The smoothing factor of the LOESS (locally weighted regression) model used in the pegasus HVF selection method.

  • min_disp (float, optional, default: 0.5) – Minimum normalized dispersion.

  • max_disp (float, optional, default: np.inf) – Maximum normalized dispersion. Use np.inf for no upper bound.

  • min_mean (float, optional, default: 0.0125) – Minimum mean.

  • max_mean (float, optional, default: 7) – Maximum mean.

  • n_jobs (int, optional, default: -1) – Number of threads to be used during calculation. If -1, all physical CPU cores will be used.
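The span parameter is the fraction of data points that a LOESS smoother uses in each local fit: a small span follows local structure closely, while a large span gives a smoother trend. As an illustration only, the sketch below fits a local linear LOESS trend of per-gene variance against per-gene mean with NumPy and ranks genes by how far they lie above that trend. It is a simplified stand-in, assuming a degree-1 local fit with tricube weights; it is not the LOESS implementation or the ranking procedure pegasus actually uses, and loess_fit is a hypothetical name.

```python
import numpy as np


def loess_fit(x, y, span=0.3):
    """Hypothetical local linear LOESS with tricube weights; returns the fitted value at each x."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = x.size
    # Number of nearest neighbours in each local fit (at least 3 so a line is well defined).
    k = min(n, max(3, int(np.ceil(span * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]
        dmax = dist[idx].max()
        if dmax == 0:
            fitted[i] = y[idx].mean()
            continue
        # Tricube weights: 1 at the point itself, falling to 0 at the k-th neighbour.
        w = (1.0 - (dist[idx] / dmax) ** 3) ** 3
        # Weighted least squares for y ~ a + b * x over the neighbourhood.
        design = np.column_stack([np.ones(k), x[idx]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted


# Synthetic per-gene statistics (hypothetical data, not from a real dataset).
rng = np.random.default_rng(0)
mean = rng.uniform(0.1, 3.0, size=200)
var = mean + rng.normal(0.0, 0.1, size=200)
var[:5] += 3.0  # make five genes clearly more variable than the trend
trend = loess_fit(mean, var, span=0.3)
# Rank genes by residual variance above the fitted trend and keep the top five.
top = np.argsort(var - trend)[::-1][:5]
```

Because each local fit is linear, data that is exactly linear is reproduced without error for any span; the span only changes how much curvature and noise the trend follows.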

Return type

None

Returns

  • None

  • Updates data.var:

    • highly_variable_features: replaced with a Boolean array indicating the selected highly variable features.
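The highly_variable_features column is a Boolean mask over genes. As a rough illustration of how a dispersion-based method in the spirit of the "Seurat" flavor could turn min_mean, max_mean, min_disp, max_disp and n_top into such a mask, here is a simplified NumPy sketch. dispersion_hvf_mask is a hypothetical helper, and its choices (dispersion = variance / mean computed directly on the log-transformed matrix, 20 equal-width mean bins, cutoffs applied before the n_top limit) are assumptions of the sketch, not necessarily what pegasus does.

```python
import numpy as np


def dispersion_hvf_mask(X, n_top=2000, min_disp=0.5, max_disp=np.inf,
                        min_mean=0.0125, max_mean=7.0, n_bins=20):
    """Hypothetical helper: Boolean mask over genes (columns of X) from normalized dispersion."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    # Dispersion = variance / mean; genes with zero mean get dispersion 0.
    disp = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    # Bin genes by mean expression and z-score the dispersion within each bin.
    edges = np.linspace(mean.min(), mean.max(), n_bins + 1)
    bins = np.digitize(mean, edges[1:-1])
    norm_disp = np.zeros_like(disp)
    for b in np.unique(bins):
        in_bin = bins == b
        sd = disp[in_bin].std(ddof=1) if in_bin.sum() > 1 else 0.0
        if sd > 0:
            norm_disp[in_bin] = (disp[in_bin] - disp[in_bin].mean()) / sd
    passes = ((mean > min_mean) & (mean < max_mean)
              & (norm_disp > min_disp) & (norm_disp < max_disp))
    # Among genes passing the cutoffs, keep at most n_top with the highest normalized dispersion.
    candidates = np.flatnonzero(passes)
    order = candidates[np.argsort(norm_disp[candidates])[::-1]]
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[order[:n_top]] = True
    return mask


# Hypothetical log-transformed matrix: 500 cells by 50 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=rng.uniform(0.5, 5.0, size=50), size=(500, 50))
X = np.log1p(counts)
mask = dispersion_hvf_mask(X, n_top=10)
```

Binning by mean before z-scoring compares each gene's dispersion only with genes of similar expression level, so highly expressed genes are not favoured simply because their dispersion tends to be larger.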

Examples

>>> import pegasus as pg
>>> pg.highly_variable_features(data)
>>> pg.highly_variable_features(data, batch="Channel")