pegasus.highly_variable_features

pegasus.highly_variable_features(data, batch=None, flavor='pegasus', n_top=2000, span=0.02, min_disp=0.5, max_disp=inf, min_mean=0.0125, max_mean=7, n_jobs=-1)

Select highly variable features (HVF). The input data should already be log-transformed.

Parameters

data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
batch (str, optional, default: None) – A key in data.obs specifying batch information. If batch is not set, batch effects are not considered when selecting highly variable features. Otherwise, if data.obs[batch] is not categorical, it is automatically converted to categorical before highly variable feature selection.
flavor (str, optional, default: "pegasus") – The HVF selection method to use. Available choices are "pegasus" or "Seurat".
n_top (int, optional, default: 2000) – Number of genes to be selected as HVF. If None, no gene will be selected.
span (float, optional, default: 0.02) – Only applicable when flavor is "pegasus". The smoothing factor used by the scikit-misc loess model in the pegasus HVF selection method.
min_disp (float, optional, default: 0.5) – Minimum normalized dispersion.
max_disp (float, optional, default: np.inf) – Maximum normalized dispersion. Set it to np.inf for no upper bound.
min_mean (float, optional, default: 0.0125) – Minimum mean.
max_mean (float, optional, default: 7) – Maximum mean.
n_jobs (int, optional, default: -1) – Number of threads to be used during calculation. If -1, all physical CPU cores will be used.

Return type

None

Returns

None
Updates data.var:
- highly_variable_features: replaced with a Boolean array indicating the selected highly variable features.

Examples

>>> import pegasus as pg
>>> pg.highly_variable_features(data)
>>> pg.highly_variable_features(data, batch="Channel")
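As a rough illustration of how the cutoff parameters (min_mean, max_mean, min_disp, max_disp) and n_top could interact, here is a minimal pure-Python sketch. It is not Pegasus's implementation: the helper select_by_cutoffs, the strict inequalities, and the order of operations (filter by cutoffs first, then rank by dispersion) are all assumptions made for illustration, and the input values are made up.

```python
import math


def select_by_cutoffs(means, disps, n_top=2000, min_disp=0.5,
                      max_disp=math.inf, min_mean=0.0125, max_mean=7.0):
    """Hypothetical helper (not part of Pegasus): return a Boolean list
    marking which features pass the cutoffs and rank within the top n_top."""
    # Keep features whose mean and normalized dispersion both fall
    # strictly inside the cutoffs (strictness is an assumption).
    candidates = [
        i for i, (m, d) in enumerate(zip(means, disps))
        if min_mean < m < max_mean and min_disp < d < max_disp
    ]
    # Rank the surviving features by dispersion, highest first,
    # and keep at most n_top of them.
    candidates.sort(key=lambda i: disps[i], reverse=True)
    keep = set(candidates[:n_top])
    # Boolean mask with one entry per feature, analogous in shape to
    # the data.var["highly_variable_features"] column.
    return [i in keep for i in range(len(means))]


# Made-up per-feature means and normalized dispersions.
means = [0.005, 0.5, 1.2, 3.0, 9.0]
disps = [2.0, 0.3, 1.5, 0.9, 4.0]

mask_all = select_by_cutoffs(means, disps)            # default n_top=2000
mask_top1 = select_by_cutoffs(means, disps, n_top=1)  # keep only the best one
```

In this toy input, the first and last features fall outside the mean range and the second falls below min_disp, so only the third and fourth features remain as candidates; with n_top=1 only the third, which has the higher dispersion, is kept.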