pegasus.infer_doublets

pegasus.infer_doublets(data, channel_attr=None, clust_attr=None, min_cell=100, expected_doublet_rate=None, sim_doublet_ratio=2.0, n_prin_comps=30, robust=False, k=None, n_jobs=-1, alpha=0.05, random_state=0, plot_hist='dbl')

Infer doublets using a Scrublet-like strategy. [Li20-2]

This function must be called after clustering.

Parameters
  • data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.

  • channel_attr (str, optional, default: None) – Attribute indicating sample channels. If set, Scrublet-like doublet scores are calculated separately for each channel.

  • clust_attr (str, optional, default: None) – Attribute indicating cluster labels. If set, estimate the proportion of doublets in each cluster and test whether it is significantly higher than expected.

  • min_cell (int, optional, default: 100) – Minimum number of cells a sample must have for doublet scores to be calculated. Doublet score calculation is skipped for samples with fewer than ‘min_cell’ cells.

  • expected_doublet_rate (float, optional, default: None) – The expected doublet rate for the experiment. By default (None), the rate is calculated from the number of cells using the 10x multiplet rate table.

  • sim_doublet_ratio (float, optional, default: 2.0) – The ratio between synthetic doublets and observed cells.

  • n_prin_comps (int, optional, default: 30) – Number of principal components.

  • robust (bool, optional, default: False) – If True, use ‘arpack’ instead of ‘randomized’ for large matrices (i.e. max(X.shape) > 500 and n_components < 0.8 * min(X.shape)).

  • k (int, optional, default: None) – Number of observed cell neighbors. If None, k = round(0.5 * sqrt(number of observed cells)). Total neighbors k_adj = round(k * (1.0 + sim_doublet_ratio)).

  • n_jobs (int, optional, default: -1) – Number of threads to use. If -1, use all available threads.

  • alpha (float, optional, default: 0.05) – FDR significance level for the cluster-level Fisher's exact test.

  • random_state (int, optional, default: 0) – Random seed for reproducing results.

  • plot_hist (str, optional, default: 'dbl') – If not None, plot diagnostic histograms using plot_hist as the filename prefix. If channel_attr is None, plot_hist.png is generated; otherwise, one plot_hist.channel_name.png file is generated per channel.
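To make the roles of sim_doublet_ratio, n_prin_comps, k, k_adj and expected_doublet_rate concrete, here is a toy sketch of the published Scrublet scoring scheme on a small dense count matrix. It is an illustration under stated assumptions, not pegasus's implementation: the helper names `default_k` and `scrublet_like_scores` are hypothetical, and the normalization (1e4 counts per cell plus log1p), the plain SVD-based PCA, the brute-force neighbor search and the Bayesian score correction follow Scrublet's general approach; pegasus's "Scrublet-like" internals (and how it thresholds scores into pred_dbl_type) may differ.

```python
import math

import numpy as np


def default_k(n_obs, sim_doublet_ratio=2.0):
    """Documented defaults: k = round(0.5 * sqrt(n_obs)), k_adj = round(k * (1 + ratio))."""
    k = int(round(0.5 * math.sqrt(n_obs)))
    k_adj = int(round(k * (1.0 + sim_doublet_ratio)))
    return k, k_adj


def scrublet_like_scores(counts, expected_doublet_rate=0.06, sim_doublet_ratio=2.0,
                         n_prin_comps=30, k=None, random_state=0):
    """Hypothetical toy Scrublet-style scores for a small dense cells x genes count matrix."""
    rng = np.random.default_rng(random_state)
    n_obs = counts.shape[0]
    n_sim = int(round(sim_doublet_ratio * n_obs))

    # Simulate doublets by summing the raw counts of random pairs of observed cells.
    pairs = rng.integers(0, n_obs, size=(n_sim, 2))
    sim = counts[pairs[:, 0]] + counts[pairs[:, 1]]

    # Normalize every cell to 1e4 total counts and log-transform.
    # Assumes each cell has a nonzero total count.
    both = np.vstack([counts, sim]).astype(float)
    both = np.log1p(both / both.sum(axis=1, keepdims=True) * 1e4)

    # PCA fitted on observed cells only, then applied to observed + simulated cells.
    mean = both[:n_obs].mean(axis=0)
    _, _, vt = np.linalg.svd(both[:n_obs] - mean, full_matrices=False)
    emb = (both - mean) @ vt[:n_prin_comps].T

    # Brute-force k_adj nearest neighbors of each observed cell, excluding the cell itself.
    if k is None:
        k, _ = default_k(n_obs, sim_doublet_ratio)
    k_adj = int(round(k * (1.0 + n_sim / n_obs)))
    d2 = ((emb[:n_obs, None, :] - emb[None, :, :]) ** 2).sum(axis=2)
    d2[np.arange(n_obs), np.arange(n_obs)] = np.inf
    nbrs = np.argpartition(d2, k_adj - 1, axis=1)[:, :k_adj]

    # Share of simulated doublets among the neighbors, with Scrublet's Bayesian
    # correction for the expected doublet rate (rho) and the simulation ratio (r).
    is_sim = np.arange(both.shape[0]) >= n_obs
    n_sim_neigh = is_sim[nbrs].sum(axis=1)
    q = (n_sim_neigh + 1.0) / (k_adj + 2.0)
    rho, r = expected_doublet_rate, n_sim / n_obs
    return q * rho / r / ((1.0 - rho) - q * (1.0 - rho - rho / r))


counts = np.random.default_rng(1).poisson(2.0, size=(200, 50))
scores = scrublet_like_scores(counts)
print(scores.shape)  # (200,)
```

With 10,000 observed cells and the default sim_doublet_ratio of 2.0, the documented defaults give k = 50 and k_adj = 150. The dense pairwise-distance step is only practical for toy inputs; a real implementation would use an approximate nearest-neighbor index.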

Return type

None

Returns

  • None

  • Update data.obs and data.uns

    • data.obs['pred_dbl_type']: Predicted singlet/doublet types.

    • data.uns['pred_dbl_cluster']: Generated only if ‘clust_attr’ is not None. A dataframe with two columns, ‘Cluster’ and ‘Qval’, listing only the clusters with significantly more doublets than expected.
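The page states only that clusters are tested with Fisher's exact test at FDR level alpha. The dependency-free sketch below shows one plausible way such a ‘Cluster’/‘Qval’ table could be derived from per-cell singlet/doublet calls. The one-sided "more doublets than the rest of the data" 2x2 layout and the Benjamini-Hochberg procedure are assumptions, and `fisher_greater_pvalue`, `benjamini_hochberg` and `enriched_clusters` are hypothetical helpers, not pegasus functions.

```python
from math import comb


def fisher_greater_pvalue(a, b, c, d):
    """One-sided Fisher's exact test on [[a, b], [c, d]]: P(X >= a) under the hypergeometric null."""
    n_cluster, n_dbl, n_total = a + b, a + c, a + b + c + d
    upper = min(n_cluster, n_dbl)
    tail = sum(comb(n_dbl, x) * comb(n_total - n_dbl, n_cluster - x) for x in range(a, upper + 1))
    return tail / comb(n_total, n_cluster)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def enriched_clusters(labels, is_doublet, alpha=0.05):
    """Hypothetical: (cluster, qval) pairs for clusters with significantly more predicted doublets."""
    n, total_dbl = len(labels), sum(is_doublet)
    clusters = sorted(set(labels))
    pvals = []
    for cl in clusters:
        flags = [dbl for lab, dbl in zip(labels, is_doublet) if lab == cl]
        a = sum(flags)            # doublets in the cluster
        b = len(flags) - a        # singlets in the cluster
        c = total_dbl - a         # doublets in all other clusters
        d = (n - len(flags)) - c  # singlets in all other clusters
        pvals.append(fisher_greater_pvalue(a, b, c, d))
    qvals = benjamini_hochberg(pvals)
    return [(cl, q) for cl, q in zip(clusters, qvals) if q <= alpha]


labels = ["A"] * 50 + ["B"] * 50
is_doublet = [True] * 20 + [False] * 30 + [True] * 1 + [False] * 49
print(enriched_clusters(labels, is_doublet))  # only cluster "A" is reported
```

Computing the hypergeometric tail with exact integer binomials keeps the sketch free of SciPy; with SciPy available, `scipy.stats.fisher_exact(table, alternative="greater")` gives the same one-sided p-value.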

Examples

>>> pg.infer_doublets(data, channel_attr = 'Channel', clust_attr = 'Annotation')