pegasus.infer_doublets
pegasus.infer_doublets(data, channel_attr=None, clust_attr=None, min_cell=100, expected_doublet_rate=None, sim_doublet_ratio=2.0, n_prin_comps=30, robust=False, k=None, n_jobs=-1, alpha=0.05, random_state=0, plot_hist='dbl')
Infer doublets using a Scrublet-like strategy. [Li20-2]
This function must be called after clustering.
- Parameters
data (pegasusio.MultimodalData) – Annotated data matrix with rows for cells and columns for genes.
channel_attr (str, optional, default: None) – Attribute indicating sample channels. If set, calculate Scrublet-like doublet scores per channel.
clust_attr (str, optional, default: None) – Attribute indicating cluster labels. If set, estimate the proportion of doublets in each cluster and its statistical significance.
min_cell (int, optional, default: 100) – Minimum number of cells per sample required to calculate doublet scores. For samples with fewer than min_cell cells, doublet score calculation is skipped.
expected_doublet_rate (float, optional, default: None) – The expected doublet rate for the experiment. By default, the expected rate is calculated from the number of cells using the 10x multiplet rate table.
sim_doublet_ratio (float, optional, default: 2.0) – The ratio of synthetic doublets to observed cells.
n_prin_comps (int, optional, default: 30) – Number of principal components.
robust (bool, optional, default: False) – If True, use 'arpack' instead of 'randomized' for large matrices (i.e. max(X.shape) > 500 and n_components < 0.8 * min(X.shape)).
k (int, optional, default: None) – Number of observed cell neighbors. If None, k = round(0.5 * sqrt(number of observed cells)). Total neighbors k_adj = round(k * (1.0 + sim_doublet_ratio)).
n_jobs (int, optional, default: -1) – Number of threads to use. If -1, use all available threads.
alpha (float, optional, default: 0.05) – FDR significance level for the cluster-level Fisher's exact test.
random_state (int, optional, default: 0) – Random seed for reproducing results.
plot_hist (str, optional, default: 'dbl') – If not None, plot diagnostic histograms using plot_hist as the prefix. If channel_attr is None, plot_hist.png is generated; otherwise, plot_hist.channel_name.png files are generated.
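The default neighbor counts and the "large matrix" condition quoted in the parameter descriptions can be worked through numerically. The sketch below is illustrative only: neighbor_counts and is_large_matrix are hypothetical helper names, not part of the pegasus API, and the use of Python's built-in round is an assumption about how the rounding is done.

```python
import math

# Hypothetical helper (not a pegasus function): mirrors the documented rule
# k = round(0.5 * sqrt(n_cells)) and k_adj = round(k * (1.0 + sim_doublet_ratio)).
def neighbor_counts(n_cells, sim_doublet_ratio=2.0, k=None):
    if k is None:
        k = round(0.5 * math.sqrt(n_cells))
    k_adj = round(k * (1.0 + sim_doublet_ratio))
    return k, k_adj

# Hypothetical helper: the "large matrix" condition from the `robust` description,
# max(X.shape) > 500 and n_components < 0.8 * min(X.shape).
def is_large_matrix(shape, n_components):
    return max(shape) > 500 and n_components < 0.8 * min(shape)

# 10,000 observed cells with the default sim_doublet_ratio of 2.0
k, k_adj = neighbor_counts(10_000)
print(k, k_adj)  # 50 150

# A 10,000 x 2,000 matrix with 30 components meets the condition
print(is_large_matrix((10_000, 2_000), 30))  # True
```

With the defaults, each cell is therefore compared against roughly three times as many neighbors as k alone suggests, because the synthetic doublets are added to the neighbor pool.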
- Return type
None
- Returns
None
Updates data in place:
data.obs['pred_dbl_type']: Predicted singlet/doublet types.
data.uns['pred_dbl_cluster']: Only generated if clust_attr is not None. A dataframe with two columns, 'Cluster' and 'Qval'. Only clusters with significantly more doublets than expected are recorded here.
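To make the cluster-level output more concrete, here is a minimal, standard-library sketch of one way such a test could work: a one-sided Fisher's exact test per cluster (cluster vs. all cells, doublet vs. singlet), followed by Benjamini-Hochberg correction and a cutoff at alpha. The contingency table layout, the choice of Benjamini-Hochberg, the helper names, and the toy counts are all assumptions made for illustration; this is not the pegasus implementation.

```python
from math import comb

def fisher_upper_tail(n_total, n_dbl_total, n_cluster, n_dbl_cluster):
    """One-sided p-value P(X >= n_dbl_cluster) under the hypergeometric null."""
    denom = comb(n_total, n_cluster)
    upper = min(n_cluster, n_dbl_total)
    tail = sum(
        comb(n_dbl_total, x) * comb(n_total - n_dbl_total, n_cluster - x)
        for x in range(n_dbl_cluster, upper + 1)
    )
    return tail / denom

def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the same order as pvals."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

def enriched_clusters(counts, alpha=0.05):
    """counts maps cluster label -> (n_cells, n_predicted_doublets)."""
    n_total = sum(n for n, _ in counts.values())
    n_dbl_total = sum(d for _, d in counts.values())
    labels = list(counts)
    pvals = [fisher_upper_tail(n_total, n_dbl_total, *counts[c]) for c in labels]
    qvals = benjamini_hochberg(pvals)
    # Keep only clusters with significantly more doublets than expected.
    return {c: q for c, q in zip(labels, qvals) if q < alpha}

# Toy counts (made up): cluster "C" holds 20 of the 25 predicted doublets.
counts = {"A": (100, 2), "B": (100, 3), "C": (50, 20)}
hits = enriched_clusters(counts, alpha=0.05)
print(sorted(hits))  # ['C']
```

In this toy example only cluster "C" passes the cutoff, which matches the documented behavior that clusters without a significant excess of doublets are left out of the result.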
Examples
>>> pg.infer_doublets(data, channel_attr = 'Channel', clust_attr = 'Annotation')