pegasus.pseudobulk

pegasus.pseudobulk(data, sample, attrs=None, mat_key=None, cluster=None)

Generate pseudo-bulk count matrices.

Parameters

data (MultimodalData or UnimodalData object) – Annotated data matrix with rows for cells and columns for genes.
sample (str) – Specify the cell attribute used for aggregating pseudo-bulk data. Key must exist in data.obs.
attrs (str or List[str], optional, default: None) – Specify additional cell attributes to keep in the pseudo-bulk data. If set, all attribute keys must exist in data.obs. Note that for a categorical attribute, each pseudo-bulk’s value is the most frequent value among its cells, and for a numeric attribute, each pseudo-bulk’s value is the mean over its cells.
mat_key (str, optional, default: None) – Specify the single-cell count matrix used for aggregating pseudo-bulk counts. If None, use the raw count matrix in data: look for the raw.X key in its matrices first; if it does not exist, use the X key. Otherwise, use the count matrix with key mat_key from the matrices of data.
cluster (str, optional, default: None) – If set, additionally generate pseudo-bulk matrices per cluster specified in data.obs[cluster].
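
The aggregation rules described for sample and attrs can be illustrated with a small pure-Python sketch. It is not the pegasus implementation: the toy cells, the helper name aggregate, and the attribute names phase and n_genes are hypothetical, and it assumes that counts are summed per sample (the usual pseudo-bulk convention) and that ties between categorical values are broken by first occurrence.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy cells: a sample label ("Channel"), one categorical
# attribute ("phase"), one numeric attribute ("n_genes"), and counts
# for three genes.
cells = [
    {"Channel": "s1", "phase": "G1", "n_genes": 30.0, "counts": [1, 0, 2]},
    {"Channel": "s1", "phase": "G1", "n_genes": 32.0, "counts": [0, 3, 1]},
    {"Channel": "s1", "phase": "S", "n_genes": 34.0, "counts": [2, 1, 0]},
    {"Channel": "s2", "phase": "S", "n_genes": 50.0, "counts": [4, 0, 0]},
    {"Channel": "s2", "phase": "S", "n_genes": 54.0, "counts": [1, 1, 1]},
]


def aggregate(cells, sample, categorical=(), numeric=()):
    """Group cells by `sample` and build one pseudo-bulk row per group."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[sample]].append(cell)

    result = {}
    for name, members in groups.items():
        # Assumption: pseudo-bulk counts are the gene-wise sum over cells.
        row = {"counts": [sum(col) for col in zip(*(m["counts"] for m in members))]}
        # Categorical attribute: most frequent value among the group's cells.
        for key in categorical:
            row[key] = Counter(m[key] for m in members).most_common(1)[0][0]
        # Numeric attribute: mean over the group's cells.
        for key in numeric:
            row[key] = mean(m[key] for m in members)
        result[name] = row
    return result


pseudo = aggregate(cells, "Channel", categorical=["phase"], numeric=["n_genes"])
for name, row in pseudo.items():
    print(name, row)
```

In this sketch, sample s1 keeps phase "G1" because two of its three cells carry that value, and its n_genes is the mean of 30, 32 and 34.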

Return type

UnimodalData

Returns

A UnimodalData object udata containing pseudo-bulk information –
- It has the following count matrices:
  - X: The pseudo-bulk count matrix over all cells.
  - If cluster is set, one additional pseudo-bulk count matrix per cluster, each aggregated over only the cells belonging to that cluster.
- udata.obs: It contains pseudo-bulk attributes aggregated from the corresponding single-cell attributes.
- udata.var: Gene names and Ensembl IDs are maintained.
Update data –
- The returned UnimodalData object is added to data under the key <sample>-pseudobulk, where <sample> is replaced by the actual value of the sample argument.
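
How the overall matrix X relates to the per-cluster matrices, and how the storage key is formed, can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: the toy cells are invented, counts are assumed to be summed, and the per-cluster results live in an ordinary dictionary keyed by cluster label, since the source does not name the keys pegasus uses for them.

```python
from collections import defaultdict

# Hypothetical toy cells: (sample label, cluster label, counts for two genes).
cells = [
    ("s1", "T", [1, 0]),
    ("s1", "T", [2, 1]),
    ("s1", "B", [0, 4]),
    ("s2", "T", [3, 3]),
    ("s2", "B", [1, 0]),
    ("s2", "B", [0, 2]),
]
n_genes = 2


def add(a, b):
    """Element-wise sum of two equal-length count vectors."""
    return [x + y for x, y in zip(a, b)]


# overall[sample] plays the role of X: counts aggregated over all cells.
overall = defaultdict(lambda: [0] * n_genes)
# per_cluster[cluster][sample]: counts aggregated over that cluster's cells only.
per_cluster = defaultdict(lambda: defaultdict(lambda: [0] * n_genes))

for sample_label, cluster_label, counts in cells:
    overall[sample_label] = add(overall[sample_label], counts)
    per_cluster[cluster_label][sample_label] = add(
        per_cluster[cluster_label][sample_label], counts
    )

# Key under which the result is stored in data, following the <sample>-pseudobulk rule.
sample = "Channel"
key = f"{sample}-pseudobulk"
print(key)
print(dict(overall))
```

Under the summation assumption, adding the per-cluster vectors of a sample reproduces that sample's row in the overall matrix, which is a convenient sanity check.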

Examples

>>> pg.pseudobulk(data, sample="Channel")