pegasus.pseudobulk

pegasus.pseudobulk(data, groupby, attrs=None, mat_key=None, condition=None)

Generate pseudo-bulk count matrices.

Parameters
  • data (MultimodalData or UnimodalData object) – Annotated data matrix with rows for cells and columns for genes.

  • groupby (str) – Specify the cell attribute used for aggregating pseudo-bulk data. Key must exist in data.obs.

  • attrs (str or List[str], optional, default: None) – Specify additional cell attributes to keep in the pseudo-bulk data. If set, every attribute key must exist in data.obs. For a categorical attribute, each pseudo-bulk’s value is the most frequent value among its cells; for a numeric attribute, each pseudo-bulk’s value is the mean over its cells.

  • mat_key (str, optional, default: None) – Specify the single-cell count matrix used for aggregating pseudo-bulk counts. If specified, use the count matrix stored under key mat_key in the matrices of data; otherwise, look for the key counts first, and fall back to raw.X if counts does not exist.

  • condition (str, optional, default: None) – If set, additionally generate pseudo-bulk matrices per condition specified in data.obs[condition].
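The aggregation rules described for groupby and attrs can be sketched on toy data with plain pandas. This is only an illustration, not the pegasus implementation: the helper toy_pseudobulk, the column names Donor and pct_mito, and all values are hypothetical, and summing counts per group is an assumption (the usual pseudo-bulk convention) that this page does not state explicitly.

```python
import numpy as np
import pandas as pd

# Toy single-cell data: 5 cells x 3 genes (made-up values).
counts = np.array([
    [1, 0, 2],
    [0, 3, 1],
    [4, 1, 0],
    [2, 2, 2],
    [0, 0, 5],
])
obs = pd.DataFrame({
    "Channel": ["s1", "s1", "s1", "s2", "s2"],  # plays the role of `groupby`
    "Donor": ["A", "A", "B", "C", "C"],         # hypothetical categorical attribute
    "pct_mito": [1.0, 2.0, 3.0, 4.0, 6.0],      # hypothetical numeric attribute
})


def toy_pseudobulk(counts, obs, groupby, attrs):
    """Hypothetical helper (not part of pegasus) mimicking the documented rules."""
    groups = obs[groupby].to_numpy()
    # Assumption: counts of all cells in a group are summed.
    bulk = pd.DataFrame(counts).groupby(groups).sum()
    aggregated = {}
    for key in attrs:
        col = obs[key]
        if pd.api.types.is_numeric_dtype(col):
            # Numeric attribute: mean over the cells of each pseudo-bulk.
            aggregated[key] = col.groupby(groups).mean()
        else:
            # Categorical attribute: most frequent value in each pseudo-bulk.
            aggregated[key] = col.groupby(groups).agg(
                lambda s: s.value_counts().idxmax()
            )
    return bulk, pd.DataFrame(aggregated)


bulk, bulk_obs = toy_pseudobulk(counts, obs, "Channel", ["Donor", "pct_mito"])
print(bulk)
print(bulk_obs)
```

In this toy case, pseudo-bulk s1 gets Donor "A" (two of its three cells) and a pct_mito of 2.0 (the mean of 1.0, 2.0 and 3.0).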

Returns

A MultimodalData object mdata containing pseudo-bulk information:

  • It has the following count matrices:

    • X: The pseudo-bulk count matrix over all cells.

    • If condition is set, one additional pseudo-bulk count matrix per condition, each aggregated only from the cells in that condition.

  • mdata.obs: It contains pseudo-bulk attributes aggregated from the corresponding single-cell attributes.

  • mdata.var: Gene names and Ensembl IDs are maintained.

Return type

MultimodalData

Examples

>>> pg.pseudobulk(data, groupby="Channel")
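To illustrate what the condition option adds, the following sketch builds one matrix over all cells plus one matrix per condition on toy data. It uses plain pandas rather than pegasus; the helper toy_condition_matrices, the Status column, and the "Status:ctrl"-style dictionary keys are hypothetical, and summing counts per group is an assumption. How pegasus names the extra matrices is not stated on this page.

```python
import numpy as np
import pandas as pd

# Toy single-cell data: 4 cells x 2 genes (made-up values).
counts = np.array([
    [1, 0],
    [2, 1],
    [0, 4],
    [3, 3],
])
obs = pd.DataFrame({
    "Channel": ["s1", "s1", "s2", "s2"],        # plays the role of `groupby`
    "Status": ["ctrl", "stim", "ctrl", "stim"],  # plays the role of `condition`
})


def toy_condition_matrices(counts, obs, groupby, condition):
    """Hypothetical helper (not part of pegasus)."""
    groups = obs[groupby].to_numpy()
    # Matrix over all cells; assumption: counts are summed per group.
    matrices = {"X": pd.DataFrame(counts).groupby(groups).sum()}
    for level in obs[condition].unique():
        keep = (obs[condition] == level).to_numpy()
        # Zero out cells outside this condition, then aggregate as before.
        restricted = counts * keep[:, None]
        matrices[f"{condition}:{level}"] = (
            pd.DataFrame(restricted).groupby(groups).sum()
        )
    return matrices


matrices = toy_condition_matrices(counts, obs, "Channel", "Status")
for key, mat in matrices.items():
    print(key)
    print(mat)
```

Because every cell belongs to exactly one condition here, the per-condition matrices add up to X.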