pegasus.pseudobulk

pegasus.pseudobulk(data, sample, attrs=None, mat_key=None, cluster=None)

Generate pseudo-bulk count matrices.
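For readers new to the idea, here is a minimal pure-Python sketch of a pseudo-bulk count matrix. It assumes the counts of all cells that share a sample value are summed gene by gene. The helper `pseudobulk_counts` is hypothetical and not part of pegasus, and it works on plain nested lists rather than the (possibly sparse) matrices stored in data.

```python
from typing import Dict, List, Sequence


def pseudobulk_counts(
    counts: Sequence[Sequence[int]],
    samples: Sequence[str],
) -> Dict[str, List[int]]:
    """Sum the gene counts of all cells that share a sample label.

    Hypothetical helper, not a pegasus function. `counts` is a dense
    cells x genes matrix given as nested lists; `samples` holds one
    sample label per cell.
    """
    if len(counts) != len(samples):
        raise ValueError("need exactly one sample label per cell")
    bulks: Dict[str, List[int]] = {}
    for row, sample in zip(counts, samples):
        # Start a zero vector the first time a sample label is seen.
        acc = bulks.setdefault(sample, [0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return bulks


counts = [[1, 0, 3], [2, 5, 0], [0, 1, 1]]  # 3 cells x 3 genes
samples = ["A", "B", "A"]
print(pseudobulk_counts(counts, samples))  # {'A': [1, 1, 4], 'B': [2, 5, 0]}
```

Each pseudo-bulk row plays the role of one bulk RNA-seq library, which is why raw counts (not normalized values) are aggregated.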

Parameters
  • data (MultimodalData or UnimodalData object) – Annotated data matrix with rows for cells and columns for genes.

  • sample (str) – The cell attribute used to group cells into pseudo-bulk samples: cells sharing the same value are aggregated into one pseudo-bulk. The key must exist in data.obs.

  • attrs (str or List[str], optional, default: None) – Additional cell attributes to carry over to the pseudo-bulk data. If set, every key must exist in data.obs. For a categorical attribute, each pseudo-bulk takes the most frequent value among its cells; for a numeric attribute, it takes the mean over its cells.

  • mat_key (str, optional, default: None) – The single-cell count matrix from which pseudo-bulk counts are aggregated. If None, use the raw count matrix of data: the matrix with key raw.X if it exists, otherwise the one with key X. If specified, use the matrix with key mat_key from the matrices of data.

  • cluster (str, optional, default: None) – If set, additionally generate one pseudo-bulk count matrix per cluster, using the cluster labels in data.obs[cluster].
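The attrs aggregation rule and the mat_key fallback can be sketched in plain Python as follows. Both `aggregate_attribute` and `resolve_matrix_key` are hypothetical helpers, not pegasus functions. Two assumptions are made that the parameter descriptions do not state: an attribute counts as numeric only if every value is a real number (pegasus presumably decides from the column dtype in data.obs), and ties among equally frequent categories go to the value seen first.

```python
from collections import Counter
from numbers import Real
from statistics import mean
from typing import Dict, List, Optional, Sequence, Union

Value = Union[str, int, float]


def aggregate_attribute(values: Sequence[Value], samples: Sequence[str]) -> Dict[str, Value]:
    """Collapse a per-cell attribute to one value per pseudo-bulk (hypothetical helper)."""
    groups: Dict[str, List[Value]] = {}
    for value, sample in zip(values, samples):
        groups.setdefault(sample, []).append(value)
    # Assumption: numeric only if every value is a real number (bools excluded).
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    result: Dict[str, Value] = {}
    for sample, vals in groups.items():
        if numeric:
            result[sample] = mean(vals)  # numeric attribute: mean over the cells
        else:
            # Categorical attribute: most frequent value; Counter.most_common
            # breaks ties by first occurrence (assumed tie rule).
            result[sample] = Counter(vals).most_common(1)[0][0]
    return result


def resolve_matrix_key(available: Sequence[str], mat_key: Optional[str] = None) -> str:
    """Pick the count matrix following the mat_key rule (hypothetical helper)."""
    if mat_key is not None:
        if mat_key not in available:
            raise KeyError(f"matrix {mat_key!r} not found")
        return mat_key
    # Default: prefer the raw counts under "raw.X", otherwise fall back to "X".
    return "raw.X" if "raw.X" in available else "X"


samples = ["s1", "s1", "s1", "s2"]
print(aggregate_attribute(["T", "B", "T", "NK"], samples))  # {'s1': 'T', 's2': 'NK'}
print(aggregate_attribute([10, 20, 30, 40], samples))       # {'s1': 20, 's2': 40}
print(resolve_matrix_key(["X", "raw.X"]))                   # raw.X
```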

Return type

UnimodalData

Returns

  • A UnimodalData object udata containing the pseudo-bulk information:

    • It has the following count matrices:

      • X: The pseudo-bulk count matrix over all cells.

      • If cluster is set, one additional pseudo-bulk count matrix per cluster, each aggregated over only the cells belonging to that cluster.

    • udata.obs: It contains pseudo-bulk attributes aggregated from the corresponding single-cell attributes.

    • udata.var: Gene names and Ensembl IDs are carried over from data.

  • Update data:

    • Add the returned UnimodalData object to data with key <sample>-pseudobulk, where <sample> is replaced by the value of the sample argument.
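The layout of the returned object can be pictured with the plain-Python sketch below. It is illustrative only: `build_pseudobulk` is a hypothetical helper, not the pegasus implementation. It assumes that the per-cluster matrices are keyed by cluster label, and that every matrix keeps the same sample rows (as matrices sharing one udata.obs must), so a sample with no cells in a cluster gets a row of zeros. Neither detail is stated above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = List[List[int]]


def _sum_by_sample(
    counts: Sequence[Sequence[int]],
    samples: Sequence[str],
    order: List[str],
    keep: Sequence[bool],
) -> Matrix:
    """Sum the rows of the kept cells into one row per sample, in `order`."""
    n_genes = len(counts[0]) if counts else 0
    row_of = {s: i for i, s in enumerate(order)}
    mat = [[0] * n_genes for _ in order]
    for row, sample, flag in zip(counts, samples, keep):
        if flag:
            acc = mat[row_of[sample]]
            for j, value in enumerate(row):
                acc[j] += value
    return mat


def build_pseudobulk(
    counts: Sequence[Sequence[int]],
    samples: Sequence[str],
    sample_attr: str,
    clusters: Optional[Sequence[str]] = None,
) -> Tuple[str, List[str], Dict[str, Matrix]]:
    """Hypothetical sketch of the result: storage key, row order, matrices."""
    order = list(dict.fromkeys(samples))  # one row per sample, first-seen order
    matrices = {"X": _sum_by_sample(counts, samples, order, [True] * len(samples))}
    if clusters is not None:
        for label in dict.fromkeys(clusters):
            # Assumption: keyed by cluster label; samples without cells in
            # this cluster keep a row of zeros so all matrices share rows.
            keep = [c == label for c in clusters]
            matrices[label] = _sum_by_sample(counts, samples, order, keep)
    return f"{sample_attr}-pseudobulk", order, matrices


counts = [[1, 0], [2, 3], [0, 4], [5, 1]]  # 4 cells x 2 genes
key, order, mats = build_pseudobulk(counts, ["A", "A", "B", "B"], "Channel", ["c1", "c2", "c1", "c1"])
print(key)         # Channel-pseudobulk
print(mats["X"])   # [[3, 3], [5, 5]]
print(mats["c2"])  # [[2, 3], [0, 0]]
```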

Examples

>>> pg.pseudobulk(data, sample="Channel")