pegasus.calc_signature_score
- pegasus.calc_signature_score(data, signatures, n_bins=50, show_omitted_genes=False, random_state=0)
Calculate signature / gene module score. [Li20-1]
This is an improved version of the implementation in [Jerby-Arnon18].
- Parameters
data (MultimodalData, UnimodalData, or anndata.AnnData object) – Single cell expression data.
signatures (Dict[str, List[str]] or str) – This argument accepts either a dictionary or a string. If signatures is a dictionary, it can contain multiple signature score calculation requests. Each key in the dictionary represents a separate signature score calculation, and its corresponding value is a list of gene symbols. Each score is stored in data.obs under its key. If signatures is a string, it should refer to a Gene Matrix Transposed (GMT)-formatted file. Pegasus will load signatures from the GMT file.
  - Pegasus also provides 5 default signature panels for each of human and mouse, selected by passing the panel name as signatures. They are cell_cycle_human, gender_human, mitochondrial_genes_human, ribosomal_genes_human and apoptosis_human for human; cell_cycle_mouse, gender_mouse, mitochondrial_genes_mouse, ribosomal_genes_mouse and apoptosis_mouse for mouse.
  - cell_cycle_human contains two cell-cycle signatures, G1/S and G2/M, obtained from Tirosh et al. 2016. We also updated gene symbols according to Seurat's cc.genes.updated.2019 vector. We additionally calculate signature scores cycling and cycle_diff, which are max{G2/M, G1/S} and G2/M - G1/S respectively. We provide predicted cell cycle phases in data.obs['predicted_phase'] in case it is useful. predicted_phase is predicted based on the G1/S and G2/M scores. First, we identify G0 cells: we apply the KMeans algorithm to obtain 2 clusters based on the cycling signature, and G0 cells are those in the cluster with the smallest mean value. For each cell in the other cluster, if G1/S > G2/M, it is a G1/S cell; otherwise it is a G2/M cell.
  - gender_human contains two gender-specific signatures, female_score and male_score. Genes were selected by differential expression (DE) analysis between genders, using 8 channels of bone marrow data from the HCA Census of Immune Cells and the brain nuclei data from Gaublomme and Li et al., 2019, Nature Communications. Three signature scores are calculated: female_score, male_score and gender_score. female_score and male_score are calculated from the female and male signatures respectively, and a larger score represents a higher likelihood of that gender. gender_score is calculated as male_score - female_score. A large positive score likely represents male and a large negative score likely represents female. Pegasus also provides a predicted gender for each cell based on gender_score, which is stored in data.obs['predicted_gender']. To predict genders, we apply the KMeans algorithm to gender_score and ask for 3 clusters. The clusters with the minimum and maximum cluster centers are predicted as female and male respectively, and the cluster in the middle is predicted as uncertain. Note that this approach is conservative: users can likely still predict gender from gender_score for cells in the uncertain cluster with reasonable accuracy.
  - mitochondrial_genes_human contains two signatures, mito_genes and mito_ribo. mito_genes contains 13 mitochondrial genes from chrM, and mito_ribo contains mitochondrial ribosomal genes that are not from chrM. Note that mito_genes correlates well with the percentage of mitochondrial UMIs and mito_ribo does not.
  - ribosomal_genes_human contains one signature, ribo_genes, which includes ribosomal genes from both the large and small subunits.
  - apoptosis_human contains one signature, apoptosis, which includes apoptosis-related genes from the KEGG pathway.
  - cell_cycle_mouse, gender_mouse, mitochondrial_genes_mouse, ribosomal_genes_mouse and apoptosis_mouse are the corresponding signatures for mouse. Gene symbols are directly translated from the human genes.
n_bins (int, optional, default: 50) – Number of bins on expression levels for grouping genes.
show_omitted_genes (bool, optional, default: False) – Signature genes that are not expressed in the data will be omitted. By default, Pegasus does not report which genes are omitted. If this option is turned on, omitted genes are reported.
random_state (int, optional, default: 0) – Random state used by KMeans if signatures is gender_human or gender_mouse.
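The phase and gender prediction rules described for cell_cycle_human and gender_human can be illustrated with a small, self-contained sketch. It is not Pegasus code: the helper names kmeans_1d, predict_phase and predict_gender are hypothetical, and a tiny deterministic 1-D k-means (Lloyd's algorithm) stands in for the KMeans call, so results on real data may differ from what Pegasus stores in data.obs.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, n_iter: int = 100) -> List[int]:
    """Tiny deterministic 1-D k-means (hypothetical stand-in for KMeans).

    Returns one label per value; label 0 is the cluster with the smallest
    center and label k - 1 the cluster with the largest center.
    """
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the minimum and maximum.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Relabel so that labels are ordered by cluster center.
    order = sorted(range(k), key=lambda j: centers[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[lab] for lab in labels]


def predict_phase(g1s: Sequence[float], g2m: Sequence[float]) -> List[str]:
    """G0 = low-'cycling' cluster; otherwise compare G1/S against G2/M."""
    cycling = [max(a, b) for a, b in zip(g1s, g2m)]  # max{G2/M, G1/S}
    labels = kmeans_1d(cycling, k=2)
    phases = []
    for lab, a, b in zip(labels, g1s, g2m):
        if lab == 0:
            phases.append("G0")
        elif a > b:
            phases.append("G1/S")
        else:
            phases.append("G2/M")
    return phases


def predict_gender(female: Sequence[float], male: Sequence[float]) -> List[str]:
    """Three clusters on gender_score = male_score - female_score."""
    gender_score = [m - f for f, m in zip(female, male)]
    names = {0: "female", 1: "uncertain", 2: "male"}
    return [names[lab] for lab in kmeans_1d(gender_score, k=3)]
```

Both helpers take plain per-cell score lists, so they can be tried on a handful of made-up values before comparing against the predicted_phase and predicted_gender columns that Pegasus writes.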
- Return type
None
- Returns
None.
Update data.obs:
- data.obs["key"]: signature / gene module score for signature "key".
Update data.var:
- data.var["mean"]: Mean expression of each gene across all cells. Only updated if "mean" does not exist in data.var.
- data.var["bins"]: Bin category for each gene. Only updated if data.uns["sig_n_bins"] is updated.
Update data.obsm:
- data.obsm["sig_background"]: Expected signature score for each bin category. Only updated if data.uns["sig_n_bins"] is updated.
Update data.uns:
- data.uns["sig_n_bins"]: Number of bins to partition genes into. Only updated if "sig_n_bins" does not exist or the recorded number of bins does not match n_bins.
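To make the roles of the mean, bins and sig_background fields more concrete, here is a minimal NumPy sketch of a generic binned-background score. It rests on assumptions: the exact formula Pegasus uses is not given in this page, so the equal-sized rank bins, the per-cell bin averages and the function name binned_signature_score are all illustrative choices, not the library's implementation. The sketch also omits signature genes that are absent from the gene list, which only approximates the "not expressed" rule.

```python
import numpy as np


def binned_signature_score(X, gene_names, signature, n_bins=50):
    """Illustrative binned-background signature score (not Pegasus code).

    X          : cells x genes array of expression values
    gene_names : gene symbols, one per column of X
    signature  : gene symbols in the signature
    Returns (score per cell, list of omitted signature genes).
    """
    X = np.asarray(X, dtype=float)
    n_cells, n_genes = X.shape

    # Mean expression of each gene across all cells (cf. data.var["mean"]).
    gene_mean = X.mean(axis=0)

    # Rank genes by mean expression and cut the ranks into n_bins
    # equal-sized groups (cf. data.var["bins"]).
    ranks = np.argsort(np.argsort(gene_mean, kind="stable"), kind="stable")
    bins = (ranks * n_bins) // n_genes

    # Per-cell average expression of each bin, cells x n_bins
    # (cf. data.obsm["sig_background"]).
    background = np.zeros((n_cells, n_bins))
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            background[:, b] = X[:, in_bin].mean(axis=1)

    # Keep only signature genes that are present in the data.
    index = {g: i for i, g in enumerate(gene_names)}
    cols = [index[g] for g in signature if g in index]
    omitted = [g for g in signature if g not in index]
    if not cols:
        raise ValueError("no signature gene found in gene_names")

    # Score = average over signature genes of (expression - background
    # of that gene's bin), computed separately for every cell.
    score = (X[:, cols] - background[:, bins[cols]]).mean(axis=1)
    return score, omitted
```

Subtracting a background drawn from genes of similar mean expression keeps highly expressed signature genes from dominating the score, which is the usual motivation for binning in this family of methods.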
Examples
>>> import pegasus as pg
>>> pg.calc_signature_score(data, {"T_cell_sig": ["CD3D", "CD3E", "CD3G", "TRAC"]})
>>> pg.calc_signature_score(data, "cell_cycle_human")
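Because signatures may also be a path to a GMT file, it can help to see how such a file maps onto the dictionary form. The sketch below writes a two-line GMT file and parses it back. It assumes the standard GMT layout (set name, description, then gene symbols, all tab-separated); read_gmt is a hypothetical helper, not a Pegasus function, and B_cell_sig with its genes is a made-up example.

```python
import os
import tempfile
from typing import Dict, List


def read_gmt(path: str) -> Dict[str, List[str]]:
    """Parse a GMT file into {signature name: [gene symbols]}."""
    signatures: Dict[str, List[str]] = {}
    with open(path) as fin:
        for line in fin:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            name, _description, *genes = fields
            signatures[name] = [g for g in genes if g]
    return signatures


# Write a small illustrative GMT file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    gmt_path = os.path.join(tmp_dir, "my_signatures.gmt")
    with open(gmt_path, "w") as fout:
        fout.write("T_cell_sig\tT cell markers\tCD3D\tCD3E\tCD3G\tTRAC\n")
        fout.write("B_cell_sig\tB cell markers\tCD79A\tCD79B\tMS4A1\n")
    signatures = read_gmt(gmt_path)

# With Pegasus installed and `data` loaded, either form could then be passed:
#   pg.calc_signature_score(data, gmt_path)     # string: GMT file
#   pg.calc_signature_score(data, signatures)   # dict: one score per key
```

Each key of the parsed dictionary corresponds to one score column that would be written to data.obs.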