pegasus.calc_signature_score
- pegasus.calc_signature_score(data, signatures, n_bins=50, standardize=True, show_omitted_genes=False, skip_threshold=1, random_state=0)
Calculate signature / gene module score. [Li20-1]
This is an improved version of the implementation in [Jerby-Arnon18].
- Parameters
data (MultimodalData, UnimodalData, or anndata.AnnData object) – Single cell expression data.
signatures (Dict[str, List[str]] or str) – This argument accepts either a dictionary or a string. If signatures is a dictionary, it can contain multiple signature score calculation requests. Each key in the dictionary represents a separate signature score calculation, and its corresponding value contains a list of gene symbols. Each score will be stored in data.obs, using the dictionary key as the field name. If signatures is a string, it should refer to a Gene Matrix Transposed (GMT)-formatted file, from which Pegasus will load the signatures.
- Pegasus also provides 5 default signature panels for each of human and mouse. They are cell_cycle_human, gender_human, mitochondrial_genes_human, ribosomal_genes_human and apoptosis_human for human; cell_cycle_mouse, gender_mouse, mitochondrial_genes_mouse, ribosomal_genes_mouse and apoptosis_mouse for mouse.
  - cell_cycle_human contains two cell-cycle signatures, G1/S and G2/M, obtained from Tirosh et al. 2016. We also updated gene symbols according to Seurat's cc.genes.updated.2019 vector. We additionally calculate the signature scores cycling and cycle_diff, which are max{G2/M, G1/S} and G2/M - G1/S respectively. We provide predicted cell cycle phases in data.obs['predicted_phase'] in case it is useful. predicted_phase is predicted based on the G1/S and G2/M scores. First, we identify G0 cells: we apply the KMeans algorithm to obtain 2 clusters based on the cycling signature, and G0 cells are those in the cluster with the smallest mean value. Each cell in the other cluster is a G1/S cell if G1/S > G2/M, and a G2/M cell otherwise.
  - gender_human contains two gender-specific signatures, female_score and male_score. Genes were selected by differential expression (DE) analysis between genders, using 8 channels of bone marrow data from the HCA Census of Immune Cells and the brain nuclei data from Gaublomme and Li et al., 2019, Nature Communications. Three signature scores are calculated: female_score, male_score and gender_score. female_score and male_score are calculated from the female and male signatures respectively, and a larger score represents a higher likelihood of that gender. gender_score is calculated as male_score - female_score. A large positive score likely represents male and a large negative score likely represents female. Pegasus also provides a predicted gender for each cell based on gender_score, which is stored in data.obs['predicted_gender']. To predict genders, we apply the KMeans algorithm to gender_score and ask for 3 clusters. The clusters with the minimum and maximum cluster centers are predicted as female and male respectively, and the cluster in the middle is predicted as uncertain. Note that this approach is conservative, and it is likely that users can predict genders based on gender_score for cells in the uncertain cluster with reasonable accuracy.
  - mitochondrial_genes_human contains two signatures, mito_genes and mito_ribo. mito_genes contains 13 mitochondrial genes from chrM and mito_ribo contains mitochondrial ribosomal genes that are not from chrM. Note that mito_genes correlates well with the percentage of mitochondrial UMIs, whereas mito_ribo does not.
  - ribosomal_genes_human contains one signature, ribo_genes, which includes ribosomal genes from both the large and small subunits.
  - apoptosis_human contains one signature, apoptosis, which includes apoptosis-related genes from the KEGG pathway.
  - cell_cycle_mouse, gender_mouse, mitochondrial_genes_mouse, ribosomal_genes_mouse and apoptosis_mouse are the corresponding signatures for mouse. Gene symbols are directly translated from human genes.
- In addition, Pegasus provides the following 4 curated signature panels:
  - emt_human, the Epithelial-Mesenchymal Transition signature from Gibbons and Creighton Dev. Dyn. 2018.
  - human_lung, human lung cell type markers.
  - mouse_brain, mouse brain cell type markers.
  - mouse_liver, mouse liver cell type markers.
n_bins (int, optional, default: 50) – Number of bins on expression levels for grouping genes.
standardize (bool, optional, default: True) – Whether to standardize the resulting signature scores with respect to gene means.
show_omitted_genes (bool, optional, default: False) – Signature genes that are not expressed in the data will be omitted. By default, Pegasus does not report which genes are omitted. If this option is turned on, omitted genes are reported.
skip_threshold (int, optional, default: 1) – Skip the signature calculation if the number of kept genes is less than skip_threshold.
random_state (int, optional, default: 0) – Random state used by KMeans if signatures is gender_human or gender_mouse.
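The KMeans-based predictions described for the cell-cycle and gender panels can be sketched in a few lines. This is only an illustration under stated assumptions, not Pegasus's code: kmeans_1d, predict_phase and predict_gender are hypothetical helper names, and a tiny deterministic 1-D Lloyd's k-means stands in for the KMeans algorithm that Pegasus applies (so random_state plays no role here).

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int, n_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Hypothetical stand-in for KMeans: deterministic 1-D Lloyd's algorithm (k >= 2)."""
    lo, hi = min(values), max(values)
    # Initialise the centers evenly between the smallest and largest value.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def predict_phase(g1s: Sequence[float], g2m: Sequence[float]) -> List[str]:
    """G0 = cluster with the smallest mean cycling score; otherwise compare G1/S and G2/M."""
    cycling = [max(a, b) for a, b in zip(g1s, g2m)]  # cycling = max{G2/M, G1/S}
    labels, centers = kmeans_1d(cycling, k=2)
    g0_cluster = min(range(2), key=lambda c: centers[c])
    phases = []
    for lab, a, b in zip(labels, g1s, g2m):
        if lab == g0_cluster:
            phases.append("G0")
        else:
            phases.append("G1/S" if a > b else "G2/M")
    return phases


def predict_gender(gender_score: Sequence[float]) -> List[str]:
    """Lowest cluster center -> female, highest -> male, middle -> uncertain."""
    labels, centers = kmeans_1d(gender_score, k=3)
    female = min(range(3), key=lambda c: centers[c])
    male = max(range(3), key=lambda c: centers[c])
    return ["female" if lab == female else "male" if lab == male else "uncertain"
            for lab in labels]


# Toy scores, invented for illustration only.
phases = predict_phase([0.1, 0.05, 0.9, 0.2, 0.0], [0.0, 0.1, 0.3, 1.1, 0.05])
genders = predict_gender([-2.0, -1.8, 0.1, -0.1, 1.9, 2.1])
```

Keeping the middle cluster as uncertain mirrors the conservative behaviour described for the gender panels: only cells near the extreme cluster centers receive a definite label.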
- Return type
None
- Returns
None.
Update data.obs:
- data.obs["key"]: signature / gene module score for signature "key".
Update data.var:
- data.var["mean"]: Mean expression of each gene across all cells. Only updated if "mean" does not exist in data.var.
- data.var["bins"]: Bin category for each gene. Only updated if data.uns["sig_n_bins"] is updated.
Update data.obsm:
- data.obsm["sig_background"]: Expected signature score for each bin category. Only updated if data.uns["sig_n_bins"] is updated.
Update data.uns:
- data.uns["sig_n_bins"]: Number of bins to partition genes into. Only updated if "sig_n_bins" does not exist or the recorded number of bins does not match n_bins.
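To make the roles of the gene bins and the per-bin background more concrete, the sketch below scores a signature on a toy matrix. It is a simplified illustration built on assumptions, not the Pegasus implementation: assign_bins and signature_score are hypothetical names, the equal-sized rank binning and the per-cell bin mean used as background are assumed choices, and the standardize step is left out.

```python
from typing import List, Optional


def assign_bins(gene_means: List[float], n_bins: int) -> List[int]:
    """Rank genes by mean expression and split the ranking into n_bins equal-sized groups."""
    order = sorted(range(len(gene_means)), key=lambda g: gene_means[g])
    bins = [0] * len(gene_means)
    for rank, gene in enumerate(order):
        bins[gene] = rank * n_bins // len(gene_means)
    return bins


def signature_score(
    expr: List[List[float]],   # cells x genes expression matrix
    genes: List[str],          # gene symbol of each column
    signature: List[str],      # gene symbols of the signature
    n_bins: int = 2,
    skip_threshold: int = 1,
) -> Optional[List[float]]:
    n_cells, n_genes = len(expr), len(genes)
    gene_means = [sum(row[g] for row in expr) / n_cells for g in range(n_genes)]
    bins = assign_bins(gene_means, n_bins)

    # Signature genes that are absent from the data are omitted.
    index = {name: g for g, name in enumerate(genes)}
    kept = [index[name] for name in signature if name in index]
    if len(kept) < max(skip_threshold, 1):
        return None  # too few kept genes: skip this signature

    scores = []
    for row in expr:
        # Assumed background: the cell's mean expression over all genes in each bin.
        background = []
        for b in range(n_bins):
            members = [row[g] for g in range(n_genes) if bins[g] == b]
            background.append(sum(members) / len(members) if members else 0.0)
        # Score: average excess of each kept gene over the background of its bin.
        scores.append(sum(row[g] - background[bins[g]] for g in kept) / len(kept))
    return scores


# Toy data, invented for illustration: 2 cells x 4 genes; "Z" is not in the data.
genes = ["A", "B", "C", "D"]
expr = [[1.0, 3.0, 10.0, 14.0],
        [3.0, 1.0, 14.0, 10.0]]
scores = signature_score(expr, genes, ["B", "D", "Z"], n_bins=2)
```

Comparing each signature gene with genes of similar mean expression is what keeps highly expressed signatures from scoring high merely because of their expression level.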
Examples
>>> import pegasus as pg
>>> pg.calc_signature_score(data, {"T_cell_sig": ["CD3D", "CD3E", "CD3G", "TRAC"]})
>>> pg.calc_signature_score(data, "cell_cycle_human")
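When signatures refers to a GMT file, each line of that file describes one gene set: a name, a description, and then the gene symbols, all separated by tabs. The sketch below shows how such content maps onto the dictionary form accepted above. parse_gmt is a hypothetical helper written for illustration, and the file content is invented; it does not show how Pegasus itself loads GMT files.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Hypothetical helper: turn GMT-formatted text into {signature name: gene symbols}."""
    signatures: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines (need name, description, >= 1 gene)
        name, _description, *gene_symbols = fields
        signatures[name] = [g for g in gene_symbols if g]
    return signatures


# Invented GMT content with two gene sets.
gmt_text = (
    "T_cell_sig\tT cell markers\tCD3D\tCD3E\tCD3G\tTRAC\n"
    "B_cell_sig\tB cell markers\tCD79A\tMS4A1\n"
)
signatures = parse_gmt(gmt_text)
```

The resulting dictionary has the same shape as the first example above, with one key per signature and a list of gene symbols as its value.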