pegasus.calc_signature_score
pegasus.calc_signature_score(data, signatures, n_bins=50, show_omitted_genes=False, random_state=0)
Calculate signature / gene module score.
This is an improved version of the implementation in [Jerby-Arnon18]. See here for details on the method.
- Parameters
data (MultimodalData, UnimodalData, or anndata.AnnData object) – Single cell expression data.
signatures (Dict[str, List[str]] or str) – This argument accepts either a dictionary or a string.
If signatures is a dictionary, it can contain multiple signature score calculation requests. Each key in the dictionary represents a separate signature score calculation and its corresponding value contains a list of gene symbols. Each score will be stored in data.obs, with the dictionary key as the field name.
If signatures is a string, it should refer to a Gene Matrix Transposed (GMT)-formatted file. Pegasus will load signatures from the GMT file.
Pegasus also provides 4 default signature panels for each of human and mouse, which are selected by passing the panel name as the string. They are cell_cycle_human, gender_human, mitochondrial_genes_human and ribosomal_genes_human for human; cell_cycle_mouse, gender_mouse, mitochondrial_genes_mouse and ribosomal_genes_mouse for mouse.
cell_cycle_human
contains two cell-cycle signatures, G1/S and G2/M, obtained from Tirosh et al. 2016. We also updated gene symbols according to Seurat’s cc.genes.updated.2019 vector. We additionally calculate a signature score cycle_diff, which is G2/M - G1/S. We provide predicted cell cycle phases in data.obs['predicted_phase'] in case it is useful.
predicted_phase is predicted based on the G1/S and G2/M scores. First, we identify G0 cells. We calculate a vector maxvalues as the maximum of the G1/S and G2/M scores for each cell and then estimate the distribution of maxvalues using scipy.stats.gaussian_kde. Based on the estimated density function, we identify the peak(s) (i.e. mu) and estimate the standard deviation (i.e. sigma) of the null (G0) distribution. We label any cell with maxvalues <= mu + 3 * sigma as a G0 cell. For each remaining cell, if G1/S > G2/M, it is a G1/S cell; otherwise it is a G2/M cell.
gender_human
contains two gender-specific signatures, female_score and male_score. Genes were selected based on DE analysis between genders, using 8 channels of bone marrow data from the HCA Census of Immune Cells and the brain nuclei data from Gaublomme and Li et al, 2019, Nature Communications. After calculation, three signature scores are available: female_score, male_score and gender_score. female_score and male_score are calculated based on the female and male signatures respectively, and a larger score represents a higher likelihood of that gender. gender_score is calculated as male_score - female_score. A large positive score likely represents male and a large negative score likely represents female.
Pegasus also provides a predicted gender for each cell based on gender_score, which is stored in data.obs['predicted_gender']. To predict genders, we apply the KMeans algorithm to gender_score and ask for 3 clusters. The clusters with the minimum and maximum cluster centers are predicted as female and male respectively, and the cluster in the middle is predicted as uncertain. Note that this approach is conservative, and it is likely that users can predict genders based on gender_score for cells in the uncertain cluster with reasonable accuracy.
mitochondrial_genes_human
contains two signatures, mito_genes and mito_ribo. mito_genes contains 13 mitochondrial genes from chrM and mito_ribo contains mitochondrial ribosomal genes that are not from chrM. Note that mito_genes correlates well with the percent of mitochondrial UMIs and mito_ribo does not.
ribosomal_genes_human
contains one signature, ribo_genes, which consists of ribosomal genes from both the large and small subunits.
cell_cycle_mouse, gender_mouse, mitochondrial_genes_mouse and ribosomal_genes_mouse are the corresponding signatures for mouse. Gene symbols are directly translated from human genes.
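The phase-assignment rule described above for cell_cycle_human can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pegasus implementation: the function name assign_phase is hypothetical, and mu and sigma are passed in as given values rather than estimated with scipy.stats.gaussian_kde as the text describes.

```python
from typing import List


def assign_phase(g1s: List[float], g2m: List[float], mu: float, sigma: float) -> List[str]:
    """Hypothetical helper: label each cell G0, G1/S or G2/M from its two scores."""
    phases = []
    for s1, s2 in zip(g1s, g2m):
        maxvalue = max(s1, s2)           # per-cell maximum of the two scores
        if maxvalue <= mu + 3 * sigma:   # within the null (G0) distribution
            phases.append("G0")
        elif s1 > s2:
            phases.append("G1/S")
        else:
            phases.append("G2/M")
    return phases


# Toy scores; mu and sigma are assumed values for the null peak and its spread.
print(assign_phase([0.05, 0.9, 0.2], [0.02, 0.3, 1.1], mu=0.0, sigma=0.05))
```

Only cells whose larger score clears the mu + 3 * sigma threshold are assigned a cycling phase, which is why the first toy cell is labeled G0 even though its G1/S score exceeds its G2/M score.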
n_bins (int, optional, default: 50) – Number of bins on expression levels for grouping genes.
show_omitted_genes (bool, optional, default: False) – Signature genes that are not expressed in the data will be omitted. By default, Pegasus does not report which genes are omitted. If this option is turned on, the omitted genes are reported.
random_state (int, optional, default: 0) – Random state used by KMeans if signatures is gender_human or gender_mouse.
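The three-cluster gender prediction that random_state feeds into can be illustrated with a small stand-in. The sketch below is not the Pegasus code and does not call the KMeans implementation the text refers to: it swaps in a plain 1-D Lloyd's algorithm with a deterministic initialization (minimum, median, maximum), so no random state is involved. The function name predict_gender and the toy scores are hypothetical.

```python
from typing import List


def predict_gender(gender_score: List[float], n_iter: int = 50) -> List[str]:
    """Hypothetical helper: 3-cluster 1-D k-means on gender_score (male_score - female_score)."""
    ordered = sorted(gender_score)
    # Deterministic initial centers: minimum, median and maximum of the scores.
    centers = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each cell.
        labels = [min(range(3), key=lambda k: abs(x - centers[k])) for x in gender_score]
        # Update step: mean of each cluster (keep the old center if a cluster is empty).
        new_centers = []
        for k in range(3):
            members = [x for x, lab in zip(gender_score, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    # Lowest center -> female, highest center -> male, middle center -> uncertain.
    order = sorted(range(3), key=lambda k: centers[k])
    names = {order[0]: "female", order[1]: "uncertain", order[2]: "male"}
    return [names[lab] for lab in labels]


scores = [-2.1, -1.9, -2.0, 0.1, -0.1, 1.8, 2.2, 2.0]
print(predict_gender(scores))
```

Cells near zero fall into the middle cluster and are reported as uncertain, which reflects the conservative behavior described for predicted_gender.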
- Return type
None
- Returns
None.
Update data.obs:
data.obs["key"]: Signature / gene module score for signature "key".
Update data.var:
data.var["mean"]: Mean expression of each gene across all cells. Only updated if "mean" does not exist in data.var.
data.var["bins"]: Bin category for each gene. Only updated if data.uns["sig_n_bins"] is updated.
Update data.obsm:
data.obsm["sig_background"]: Expected signature score for each bin category. Only updated if data.uns["sig_n_bins"] is updated.
Update data.uns:
data.uns["sig_n_bins"]: Number of bins to partition genes into. Only updated if "sig_n_bins" does not exist or the recorded number of bins does not match n_bins.
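The update conditions listed above amount to a small caching rule: bins and background scores are recomputed only when the recorded bin count is missing or differs from n_bins. A minimal sketch of that decision, using a plain dict as a stand-in for data.uns and a hypothetical helper name needs_rebinning (neither is part of the Pegasus API):

```python
def needs_rebinning(uns: dict, n_bins: int) -> bool:
    """Hypothetical helper: True when the cached bins must be recomputed."""
    # A missing key yields None, which never equals an integer bin count.
    return uns.get("sig_n_bins") != n_bins


uns = {}                               # stand-in for data.uns before the first call
print(needs_rebinning(uns, 50))        # key missing, so bins would be computed
uns["sig_n_bins"] = 50                 # record the bin count after computing
print(needs_rebinning(uns, 50))        # same n_bins, cached values can be reused
print(needs_rebinning(uns, 25))        # different n_bins, bins would be recomputed
```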
Examples
>>> pg.calc_signature_score(data, {"T_cell_sig": ["CD3D", "CD3E", "CD3G", "TRAC"]})
>>> pg.calc_signature_score(data, "cell_cycle_human")
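When signatures is given as a file path, the file is expected to be in GMT format: one tab-delimited line per gene set, holding the set name, a description, and then the gene symbols. The sketch below writes a signature dictionary to such a file and reads it back. The helpers write_gmt and read_gmt, the "na" description placeholder, and the file name are assumptions made for illustration and are not part of Pegasus; the final call is left as a comment because it needs a loaded data object.

```python
import os
import tempfile
from typing import Dict, List


def write_gmt(signatures: Dict[str, List[str]], path: str) -> None:
    """Hypothetical helper: one tab-delimited line per signature (name, description, genes)."""
    with open(path, "w") as fout:
        for name, genes in signatures.items():
            fout.write("\t".join([name, "na"] + genes) + "\n")


def read_gmt(path: str) -> Dict[str, List[str]]:
    """Hypothetical helper: parse a GMT file back into a name -> gene list dictionary."""
    signatures = {}
    with open(path) as fin:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                signatures[fields[0]] = fields[2:]  # skip the description column
    return signatures


gmt_path = os.path.join(tempfile.mkdtemp(), "my_signatures.gmt")
write_gmt({"T_cell_sig": ["CD3D", "CD3E", "CD3G", "TRAC"]}, gmt_path)
print(read_gmt(gmt_path))
# With a loaded data object, the path could then be passed as the signatures argument:
# pg.calc_signature_score(data, gmt_path)
```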