pegasus.qc_metrics

pegasus.qc_metrics(data, select_singlets=False, remap_string=None, subset_string=None, min_genes=None, max_genes=None, min_umis=None, max_umis=None, mito_prefix=None, percent_mito=None)

Generate quality control (QC) metrics for the cell barcodes in the dataset.

Parameters
  • data (pegasusio.MultimodalData) – Uses the currently selected modality in data, which should contain one RNA expression matrix.

  • select_singlets (bool, optional, default False) – If True, select only singlets.

  • remap_string (str, optional, default None) – Remap singlet names using <remap_string>, where <remap_string> takes the format “new_name_i:old_name_1,old_name_2;new_name_ii:old_name_3;…”. For example, if we hashed 5 libraries from 3 samples (sample1_lib1, sample1_lib2, sample2_lib1, sample2_lib2 and sample3), we can remap them to 3 samples using this string: “sample1:sample1_lib1,sample1_lib2;sample2:sample2_lib1,sample2_lib2”. The new singlet names are then stored in the metadata field with key ‘assignment’, while the old names are kept in the metadata field with key ‘assignment.orig’.

  • subset_string (str, optional, default None) – If select_singlets is True, only select the singlets listed in <subset_string>, which takes the format “name1,name2,…”. Note that if remap_string is specified, subsetting happens after remapping. For example, we can select only the singlets from samples 1 and 3 using “sample1,sample3”.

  • min_genes (int, optional, default: None) – Only keep cells with at least min_genes genes.

  • max_genes (int, optional, default: None) – Only keep cells with fewer than max_genes genes.

  • min_umis (int, optional, default: None) – Only keep cells with at least min_umis UMIs.

  • max_umis (int, optional, default: None) – Only keep cells with fewer than max_umis UMIs.

  • mito_prefix (str, optional, default: None) – Prefix for mitochondrial genes.

  • percent_mito (float, optional, default: None) – Only keep cells in which mitochondrial genes account for less than percent_mito % of total counts.
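
The remap_string and subset_string formats described above can be illustrated with a small pure-Python sketch. The helpers parse_remap_string and remap_and_subset are hypothetical names introduced only for this illustration; they are not part of the pegasus API and do not claim to mirror its internals. The sketch also assumes that names absent from remap_string (such as sample3) are left unchanged.

```python
def parse_remap_string(remap_string):
    """Map each old singlet name to its new name.

    Hypothetical helper: it only illustrates the documented string
    format and is not part of the pegasus API.
    """
    old_to_new = {}
    for group in remap_string.split(";"):
        if not group:
            continue  # tolerate a trailing ';'
        new_name, old_names = group.split(":")
        for old_name in old_names.split(","):
            old_to_new[old_name] = new_name
    return old_to_new


def remap_and_subset(assignments, remap_string=None, subset_string=None):
    """Remap names first, then keep only the names in subset_string."""
    mapping = parse_remap_string(remap_string) if remap_string else {}
    # Assumption: names without a remap entry are kept as they are.
    remapped = [mapping.get(name, name) for name in assignments]
    if subset_string is None:
        return remapped
    keep = set(subset_string.split(","))
    return [name for name in remapped if name in keep]


remap = "sample1:sample1_lib1,sample1_lib2;sample2:sample2_lib1,sample2_lib2"
cells = ["sample1_lib1", "sample2_lib2", "sample3", "sample1_lib2"]
selected = remap_and_subset(cells, remap, "sample1,sample3")
print(selected)  # ['sample1', 'sample3', 'sample1']
```

Because subsetting is applied after remapping, the subset string refers to the new names (sample1, sample3) rather than the original library names.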

Return type

None

Returns

  • None

  • Update data.obs

    • n_genes: Number of genes detected in each cell.

    • n_counts: Total number of counts for each cell.

    • percent_mito: Percent of counts coming from mitochondrial genes for each cell.

    • passed_qc: Boolean type indicating if a cell passes the QC process based on the QC metrics.

    • demux_type: This column might be deleted if select_singlets is True.
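
To make the meaning of these columns concrete, here is a minimal pure-Python sketch that computes them for a tiny count matrix. The function qc_sketch and the toy data are hypothetical and are not pegasus code. The sketch reads the parameter descriptions literally (minimums inclusive, maximums and percent_mito exclusive) and assumes a gene counts as detected when its count is greater than zero.

```python
def qc_sketch(counts, gene_names, min_genes=None, max_genes=None,
              min_umis=None, max_umis=None, mito_prefix=None,
              percent_mito=None):
    """Return one dict of QC metrics per cell (one row of `counts`).

    Hypothetical helper, not the pegasus implementation.
    """
    is_mito = [mito_prefix is not None and name.startswith(mito_prefix)
               for name in gene_names]
    results = []
    for row in counts:
        n_genes = sum(1 for c in row if c > 0)   # genes with nonzero counts
        n_counts = sum(row)                      # total counts (UMIs)
        mito = sum(c for c, m in zip(row, is_mito) if m)
        pct_mito = 100.0 * mito / n_counts if n_counts > 0 else 0.0

        passed = True
        if min_genes is not None:
            passed = passed and n_genes >= min_genes
        if max_genes is not None:
            passed = passed and n_genes < max_genes
        if min_umis is not None:
            passed = passed and n_counts >= min_umis
        if max_umis is not None:
            passed = passed and n_counts < max_umis
        if percent_mito is not None:
            passed = passed and pct_mito < percent_mito

        results.append({"n_genes": n_genes, "n_counts": n_counts,
                        "percent_mito": pct_mito, "passed_qc": passed})
    return results


genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = [
    [1, 5, 4, 0],  # 3 genes, 10 counts, 10% mitochondrial
    [6, 2, 2, 0],  # 3 genes, 10 counts, 60% mitochondrial
    [0, 1, 0, 0],  # 1 gene, 1 count, 0% mitochondrial
]
metrics = qc_sketch(counts, genes, min_genes=2, mito_prefix="MT-",
                    percent_mito=20)
print([m["passed_qc"] for m in metrics])  # [True, False, False]
```

The second cell fails on its mitochondrial percentage and the third on min_genes, so only the first cell would be flagged as passing QC in this toy example.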

Examples

>>> pg.qc_metrics(data, min_genes=500, max_genes=6000, mito_prefix="MT-", percent_mito=10)