API
Pegasus can also be used as a Python package. Import it with:
import pegasus as pg
Read and Write
|
Load data into memory. |
|
Write data back to disk. |
|
Aggregate channel-specific count matrices into one big count matrix. |
Analysis Tools
Preprocess
|
Generate Quality Control (QC) metrics regarding cell barcodes on the dataset. |
|
Calculate filtration stats on cell barcodes. |
|
Filter data based on qc_metrics calculated in |
|
Identify robust genes as candidates for HVG selection and remove genes that are not expressed in any cells. |
|
Normalize each cell by total counts, and then apply natural logarithm to the data. |
|
Apply the Log(1+x) transformation on the given count matrix. |
|
Normalize each cell by total count. |
|
Conduct arcsinh transform on the current matrix. |
|
Highly variable features (HVF) selection. |
|
Subset the features and store the resulting matrix in dense format in data.uns with '_tmp_fmat_' prefix, with the option of standardization and truncating based on max_value. |
|
Perform Principal Component Analysis (PCA) on the data. |
|
Perform Nonnegative Matrix Factorization (NMF) on the data using the Frobenius norm. |
|
Regress out effects due to specific observational attributes. |
|
Calculate the standardized z scores of the count matrix. |
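The normalization, log-transformation, and z-score entries above can be illustrated with a small pure-Python sketch. It is not the Pegasus implementation: the function names `log_norm` and `z_scores`, the `target_sum` parameter, and the toy data are assumptions made for illustration only.

```python
import math

def log_norm(counts, target_sum=1e5):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts stay all zeros to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized

def z_scores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    # A constant feature has zero variance; return zeros instead of dividing by zero.
    return [(v - mean) / std if std > 0 else 0.0 for v in values]

# Toy data (hypothetical): rows are cells, columns are genes.
counts = [[1, 3, 0], [10, 0, 10]]
log_matrix = log_norm(counts, target_sum=100)
gene0_z = z_scores([row[0] for row in log_matrix])
```

Scaling before the log transform keeps cells with different sequencing depths comparable; the z-score step then puts each gene on a common scale.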
Batch Correction
|
Batch correction on PCs using Harmony. |
|
Batch correction using Scanorama. |
|
Perform Integrative Nonnegative Matrix Factorization (iNMF) [Yang16] for data integration. |
|
Run scVI embedding. |
Nearest Neighbors
|
Compute k nearest neighbors and affinity matrix, which will be used for diffmap and graph-based community detection algorithms. |
|
Find K nearest neighbors for each data point and return the indices and distances arrays. |
|
Calculate the kBET metric of the data regarding a specific sample attribute and embedding. |
|
Calculate the kSIM metric of the data regarding a specific sample attribute and embedding. |
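To make the "indices and distances arrays" returned by a K-nearest-neighbor search concrete, here is a brute-force sketch in pure Python. The function name `k_nearest_neighbors` and the toy embedding are hypothetical, and an exhaustive search like this is only an illustration; it says nothing about how Pegasus performs the search internally.

```python
import math

def k_nearest_neighbors(points, k):
    """Return (indices, distances): for each point, its k nearest other points."""
    indices, distances = [], []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point.
        candidates = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        candidates.sort()
        nearest = candidates[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances

# Toy 2-D embedding (hypothetical), one tuple per cell.
embedding = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn_indices, knn_distances = k_nearest_neighbors(embedding, k=2)
```

Row `i` of each array describes the neighbors of point `i`, ordered from nearest to farthest.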
Diffusion Map
|
Calculate Diffusion Map. |
|
Calculate Pseudotime based on Diffusion Map. |
|
Inference on path of a cluster. |
Cluster Algorithms
|
Cluster the data using the chosen algorithm. |
|
Cluster the cells using Louvain algorithm. |
|
Cluster the data using Leiden algorithm. |
|
Use the Leiden algorithm to split 'clust_id' in 'clust_label' into 'n_components' sub-clusters and write the new clustering results to 'res_label'. |
|
Cluster the data using Spectral Louvain algorithm. |
|
Cluster the data using Spectral Leiden algorithm. |
Visualization Algorithms
|
Calculate t-SNE embedding of cells using the FIt-SNE package. |
|
Calculate UMAP embedding of cells. |
|
Construct the Force-directed (FLE) graph. |
|
Calculate Net-UMAP embedding of cells. |
|
Construct Net-Force-directed (FLE) graph. |
Doublet Detection
|
Infer doublets by first calculating Scrublet-like [Wolock18] doublet scores and then smartly determining an appropriate doublet score cutoff [Li20-2]. |
|
Convert doublet prediction into doublet annotations that Pegasus can recognize. |
Gene Module Score
|
Calculate signature / gene module score. |
Label Transfer
|
Run scArches training. |
|
Run scArches training. |
Differential Expression and Gene Set Enrichment Analysis
|
Perform Differential Expression (DE) Analysis on data. |
|
Extract DE results into a human readable structure. |
|
Write DE analysis results into Excel workbook. |
|
Perform Gene Set Enrichment Analysis using fGSEA. |
Annotate clusters
|
Infer putative cell types for each cluster using legacy markers. |
|
Add annotation to the data object as a categorical variable. |
Plotting
|
Generate scatter plots for different attributes. |
|
Generate scatter plots of attribute 'attr' for each category in attribute 'group'. |
|
Scatter plot on spatial coordinates. |
|
Generate a composition plot, which shows the percentage of cells from each condition for every cluster. |
|
Generate a stacked violin plot. |
|
Generate a heatmap. |
|
Generate a dot plot. |
|
Generate a dendrogram on hierarchical clustering result. |
|
Generate highly variable feature plot. |
|
Plot quality control statistics (before filtration vs. |
|
Generate Volcano plots (-log10 p value vs. |
|
Generate a barcode rank plot, which shows the total UMIs against barcode rank (in descending order with respect to total UMIs) |
|
Generate ridge plots, up to 8 features can be shown in one figure. |
|
Generate one word cloud image for factor (starts from 0) in data.uns['W']. |
|
Generate GSEA barplots. |
|
Generate Elbowplot and suggest n_comps to select based on random matrix theory (see utils.largest_variance_from_random_matrix). |
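The quantity behind the composition plot listed above, the percentage of cells from each condition within every cluster, can be sketched in pure Python. The function name `composition` and the labels below are hypothetical; this only computes the numbers that such a plot would display and is not Pegasus code.

```python
from collections import Counter, defaultdict

def composition(clusters, conditions):
    """For every cluster, return the percentage of its cells from each condition."""
    per_cluster = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        per_cluster[cluster][condition] += 1
    result = {}
    for cluster, counter in per_cluster.items():
        total = sum(counter.values())
        result[cluster] = {cond: 100.0 * n / total for cond, n in counter.items()}
    return result

# Hypothetical per-cell labels: one cluster and one condition per cell.
clusters = ["1", "1", "1", "1", "2", "2"]
conditions = ["ctrl", "ctrl", "ctrl", "treated", "treated", "treated"]
percentages = composition(clusters, conditions)
```

Within each cluster the percentages sum to 100, which is what lets the bars of a composition plot be stacked to a common height.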
Pseudo-bulk analysis
|
Generate Pseudo-bulk count matrices. |
|
Perform Differential Expression (DE) Analysis using DESeq2 on pseudobulk data. |
|
Extract pseudobulk DE results into a human readable structure. |
|
Write pseudo-bulk DE analysis results into an Excel workbook. |
|
Generate Volcano plots (-log10 p value vs. |
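As a rough illustration of what a pseudo-bulk count matrix is, the sketch below sums per-cell counts over all cells that share a sample label. The function name `pseudobulk`, the donor labels, and the toy counts are assumptions for illustration; the actual Pegasus function may group cells differently (for example, per cluster as well as per sample).

```python
from collections import defaultdict

def pseudobulk(counts, samples):
    """Sum per-cell count vectors over all cells that share a sample label."""
    n_genes = len(counts[0])
    totals = defaultdict(lambda: [0] * n_genes)
    for cell, sample in zip(counts, samples):
        for g, value in enumerate(cell):
            totals[sample][g] += value
    return dict(totals)

# Toy data (hypothetical): rows are cells, columns are genes.
counts = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]
samples = ["donor_A", "donor_B", "donor_A"]
bulk = pseudobulk(counts, samples)
```

Each resulting row behaves like one bulk RNA-seq sample, which is why count-based DE tools such as DESeq2 can then be applied to it.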
Demultiplexing
|
For cell-hashing data, estimate antibody background probability using the KMeans algorithm. |
|
Demultiplex cell/nucleus-hashing data, using the estimated antibody background probability calculated in |
|
Write demultiplexing results into raw gene expression matrix. |
Miscellaneous
|
Extract and display gene expressions for each cluster. |
|
Extract and display differential expression analysis results of markers for each cluster. |
|
Use the Mann-Whitney U (MWU) test to detect whether any cluster is an outlier with respect to one of the QC attributes: n_genes, n_counts, percent_mito. |
|
Find markers using gradient boosting method. |
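For the MWU-based outlier check listed above, the statistic itself is easy to state. The sketch below computes the Mann-Whitney U statistic directly from its pairwise definition; the function name `mann_whitney_u` and the percent_mito values are hypothetical, and it omits the p-value calculation that a real test would add.

```python
def mann_whitney_u(x, y):
    """Return the U statistic for sample x versus sample y (ties count as 0.5)."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical percent_mito values: one cluster versus all remaining cells.
cluster_mito = [12.0, 15.0, 14.0]
other_mito = [3.0, 5.0, 4.0, 12.0]
u_stat = mann_whitney_u(cluster_mito, other_mito)
```

A U value close to `len(x) * len(y)` means nearly every value in the cluster exceeds the values elsewhere, which is the pattern an outlier cluster would show.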