API¶
Pegasus can also be used as a Python package. Import it with:
import pegasus as pg
Analysis Tools¶
Read and Write¶
|
Load data into memory. |
|
Write data back to disk. |
|
Aggregate channel-specific count matrices into one big count matrix. |
Preprocess¶
|
Generate Quality Control (QC) metrics regarding cell barcodes on the dataset. |
|
Calculate filtration stats on cell barcodes. |
|
Filter data based on previously calculated qc_metrics. |
|
Identify robust genes as candidates for HVG selection and remove genes that are not expressed in any cells. |
|
Normalize the data, and then apply the natural logarithm. |
|
Highly variable features (HVF) selection. |
|
Subset the features and store the resulting matrix in dense format in data.uns with ‘fmat_’ prefix, with the option of standardization and truncating based on max_value. |
|
Perform Principal Component Analysis (PCA) on the data. |
|
Regress out effects due to specific observational attributes at Principal Component level. |
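The normalization and standardization steps listed above can be illustrated with a small, dependency-free sketch. It is not the Pegasus implementation: the function names `log_normalize` and `standardize_and_truncate` are hypothetical, and the scaling target (`target_sum`), the `ln(x + 1)` pseudocount, and the symmetric clipping at `max_value` are assumptions made for illustration.

```python
import math


def log_normalize(counts, target_sum=1e5):
    """Scale one cell's counts to sum to target_sum, then apply ln(x + 1).

    Both target_sum and the +1 pseudocount are illustrative assumptions.
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts stays all zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


def standardize_and_truncate(values, max_value=10.0):
    """Z-score one feature across cells, then clip to [-max_value, max_value]."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 denominator); zero when there is a single value.
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    std = math.sqrt(var)
    if std == 0:
        # A constant feature carries no signal after centering.
        return [0.0 for _ in values]
    return [max(-max_value, min(max_value, (v - mean) / std)) for v in values]
```

Here `log_normalize` works on one cell (a row of counts) and `standardize_and_truncate` works on one feature (a column across cells); a real pipeline would apply them to sparse or dense matrices rather than Python lists.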
Batch Correction¶
|
Set group attributes used in batch correction. |
|
Batch correction on data using Location-Scale (L/S) Adjustment method. |
|
Batch correction on PCs using Harmony. |
|
Batch correction using Scanorama. |
Nearest Neighbors¶
|
Compute k nearest neighbors and affinity matrix, which will be used for diffmap and graph-based community detection algorithms. |
|
Calculate the kBET metric of the data regarding a specific sample attribute and embedding. |
|
Calculate the kSIM metric of the data regarding a specific sample attribute and embedding. |
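As a rough illustration of the k nearest neighbors computation mentioned above, the sketch below does an exact brute-force search with Euclidean distance on a small embedding. The function name `k_nearest_neighbors` is hypothetical, and this is not how Pegasus computes neighbors; it also leaves out the affinity matrix.

```python
import math


def k_nearest_neighbors(points, k):
    """Return, for each point, the indices of its k nearest other points.

    Brute-force Euclidean search; ties are broken by the lower index.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors
```

Brute force costs O(n^2) distance evaluations, so it is only practical for small inputs; large datasets usually need an approximate nearest neighbor index.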
Diffusion Map¶
|
Calculate Diffusion Map. |
|
Reduce the high-dimensional Diffusion Map matrix to 3 dimensions. |
|
Calculate Pseudotime based on Diffusion Map. |
|
Inference on path of a cluster. |
Cluster Algorithms¶
|
Cluster the data using the chosen algorithm. |
|
Cluster the cells using the Louvain algorithm. |
|
Cluster the data using the Leiden algorithm. |
|
Cluster the data using the Spectral Louvain algorithm. |
|
Cluster the data using the Spectral Leiden algorithm. |
Visualization Algorithms¶
|
Calculate tSNE embedding of cells. |
|
Calculate FIt-SNE embedding of cells. |
|
Calculate UMAP embedding of cells. |
|
Construct the Force-directed (FLE) graph. |
|
Calculate Net-tSNE embedding of cells. |
|
Calculate Net-UMAP embedding of cells. |
|
Construct Net-Force-directed (FLE) graph. |
Differential Expression Analysis¶
|
Perform Differential Expression (DE) Analysis on data. |
|
Extract DE results into a human-readable structure. |
|
Write DE analysis results into an Excel workbook. |
Marker Detection based on Gradient Boost Machine¶
|
Find markers using a gradient boosting method. |
Annotate Clusters¶
|
Infer putative cell types for each cluster using legacy markers. |
|
Add annotation to the AnnData object. |
Plotting¶
|
Generate scatter plots for different attributes. |
|
Generate scatter plots of attribute ‘attr’ for each category in attribute ‘group’. |
|
Generate a composition plot, which shows the percentage of cells from each condition for every cluster. |
|
Generate a stacked violin plot. |
|
Generate a heatmap. |
|
Generate a dot plot. |
|
Generate a dendrogram from a hierarchical clustering result. |
|
Generate highly variable feature plot. |
|
Plot quality control statistics (before filtration vs. after filtration). |
|
Generate Volcano plots (-log10 p value vs. log fold change). |
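The numbers behind a composition plot, as described above, are the percentage of cells from each condition within every cluster. A minimal sketch of that tabulation follows; the function name `composition_percentages` and the label lists are hypothetical, and the plotting itself is omitted.

```python
from collections import Counter, defaultdict


def composition_percentages(clusters, conditions):
    """Return {cluster: {condition: percent of that cluster's cells}}.

    clusters and conditions are parallel sequences with one entry per cell.
    """
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1

    result = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        result[cluster] = {cond: 100.0 * n / total for cond, n in counter.items()}
    return result
```

Each inner dictionary sums to 100, so its values can be drawn directly as one stacked bar per cluster.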
Demultiplexing¶
|
For cell-hashing data, estimate antibody background probability using the KMeans algorithm. |
|
Demultiplex cell/nucleus-hashing data using the previously estimated antibody background probability. |
|
Write demultiplexing results into raw gene expression matrix. |
Doublet Detection¶
|
Calculate doublet scores using Scrublet for each channel on the current associated data.X matrix. |
|
Infer doublets based on Scrublet scores. |
|
Convert doublet prediction into doublet annotations that Pegasus can recognize. |
Gene Module Score¶
|
Calculate signature / gene module score. |
Miscellaneous¶
|
Extract and display gene expression for each cluster from an AnnData object. |
|
Extract and display differential expression analysis results of markers for each cluster. |