API

Pegasus can also be used as a Python package. Import it with:

import pegasus as pg

Read and Write

`read_input`(input_file[, file_type, mode, ...])	Load data into memory.
`write_output`(data, output_file[, file_type, ...])	Write data back to disk.
`aggregate_matrices`(csv_file[, restrictions, ...])	Aggregate channel-specific count matrices into one big count matrix.

Analysis Tools

Preprocess

`qc_metrics`(data[, select_singlets, ...])	Generate Quality Control (QC) metrics regarding cell barcodes on the dataset.
`get_filter_stats`(data[, min_genes_before_filt])	Calculate filtration stats on cell barcodes.
`filter_data`(data[, focus_list])	Filter data based on qc_metrics calculated in `pg.qc_metrics`.
`identify_robust_genes`(data[, percent_cells])	Identify robust genes as candidates for HVG selection and remove genes that are not expressed in any cells.
`log_norm`(data[, norm_count, base_matrix, ...])	Normalize each cell by total counts, and then apply natural logarithm to the data.
`log1p`(data[, base_matrix, target_matrix, select])	Apply the Log(1+x) transformation on the given count matrix.
`normalize`(data[, norm_count, base_matrix, ...])	Normalize each cell by total count.
`arcsinh`(data[, cofactor, jitter, ...])	Conduct arcsinh transform on the current matrix.
`highly_variable_features`(data[, batch, ...])	Highly variable features (HVF) selection.
`select_features`(data[, features, ...])	Subset the features and store the resulting matrix in dense format in data.uns with '_tmp_fmat_' prefix, with the option of standardization and truncating based on max_value.
`pca`(data[, n_components, features, ...])	Perform Principal Component Analysis (PCA) on the data.
`nmf`(data[, n_components, features, space, ...])	Perform Nonnegative Matrix Factorization (NMF) on the data using the Frobenius norm.
`regress_out`(data, attrs[, rep, n_comps])	Regress out effects due to specific observational attributes.
`calculate_z_score`(data[, n_bins])	Calculate the standardized z scores of the count matrix.
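The normalization and log-transformation steps listed above can be illustrated on a single cell without Pegasus. This is a minimal pure-Python sketch, not the library's implementation: the helper names `normalize_counts` and `log_norm_cell` are hypothetical, the target total of 100 is chosen only for readability, and using log(1+x) for the logarithm step is an assumption based on the `log1p` entry.

```python
import math


def normalize_counts(counts, norm_count=1e5):
    """Scale one cell's counts so that they sum to norm_count (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    scale = norm_count / total
    return [c * scale for c in counts]


def log_norm_cell(counts, norm_count=1e5):
    """Normalize by total count, then apply the natural log(1 + x) transform."""
    return [math.log1p(v) for v in normalize_counts(counts, norm_count)]


# One cell with four genes and 10 counts in total.
cell = [0, 3, 1, 6]
normalized = normalize_counts(cell, norm_count=100.0)
logged = log_norm_cell(cell, norm_count=100.0)
print(normalized)
print(logged)
```

Zero counts remain zero after the transform, which keeps sparse matrices sparse.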

Batch Correction

`run_harmony`(data[, batch, rep, n_comps, ...])	Batch correction on PCs using Harmony.
`run_scanorama`(data[, batch, n_components, ...])	Batch correction using Scanorama.
`integrative_nmf`(data[, batch, n_components, ...])	Perform Integrative Nonnegative Matrix Factorization (iNMF) [Yang16] for data integration.
`run_scvi`(data[, features, matkey, n_jobs, ...])	Run scVI embedding.

Nearest Neighbors

`neighbors`(data[, K, rep, n_comps, n_jobs, ...])	Compute k nearest neighbors and affinity matrix, which will be used for diffmap and graph-based community detection algorithms.
`get_neighbors`(data[, K, rep, n_comps, ...])	Find K nearest neighbors for each data point and return the indices and distances arrays.
`calc_kBET`(data, attr[, rep, K, alpha, ...])	Calculate the kBET metric of the data regarding a specific sample attribute and embedding.
`calc_kSIM`(data, attr[, rep, K, min_rate, ...])	Calculate the kSIM metric of the data regarding a specific sample attribute and embedding.
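To make the "indices and distances arrays" returned by a K-nearest-neighbor search concrete, here is a minimal brute-force sketch in plain Python. It is not how Pegasus computes neighbors: the function name `knn_brute_force` is hypothetical, Euclidean distance is assumed, and each point is assumed to be excluded from its own neighbor list.

```python
import math


def knn_brute_force(points, k):
    """Return (indices, distances) of the k nearest neighbors of every point.

    Brute force: every pairwise Euclidean distance is computed, so this only
    suits small illustrative inputs.
    """
    indices, distances = [], []
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties are broken by index.
        candidates = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = candidates[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
idx, dist = knn_brute_force(points, k=2)
print(idx)
print(dist)
```

Row `i` of each result describes point `i`, with neighbors ordered from closest to farthest.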

Diffusion Map

`diffmap`(data[, n_components, rep, solver, ...])	Calculate Diffusion Map.
`calc_pseudotime`(data, roots)	Calculate Pseudotime based on Diffusion Map.
`infer_path`(data, cluster, clust_id, path_name)	Inference on path of a cluster.

Cluster Algorithms

`cluster`(data[, algo, rep, resolution, ...])	Cluster the data using the chosen algorithm.
`louvain`(data[, rep, resolution, n_clust, ...])	Cluster the cells using Louvain algorithm.
`leiden`(data[, rep, resolution, n_clust, ...])	Cluster the data using Leiden algorithm.
`calc_dendrogram`(data[, groupby, rep, genes, ...])	Cluster data using hierarchical clustering algorithm.
`split_one_cluster`(data, clust_label, ...[, ...])	Use Leiden algorithm to split 'clust_id' in 'clust_label' into 'n_components' sub-clusters and write the new clustering results to 'res_label'.
`spectral_louvain`(data[, rep, resolution, ...])	Cluster the data using Spectral Louvain algorithm.
`spectral_leiden`(data[, rep, resolution, ...])	Cluster the data using Spectral Leiden algorithm.

Visualization Algorithms

`tsne`(data[, rep, rep_ncomps, n_jobs, ...])	Calculate t-SNE embedding of cells using the FIt-SNE package.
`umap`(data[, rep, rep_ncomps, n_components, ...])	Calculate UMAP embedding of cells.
`fle`(data[, file_name, n_jobs, rep, ...])	Construct the Force-directed (FLE) graph.
`net_umap`(data[, rep, n_jobs, n_components, ...])	Calculate Net-UMAP embedding of cells.
`net_fle`(data[, file_name, n_jobs, rep, K, ...])	Construct Net-Force-directed (FLE) graph.
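The functions listed so far can be chained into a basic workflow. The following is a rough sketch rather than a recommended pipeline: it uses only the required arguments shown in the tables above and assumes the defaults are acceptable for the dataset. The wrapper `run_basic_pipeline` and its `input_file` and `output_file` parameters are hypothetical names introduced for illustration.

```python
def run_basic_pipeline(input_file, output_file):
    """Sketch of a load -> preprocess -> cluster -> embed -> save workflow."""
    # Imported here so that defining this sketch does not require Pegasus.
    import pegasus as pg

    data = pg.read_input(input_file)      # load data into memory

    pg.qc_metrics(data)                   # compute QC metrics on cell barcodes
    pg.filter_data(data)                  # filter using those QC metrics
    pg.identify_robust_genes(data)        # candidates for HVF selection
    pg.log_norm(data)                     # normalize by total counts, then log
    pg.highly_variable_features(data)     # select highly variable features
    pg.pca(data)                          # principal component analysis

    pg.neighbors(data)                    # kNN graph and affinity matrix
    pg.louvain(data)                      # graph-based clustering
    pg.umap(data)                         # 2D embedding for visualization

    pg.write_output(data, output_file)    # write results back to disk
    return data
```

Each step modifies `data` in place, so the order of the calls matters: clustering and embedding rely on the PCA and neighbor results computed before them.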

Doublet Detection

`infer_doublets`(data[, channel_attr, ...])	Infer doublets by first calculating Scrublet-like [Wolock18] doublet scores and then smartly determining an appropriate doublet score cutoff [Li20-2].
`mark_doublets`(data[, demux_attr, dbl_clusts])	Convert doublet prediction into doublet annotations that Pegasus can recognize.

Gene Module Score

`calc_signature_score`(data, signatures[, ...])	Calculate signature / gene module score.

Label Transfer

`train_scarches_scanvi`(data, dir_path, label)	Run scArches training.
`predict_scarches_scanvi`(data, dir_path, label)	Run scArches prediction.

Differential Expression and Gene Set Enrichment Analysis

`de_analysis`(data, cluster[, condition, ...])	Perform Differential Expression (DE) Analysis on data.
`markers`(data[, head, de_key, alpha])	Extract DE results into a human readable structure.
`write_results_to_excel`(results, output_file)	Write DE analysis results into Excel workbook.
`gsea`(data, rank_key, pathways[, method, ...])	Perform Gene Set Enrichment Analysis (GSEA).
`write_gsea_results_to_excel`(data, output_file)	Write Gene Set Enrichment Analysis (GSEA) results into Excel workbook.

Annotate clusters

`infer_cell_types`(data, markers[, de_test, ...])	Infer putative cell types for each cluster using legacy markers.
`annotate`(data, name, based_on, anno_dict)	Add annotation to the data object as a categorical variable.

Plotting

`scatter`(data[, attrs, basis, components, ...])	Generate scatter plots for different attributes.
`scatter_groups`(data, attr, groupby[, basis, ...])	Generate scatter plots of attribute 'attr' for each category in attribute 'groupby'.
`spatial`(data[, attrs, basis, resolution, ...])	Scatter plot on spatial coordinates.
`compo_plot`(data, groupby, condition[, ...])	Generate a composition plot, which shows the percentage of cells from each condition for every cluster.
`violin`(data, attrs, groupby[, hue, matkey, ...])	Generate a stacked violin plot.
`heatmap`(data, attrs[, groupby, matkey, ...])	Generate a heatmap.
`dotplot`(data, genes, groupby[, ...])	Generate a dot plot.
`plot_dendrogram`(data[, graph_key, ...])	Generate a dendrogram of the hierarchical clustering result.
`hvfplot`(data[, top_n, panel_size, ...])	Generate highly variable feature plot.
`qcviolin`(data, plot_type[, ...])	Plot quality control statistics (before filtration vs.
`volcano`(data, cluster_id[, de_key, de_test, ...])	Generate Volcano plots (-log10 p value vs.
`rank_plot`(data[, panel_size, return_fig, dpi])	Generate a barcode rank plot, which shows the total UMIs against barcode rank (in descending order with respect to total UMIs).
`ridgeplot`(data, features[, matrix_key, ...])	Generate ridge plots, up to 8 features can be shown in one figure.
`wordcloud`(data, factor[, max_words, ...])	Generate one word cloud image for factor (starts from 0) in data.uns['W'].
`plot_gsea`(data[, gsea_keyword, alpha, ...])	Generate GSEA barplots.
`elbowplot`(data[, rep, pval, panel_size, ...])	Generate Elbowplot and suggest n_comps to select based on random matrix theory (see utils.largest_variance_from_random_matrix).
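The quantity shown in a composition plot, the percentage of cells from each condition within every cluster, can be computed by simple counting. The sketch below is plain Python and independent of Pegasus; the function name `composition_percentages` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def composition_percentages(clusters, conditions):
    """Map each cluster to {condition: percentage of that cluster's cells}."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1

    result = {}
    for cluster, per_condition in counts.items():
        total = sum(per_condition.values())
        result[cluster] = {
            condition: 100.0 * n / total for condition, n in per_condition.items()
        }
    return result


# Six cells: four in cluster "1", two in cluster "2".
clusters = ["1", "1", "1", "1", "2", "2"]
conditions = ["ctrl", "ctrl", "ctrl", "stim", "stim", "stim"]
composition = composition_percentages(clusters, conditions)
print(composition)
```

The percentages within each cluster sum to 100, which is what makes the bars of such a plot comparable across clusters of different sizes.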

Pseudo-bulk analysis

`pseudobulk`(data, groupby[, attrs, mat_key, ...])	Generate Pseudo-bulk count matrices.
`deseq2`(pseudobulk, design, contrasts[, ...])	Perform Differential Expression (DE) Analysis using DESeq2 on pseudobulk data.
`pseudo.markers`(pseudobulk[, head, de_key, alpha])	Extract pseudobulk DE results into a human readable structure.
`pseudo.write_results_to_excel`(results, ...)	Write pseudo-bulk DE analysis results into an Excel workbook.
`pseudo.volcano`(pseudobulk[, de_key, ...])	Generate Volcano plots (-log10 p value vs.
`pseudo.get_original_DE_result`(pseudobulk[, ...])	Get the original DESeq2 result as a data frame.
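As an illustration of what a pseudo-bulk count matrix is, the sketch below sums per-cell counts within each group. It is plain Python and does not use Pegasus; summation as the aggregation rule is an assumption (it is the common convention for count data), and `pseudobulk_sum` is a hypothetical name.

```python
from collections import defaultdict


def pseudobulk_sum(count_rows, groups):
    """Sum per-cell count vectors within each group.

    count_rows: one list of gene counts per cell.
    groups: the group label (for example, a sample ID) of each cell.
    """
    n_genes = len(count_rows[0]) if count_rows else 0
    sums = defaultdict(lambda: [0] * n_genes)
    for row, group in zip(count_rows, groups):
        acc = sums[group]
        for gene_idx, value in enumerate(row):
            acc[gene_idx] += value
    return dict(sums)


# Three cells, three genes; cells 0 and 2 come from sample "s1".
counts = [
    [1, 0, 2],
    [0, 3, 1],
    [4, 1, 0],
]
samples = ["s1", "s2", "s1"]
bulk = pseudobulk_sum(counts, samples)
print(bulk)
```

The result has one row per group instead of one row per cell, which is the shape that bulk DE tools such as DESeq2 expect.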

Demultiplexing

`estimate_background_probs`(hashing_data[, ...])	For cell-hashing data, estimate antibody background probability using KMeans algorithm.
`demultiplex`(rna_data, hashing_data[, ...])	Demultiplex cell/nucleus-hashing data using the antibody background probabilities estimated by `demuxEM.estimate_background_probs`.
`attach_demux_results`(input_rna_file, rna_data)	Write demultiplexing results into raw gene expression matrix.

Miscellaneous

`search_genes`(data, gene_list[, rec_key, measure])	Extract and display gene expressions for each cluster.
`search_de_genes`(data, gene_list[, rec_key, ...])	Extract and display differential expression analysis results of markers for each cluster.
`find_outlier_clusters`(data, cluster_attr, ...)	Use the Mann-Whitney U (MWU) test to detect whether any cluster is an outlier with respect to one of the QC attributes: n_genes, n_counts, percent_mito.
`find_markers`(data, label_attr[, de_key, ...])	Find markers using gradient boosting method.