API

Pegasus can also be used as a Python package. Import it with:

import pegasus as pg

Read and Write

read_input(input_file[, file_type, mode, ...])

Load data into memory.

write_output(data, output_file[, file_type, ...])

Write data back to disk.

aggregate_matrices(csv_file[, restrictions, ...])

Aggregate channel-specific count matrices into one big count matrix.

Analysis Tools

Preprocess

qc_metrics(data[, select_singlets, ...])

Generate Quality Control (QC) metrics regarding cell barcodes on the dataset.
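To make the kind of per-barcode statistics involved concrete, here is a minimal sketch in plain Python. It is not Pegasus's implementation: the helper name `qc_metrics_sketch` and the `MT-` prefix used to flag mitochondrial genes are assumptions for illustration. The metric names `n_genes`, `n_counts`, and `percent_mito` are the QC attributes listed under find_outlier_clusters.

```python
def qc_metrics_sketch(counts, gene_names, mito_prefix="MT-"):
    """Compute simple QC metrics for each cell (row) of a count matrix."""
    # Assumption: mitochondrial genes are recognized by a name prefix.
    is_mito = [name.startswith(mito_prefix) for name in gene_names]
    metrics = []
    for cell in counts:
        n_counts = sum(cell)                     # total counts in the cell
        n_genes = sum(1 for x in cell if x > 0)  # genes with nonzero counts
        mito = sum(x for x, m in zip(cell, is_mito) if m)
        # Cells with no counts get 0.0 to avoid division by zero.
        percent_mito = 100.0 * mito / n_counts if n_counts > 0 else 0.0
        metrics.append(
            {"n_genes": n_genes, "n_counts": n_counts, "percent_mito": percent_mito}
        )
    return metrics


gene_names = ["CD3E", "MS4A1", "MT-CO1"]
counts = [[3, 0, 1], [0, 0, 0]]
qc = qc_metrics_sketch(counts, gene_names)
```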

get_filter_stats(data[, min_genes_before_filt])

Calculate filtration stats on cell barcodes.

filter_data(data[, focus_list])

Filter data based on qc_metrics calculated in pg.qc_metrics.

identify_robust_genes(data[, percent_cells])

Identify robust genes as candidates for HVG selection and remove genes that are not expressed in any cells.
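One plausible reading of "robust" is that a gene must be detected in at least a given percentage of cells. The sketch below illustrates that reading only; the helper name `robust_genes_sketch` and the interpretation of `percent_cells` as a percentage threshold are assumptions, not a description of Pegasus internals.

```python
def robust_genes_sketch(counts, percent_cells=0.05):
    """Return one boolean per gene: True if the gene is expressed in at
    least `percent_cells` percent of cells (rows of `counts`)."""
    n_cells = len(counts)
    n_genes = len(counts[0]) if counts else 0
    robust = []
    for g in range(n_genes):
        # Number of cells in which gene g has a nonzero count.
        n_expressed = sum(1 for cell in counts if cell[g] > 0)
        robust.append(100.0 * n_expressed / n_cells >= percent_cells)
    return robust


counts = [[1, 0, 0], [2, 0, 0], [0, 0, 5], [1, 0, 0]]
# With a 50% threshold, only the first gene (expressed in 3 of 4 cells) passes.
robust_flags = robust_genes_sketch(counts, percent_cells=50.0)
```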

log_norm(data[, norm_count, base_matrix, ...])

Normalize each cell by total counts, and then apply natural logarithm to the data.
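As a rough illustration of what this entry describes, the following sketch scales each cell to a fixed total and then takes a natural logarithm. It is not Pegasus's implementation: the helper name `log_norm_sketch`, the use of log(1 + x) so that zero counts stay finite, and the value passed for `norm_count` are assumptions made for the example.

```python
import math


def log_norm_sketch(counts, norm_count=1e5):
    """Scale each cell (row) to sum to norm_count, then apply log(1 + x)."""
    result = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts are left as all zeros.
        scale = norm_count / total if total > 0 else 0.0
        result.append([math.log1p(x * scale) for x in cell])
    return result


counts = [[1, 3, 0], [10, 0, 10]]
normalized = log_norm_sketch(counts, norm_count=100.0)
```

Undoing the logarithm with `math.expm1` recovers values that sum to `norm_count` in every nonempty cell, which is a quick way to check the normalization step.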

log1p(data[, base_matrix, target_matrix, select])

Apply the Log(1+x) transformation on the given count matrix.

normalize(data[, norm_count, base_matrix, ...])

Normalize each cell by total count.

arcsinh(data[, cofactor, jitter, ...])

Conduct arcsinh transform on the current matrix.
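The arcsinh transform is commonly written as asinh(x / cofactor). A minimal sketch under that assumption follows; the helper name `arcsinh_sketch` and the cofactor value are illustrative, and the `jitter` option in the signature is not modeled here.

```python
import math


def arcsinh_sketch(matrix, cofactor=5.0):
    """Apply asinh(x / cofactor) to every entry of a matrix."""
    return [[math.asinh(x / cofactor) for x in row] for row in matrix]


transformed = arcsinh_sketch([[0.0, 5.0, 50.0]], cofactor=5.0)
```

The transform is roughly linear near zero and logarithmic for large values, so the cofactor sets where that transition happens.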

highly_variable_features(data[, batch, ...])

Highly variable features (HVF) selection.

select_features(data[, features, ...])

Subset the features and store the resulting matrix in dense format in data.uns with '_tmp_fmat_' prefix, with the option of standardization and truncating based on max_value.

pca(data[, n_components, features, ...])

Perform Principal Component Analysis (PCA) on the data.

nmf(data[, n_components, features, space, ...])

Perform Nonnegative Matrix Factorization (NMF) on the data using the Frobenius norm.

regress_out(data, attrs[, rep])

Regress out effects due to specific observational attributes.

calculate_z_score(data[, n_bins])

Calculate the standardized z scores of the count matrix.

Batch Correction

run_harmony(data[, batch, rep, n_comps, ...])

Batch correction on PCs using Harmony.

run_scanorama(data[, batch, n_components, ...])

Batch correction using Scanorama.

integrative_nmf(data[, batch, n_components, ...])

Perform Integrative Nonnegative Matrix Factorization (iNMF) [Yang16] for data integration.

run_scvi(data[, features, matkey, n_jobs, ...])

Run scVI embedding.

Nearest Neighbors

neighbors(data[, K, rep, n_comps, n_jobs, ...])

Compute k nearest neighbors and affinity matrix, which will be used for diffmap and graph-based community detection algorithms.

get_neighbors(data[, K, rep, n_comps, ...])

Find K nearest neighbors for each data point and return the indices and distances arrays.
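The shape of the returned indices and distances arrays can be illustrated with a brute-force search. This sketch is only a stand-in for the idea: the helper name `knn_sketch` is hypothetical, the search is exact and quadratic rather than whatever method Pegasus uses, and excluding each point from its own neighbor list is a convention chosen for the example.

```python
import math


def knn_sketch(points, K):
    """Return (indices, distances), each with one row of length K per point."""
    indices, distances = [], []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point, nearest first.
        candidates = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        indices.append([j for _, j in candidates[:K]])
        distances.append([d for d, _ in candidates[:K]])
    return indices, distances


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
idx, dist = knn_sketch(points, K=2)
```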

calc_kBET(data, attr[, rep, K, alpha, ...])

Calculate the kBET metric of the data regarding a specific sample attribute and embedding.

calc_kSIM(data, attr[, rep, K, min_rate, ...])

Calculate the kSIM metric of the data regarding a specific sample attribute and embedding.

Diffusion Map

diffmap(data[, n_components, rep, solver, ...])

Calculate Diffusion Map.

calc_pseudotime(data, roots)

Calculate Pseudotime based on Diffusion Map.

infer_path(data, cluster, clust_id, path_name)

Inference on path of a cluster.

Cluster Algorithms

cluster(data[, algo, rep, resolution, ...])

Cluster the data using the chosen algorithm.

louvain(data[, rep, resolution, n_clust, ...])

Cluster the cells using Louvain algorithm.

leiden(data[, rep, resolution, n_clust, ...])

Cluster the data using Leiden algorithm.

split_one_cluster(data, clust_label, ...[, ...])

Use the Leiden algorithm to split 'clust_id' in 'clust_label' into 'n_components' sub-clusters and write the new clustering results to 'res_label'.

spectral_louvain(data[, rep, resolution, ...])

Cluster the data using Spectral Louvain algorithm.

spectral_leiden(data[, rep, resolution, ...])

Cluster the data using Spectral Leiden algorithm.

Visualization Algorithms

tsne(data[, rep, rep_ncomps, n_jobs, ...])

Calculate t-SNE embedding of cells using the FIt-SNE package.

umap(data[, rep, rep_ncomps, n_components, ...])

Calculate UMAP embedding of cells.

fle(data[, file_name, n_jobs, rep, ...])

Construct the Force-directed (FLE) graph.

net_umap(data[, rep, n_jobs, n_components, ...])

Calculate Net-UMAP embedding of cells.

net_fle(data[, file_name, n_jobs, rep, K, ...])

Construct Net-Force-directed (FLE) graph.

Doublet Detection

infer_doublets(data[, channel_attr, ...])

Infer doublets by first calculating Scrublet-like [Wolock18] doublet scores and then smartly determining an appropriate doublet score cutoff [Li20-2].

mark_doublets(data[, demux_attr, dbl_clusts])

Convert doublet prediction into doublet annotations that Pegasus can recognize.

Gene Module Score

calc_signature_score(data, signatures[, ...])

Calculate signature / gene module score.

Label Transfer

train_scarches_scanvi(data, dir_path, label)

Run scArches training.

predict_scarches_scanvi(data, dir_path, label)

Run scArches prediction.

Differential Expression and Gene Set Enrichment Analysis

de_analysis(data, cluster[, condition, ...])

Perform Differential Expression (DE) Analysis on data.

markers(data[, head, de_key, alpha])

Extract DE results into a human-readable structure.

write_results_to_excel(results, output_file)

Write DE analysis results into an Excel workbook.

fgsea(data, log2fc_key, pathways[, de_key, ...])

Perform Gene Set Enrichment Analysis using fGSEA.

write_fgsea_results_to_excel(data, output_file)

Write Gene Set Enrichment Analysis (GSEA) results generated by the fgsea function into an Excel workbook.

Annotate Clusters

infer_cell_types(data, markers[, de_test, ...])

Infer putative cell types for each cluster using legacy markers.

annotate(data, name, based_on, anno_dict)

Add annotation to the data object as a categorical variable.
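Conceptually, this amounts to mapping each cell's existing cluster label to a name through `anno_dict`. The sketch below shows that mapping on plain lists; the helper name `annotate_sketch` and the example labels are hypothetical, and the real function operates on the data object rather than on lists.

```python
def annotate_sketch(cluster_labels, anno_dict):
    """Map each cluster label to its annotation; unknown labels raise KeyError."""
    return [anno_dict[label] for label in cluster_labels]


cluster_labels = ["1", "2", "1", "3"]
anno_dict = {"1": "T cell", "2": "B cell", "3": "Monocyte"}
cell_types = annotate_sketch(cluster_labels, anno_dict)
```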

Plotting

scatter(data[, attrs, basis, components, ...])

Generate scatter plots for different attributes.

scatter_groups(data, attr, groupby[, basis, ...])

Generate scatter plots of attribute 'attr' for each category in attribute 'groupby'.

spatial(data[, attrs, basis, resolution, ...])

Scatter plot on spatial coordinates.

compo_plot(data, groupby, condition[, ...])

Generate a composition plot, which shows the percentage of cells from each condition for every cluster.
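The quantity being plotted can be sketched without any plotting code: for every cluster, count the cells from each condition and convert the counts to percentages. The helper name `composition_sketch` and the example labels are assumptions for illustration.

```python
from collections import Counter, defaultdict


def composition_sketch(clusters, conditions):
    """Return {cluster: {condition: percent of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for clust, cond in zip(clusters, conditions):
        counts[clust][cond] += 1
    return {
        clust: {cond: 100.0 * n / sum(c.values()) for cond, n in c.items()}
        for clust, c in counts.items()
    }


clusters = ["1", "1", "1", "1", "2", "2"]
conditions = ["ctrl", "ctrl", "ctrl", "stim", "stim", "stim"]
percentages = composition_sketch(clusters, conditions)
```

Within each cluster the percentages sum to 100, which is what lets the bars of a composition plot stack to the same height.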

violin(data, attrs, groupby[, hue, matkey, ...])

Generate a stacked violin plot.

heatmap(data, attrs[, groupby, matkey, ...])

Generate a heatmap.

dotplot(data, genes, groupby[, ...])

Generate a dot plot.

dendrogram(data, groupby[, rep, genes, ...])

Generate a dendrogram on hierarchical clustering result.

hvfplot(data[, top_n, panel_size, ...])

Generate highly variable feature plot.

qcviolin(data, plot_type[, ...])

Plot quality control statistics (before filtration vs. after filtration).

volcano(data, cluster_id[, de_key, de_test, ...])

Generate Volcano plots (-log10 p value vs. log2 fold change).

rank_plot(data[, panel_size, return_fig, dpi])

Generate a barcode rank plot, which shows the total UMIs against barcode rank (in descending order with respect to total UMIs).
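The data behind such a plot is simply the per-barcode UMI totals sorted in descending order and paired with their rank. A minimal sketch, with the hypothetical helper name `barcode_rank_sketch`:

```python
def barcode_rank_sketch(total_umis):
    """Return (rank, total UMIs) pairs, rank 1 being the largest total."""
    ordered = sorted(total_umis, reverse=True)
    return list(enumerate(ordered, start=1))


ranked = barcode_rank_sketch([5, 100, 20])
```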

ridgeplot(data, features[, matrix_key, ...])

Generate ridge plots; up to 8 features can be shown in one figure.

wordcloud(data, factor[, max_words, ...])

Generate one word cloud image for factor (starts from 0) in data.uns['W'].

plot_gsea(data[, gsea_keyword, alpha, ...])

Generate GSEA barplots.

elbowplot(data[, rep, pval, panel_size, ...])

Generate Elbowplot and suggest n_comps to select based on random matrix theory (see utils.largest_variance_from_random_matrix).

Pseudo-bulk Analysis

pseudobulk(data, sample[, attrs, mat_key, ...])

Generate Pseudo-bulk count matrices.
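A pseudo-bulk matrix is typically built by aggregating the counts of all cells that belong to the same sample. The sketch below assumes aggregation by summation; the helper name `pseudobulk_sketch` is hypothetical, and the other options in the signature are not modeled.

```python
def pseudobulk_sketch(counts, samples):
    """Sum per-cell count vectors (rows) within each sample label."""
    bulk = {}
    for cell, sample in zip(counts, samples):
        if sample not in bulk:
            bulk[sample] = [0] * len(cell)
        # Add this cell's counts gene by gene to its sample's running total.
        bulk[sample] = [a + b for a, b in zip(bulk[sample], cell)]
    return bulk


counts = [[1, 0], [2, 3], [4, 4]]
samples = ["s1", "s2", "s1"]
bulk = pseudobulk_sketch(counts, samples)
```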

deseq2(pseudobulk, design, contrast[, ...])

Perform Differential Expression (DE) Analysis using DESeq2 on pseudobulk data.

pseudo.markers(pseudobulk[, head, de_key, alpha])

Extract pseudobulk DE results into a human-readable structure.

pseudo.write_results_to_excel(results, ...)

Write pseudo-bulk DE analysis results into an Excel workbook.

pseudo.volcano(pseudobulk[, de_key, ...])

Generate Volcano plots (-log10 p value vs. log2 fold change).

Demultiplexing

estimate_background_probs(hashing_data[, ...])

For cell-hashing data, estimate antibody background probability using KMeans algorithm.

demultiplex(rna_data, hashing_data[, ...])

Demultiplex cell/nucleus-hashing data using the estimated antibody background probability calculated in demuxEM.estimate_background_probs.

attach_demux_results(input_rna_file, rna_data)

Write demultiplexing results into raw gene expression matrix.

Miscellaneous

search_genes(data, gene_list[, rec_key, measure])

Extract and display gene expressions for each cluster.

search_de_genes(data, gene_list[, rec_key, ...])

Extract and display differential expression analysis results of markers for each cluster.

find_outlier_clusters(data, cluster_attr, ...)

Use the Mann-Whitney U (MWU) test to detect whether any cluster is an outlier with respect to one of the QC attributes: n_genes, n_counts, percent_mito.

find_markers(data, label_attr[, de_key, ...])

Find markers using gradient boosting method.