API

Pegasus can also be used as a Python package. Import it with:

import pegasus as pg
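As a rough orientation, the functions listed on this page can be chained into a basic clustering workflow. The sketch below is one possible ordering, not a recommended or official pipeline. It uses only functions that appear in this reference, called with their default arguments. The wrapper name `run_basic_pipeline` and its two path parameters are hypothetical, and `consider_batch=False` is an assumed choice for a single-batch dataset.

```python
def run_basic_pipeline(input_file, output_file):
    """Run a minimal clustering workflow with default parameters (illustrative only)."""
    # Imported lazily so this helper can be defined without Pegasus installed.
    import pegasus as pg

    data = pg.read_input(input_file)      # load data into memory
    pg.qc_metrics(data)                   # generate QC metrics for cell barcodes
    pg.filter_data(data)                  # filter based on those QC metrics
    pg.identify_robust_genes(data)        # candidates for HVG selection
    pg.log_norm(data)                     # normalize, then log-transform
    # Assumption: no batch information is used for feature selection.
    pg.highly_variable_features(data, consider_batch=False)
    pg.pca(data)                          # principal component analysis
    pg.neighbors(data)                    # k nearest neighbors and affinity matrix
    pg.louvain(data)                      # graph-based clustering
    pg.umap(data)                         # 2D embedding for visualization
    pg.write_output(data, output_file)    # write data back to disk
    return data
```

Calling the helper requires a real input file and a working Pegasus installation. Consult each function's own page for the parameters elided with "…" in the signatures below.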

Analysis Tools

Read and Write

read_input(input_file[, file_type, mode, …])

Load data into memory.

write_output(data, output_file[, file_type, …])

Write data back to disk.

aggregate_matrices(csv_file[, restrictions, …])

Aggregate channel-specific count matrices into one big count matrix.

Preprocess

qc_metrics(data[, select_singlets, …])

Generate Quality Control (QC) metrics regarding cell barcodes on the dataset.

get_filter_stats(data[, min_genes_before_filt])

Calculate filtration stats on cell barcodes.

filter_data(data[, focus_list])

Filter data based on the QC metrics calculated by pg.qc_metrics.

identify_robust_genes(data[, percent_cells])

Identify robust genes as candidates for HVG selection and remove genes that are not expressed in any cells.
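To make the idea of a "robust" gene concrete, here is a minimal pure-Python sketch. It assumes that `percent_cells` is a percentage threshold and that a gene qualifies when at least that percentage of cells express it. Both the inclusive comparison and the default of 0.05 are assumptions for illustration, and `robust_gene_mask` is a hypothetical helper, not part of Pegasus.

```python
def robust_gene_mask(counts, percent_cells=0.05):
    """Return one boolean per gene: True if the gene is expressed in at
    least `percent_cells` percent of cells.

    counts: list of rows, one row per cell, each a list of per-gene counts.
    """
    n_cells = len(counts)
    n_genes = len(counts[0]) if n_cells else 0
    mask = []
    for g in range(n_genes):
        # Number of cells with a nonzero count for gene g.
        n_expressing = sum(1 for row in counts if row[g] > 0)
        mask.append(100.0 * n_expressing / n_cells >= percent_cells)
    return mask


# Toy matrix: 4 cells x 3 genes.
counts = [
    [0, 3, 0],
    [0, 1, 0],
    [0, 0, 2],
    [0, 5, 0],
]
mask = robust_gene_mask(counts, percent_cells=50.0)
```

In this toy matrix the first gene is expressed in no cells, so it fails any positive threshold, which mirrors the removal of unexpressed genes described above.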

log_norm(data[, norm_count, backup_matrix])

Normalize the data, then apply the natural logarithm.
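The transformation can be pictured with a small pure-Python sketch. It assumes that each cell is scaled so its counts sum to `norm_count`, and that the logarithm is applied as log(1 + x) so zeros stay at zero. The default of 1e5 and the helper name `log_norm_sketch` are assumptions for illustration and may not match the library's behavior exactly.

```python
import math


def log_norm_sketch(counts, norm_count=1e5):
    """Scale each cell (row) to a total of `norm_count`, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts are left as all zeros.
        scale = norm_count / total if total > 0 else 0.0
        normalized.append([math.log1p(x * scale) for x in row])
    return normalized


# Two cells, two genes; a small norm_count keeps the arithmetic easy to follow.
counts = [[1, 3], [0, 10]]
result = log_norm_sketch(counts, norm_count=4)
```

Inverting the transform with `math.expm1` and summing each row recovers `norm_count`, which is a quick way to check that cells end up on a common scale.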

highly_variable_features(data, consider_batch)

Highly variable features (HVF) selection.

select_features(data[, features, …])

Subset the features and store the resulting matrix in dense format in data.uns with ‘fmat_’ prefix, with the option of standardization and truncating based on max_value.
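The standardization and truncation options can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it z-scores each feature using the population standard deviation and clips symmetrically to the range [-max_value, max_value]. The helper name `standardize_and_clip`, the symmetric clipping, and the default of 10.0 are all assumptions.

```python
import statistics


def standardize_and_clip(matrix, max_value=10.0):
    """Z-score each feature (column) across cells, then clip to [-max_value, max_value].

    matrix: list of rows, one row per cell, each a list of feature values.
    """
    n_features = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_features)]
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        if std == 0:
            # A constant feature carries no variation; map it to zeros.
            scaled = [0.0 for _ in col]
        else:
            scaled = [(x - mean) / std for x in col]
        scaled_columns.append([max(-max_value, min(max_value, z)) for z in scaled])
    # Transpose back to cells x features.
    return [list(row) for row in zip(*scaled_columns)]


# 4 cells x 2 features; the second feature is constant.
matrix = [[0.0, 5.0], [0.0, 5.0], [0.0, 5.0], [8.0, 5.0]]
scaled = standardize_and_clip(matrix, max_value=1.0)
```

Truncation limits the influence of extreme values on downstream steps; here the outlying value in the last cell is capped at `max_value`.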

pca(data[, n_components, features, …])

Perform Principal Component Analysis (PCA) on the data.

pc_regress_out(data, attrs[, rep])

Regress out effects due to specific observational attributes at Principal Component level.

Batch Correction

set_group_attribute(data, attribute_string)

Set group attributes used in batch correction.

correct_batch(data[, features])

Batch correction on data using Location-Scale (L/S) Adjustment method.

run_harmony(data[, rep, n_jobs, n_clusters, …])

Batch correction on PCs using Harmony.

run_scanorama(data[, n_components, …])

Batch correction using Scanorama.

Nearest Neighbors

neighbors(data[, K, rep, n_jobs, …])

Compute k nearest neighbors and affinity matrix, which will be used for diffmap and graph-based community detection algorithms.
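For readers unfamiliar with the concept, a k-nearest-neighbor search can be sketched with an exact brute-force scan over Euclidean distances. This is only a conceptual illustration: the library may use a different (for example, approximate) search method and distance, and the affinity matrix is not computed here. `k_nearest_neighbors` is a hypothetical helper.

```python
import math


def k_nearest_neighbors(points, k):
    """For each point, return the indices of its k nearest other points
    by Euclidean distance (exact brute-force search)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance to every other point, paired with that point's index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Four cells embedded in a 2D representation.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
knn = k_nearest_neighbors(points, k=2)
```

Brute force costs quadratic time in the number of cells, which is why large single-cell datasets typically rely on faster search structures.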

calc_kBET(data, attr[, rep, K, alpha, …])

Calculate the kBET metric of the data regarding a specific sample attribute and embedding.

calc_kSIM(data, attr[, rep, K, min_rate, …])

Calculate the kSIM metric of the data regarding a specific sample attribute and embedding.

Diffusion Map

diffmap(data[, n_components, rep, solver, …])

Calculate Diffusion Map.

reduce_diffmap_to_3d(data[, random_state])

Reduce the high-dimensional Diffusion Map matrix to 3 dimensions.

calc_pseudotime(data, roots)

Calculate Pseudotime based on Diffusion Map.

infer_path(data, cluster, clust_id, path_name)

Inference on path of a cluster.

Cluster Algorithms

cluster(data[, algo, rep, resolution, …])

Cluster the data using the chosen algorithm.

louvain(data[, rep, resolution, …])

Cluster the cells using the Louvain algorithm.

leiden(data[, rep, resolution, n_iter, …])

Cluster the data using the Leiden algorithm.

spectral_louvain(data[, rep, resolution, …])

Cluster the data using the Spectral Louvain algorithm.

spectral_leiden(data[, rep, resolution, …])

Cluster the data using the Spectral Leiden algorithm.

Visualization Algorithms

tsne(data[, rep, n_jobs, n_components, …])

Calculate tSNE embedding of cells.

fitsne(data[, rep, n_jobs, n_components, …])

Calculate FIt-SNE embedding of cells.

umap(data[, rep, n_components, n_neighbors, …])

Calculate UMAP embedding of cells.

fle(data[, file_name, n_jobs, rep, K, …])

Construct the Force-directed (FLE) graph.

net_tsne(data[, rep, n_jobs, n_components, …])

Calculate Net-tSNE embedding of cells.

net_umap(data[, rep, n_jobs, n_components, …])

Calculate Net-UMAP embedding of cells.

net_fle(data[, file_name, n_jobs, rep, K, …])

Construct the Net-Force-directed (FLE) graph.

Differential Expression Analysis

de_analysis(data, cluster[, condition, …])

Perform Differential Expression (DE) Analysis on data.

markers(data[, head, de_key, alpha])

Extract DE results into a human-readable structure.

write_results_to_excel(results, output_file)

Write DE analysis results into an Excel workbook.

Marker Detection based on Gradient Boost Machine

find_markers(data, label_attr[, de_key, …])

Find markers using a gradient boosting method.

Annotate Clusters

infer_cell_types(data, markers[, de_test, …])

Infer putative cell types for each cluster using legacy markers.

annotate(data, name, based_on, anno_dict)

Add annotation to the AnnData object.

Plotting

scatter(data, attrs[, basis, matkey, …])

Generate scatter plots for different attributes.

scatter_groups(data, attr, groupby[, basis, …])

Generate scatter plots of attribute ‘attr’ for each category in attribute ‘groupby’.

compo_plot(data, groupby, condition[, …])

Generate a composition plot, which shows the percentage of cells from each condition for every cluster.
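The quantity behind a composition plot can be sketched without any plotting. The example below assumes one reading of the description: within each cluster, the percentages across conditions sum to 100. `composition_percentages` and the toy labels are hypothetical and only illustrate the tabulation, not the plotting function itself.

```python
from collections import Counter, defaultdict


def composition_percentages(clusters, conditions):
    """For each cluster, return the percentage of its cells from each condition.

    clusters, conditions: equal-length sequences of per-cell labels.
    """
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1
    result = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        result[cluster] = {cond: 100.0 * n / total for cond, n in counter.items()}
    return result


# Six cells with made-up cluster and condition labels.
clusters = ["1", "1", "1", "1", "2", "2"]
conditions = ["ctrl", "ctrl", "ctrl", "treated", "treated", "treated"]
comp = composition_percentages(clusters, conditions)
```

Each inner dictionary corresponds to one stacked bar in the plot, with one segment per condition.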

violin(data, attrs, groupby[, hue, matkey, …])

Generate a stacked violin plot.

heatmap(data, genes, groupby[, matkey, …])

Generate a heatmap.

dotplot(data, genes, groupby[, …])

Generate a dot plot.

dendrogram(data, groupby[, rep, genes, …])

Generate a dendrogram on hierarchical clustering result.

hvfplot(data[, top_n, panel_size, …])

Generate highly variable feature plot.

qcviolin(data, plot_type[, …])

Plot quality control statistics (before filtration vs.

volcano(data, cluster_id[, de_key, de_test, …])

Generate Volcano plots (-log10 p value vs.

Demultiplexing

estimate_background_probs(hashing_data[, …])

For cell-hashing data, estimate antibody background probability using KMeans algorithm.

demultiplex(rna_data, hashing_data[, …])

Demultiplex cell/nucleus-hashing data, using the estimated antibody background probability calculated in demuxEM.estimate_background_probs.

attach_demux_results(input_rna_file, rna_data)

Write demultiplexing results into raw gene expression matrix.

Doublet Detection

run_scrublet(data[, channel_attr, …])

Calculate doublet scores using Scrublet for each channel on the current associated data.X matrix.

infer_doublets(data[, dbl_attr, …])

Infer doublets based on Scrublet scores.

mark_doublets(data[, demux_attr, dbl_clusts])

Convert doublet prediction into doublet annotations that Pegasus can recognize.

Gene Module Score

calc_signature_score(data, signatures[, …])

Calculate signature / gene module score.

Miscellaneous

search_genes(data, gene_list[, rec_key, measure])

Extract and display gene expression for each cluster from an AnnData object.

search_de_genes(data, gene_list[, rec_key, …])

Extract and display differential expression analysis results of markers for each cluster.