API

Pegasus can also be used as a Python package. Import it with:

import pegasus as pg

Analysis Tools

Read and Write

read_input(input_file[, genome, …])

Load data into memory.

write_output(data, output_file[, whitelist])

Write data back to disk.

aggregate_matrices(csv_file[, …])

Aggregate channel-specific count matrices into one big count matrix.
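To make "aggregate" concrete, here is a minimal stdlib sketch. It assumes each channel's matrix is a hypothetical `{barcode: {gene: count}}` dict, and it assumes barcodes are made unique by prefixing the channel name. Real Pegasus data objects, and its exact barcode naming, may differ; `aggregate_channels` and `gene_union` are illustrative helpers, not Pegasus functions.

```python
from typing import Dict, List

# Hypothetical in-memory layout: one sparse matrix per channel, stored as
# {barcode: {gene: count}}. Real Pegasus data objects look different.
Matrix = Dict[str, Dict[str, int]]


def aggregate_channels(channels: Dict[str, Matrix]) -> Matrix:
    """Merge per-channel matrices, prefixing each barcode with its channel name
    so identical barcodes from different channels remain distinct cells."""
    combined: Matrix = {}
    for channel, matrix in channels.items():
        for barcode, counts in matrix.items():
            combined[f"{channel}-{barcode}"] = dict(counts)
    return combined


def gene_union(matrix: Matrix) -> List[str]:
    """All genes seen in any cell; absent entries are implicit zeros."""
    return sorted({gene for counts in matrix.values() for gene in counts})


merged = aggregate_channels({
    "sample1": {"AAACCTG": {"CD3E": 2}},
    "sample2": {"AAACCTG": {"MS4A1": 5}},
})
print(sorted(merged))      # ['sample1-AAACCTG', 'sample2-AAACCTG']
print(gene_union(merged))  # ['CD3E', 'MS4A1']
```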

Preprocess

qc_metrics(data[, mito_prefix, min_genes, …])

Generate Quality Control (QC) metrics on the dataset.

get_filter_stats(data)

Calculate filtration stats on cell barcodes and genes, respectively.

filter_data(data)

Filter data based on the QC metrics calculated by pg.qc_metrics.
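The QC-then-filter steps above can be sketched per cell as follows. This assumes a hypothetical `{gene: count}` dict per cell and that mitochondrial genes are identified by a name prefix, mirroring the `mito_prefix` parameter. `qc_metrics_for_cell` and `passes_qc` are illustrative helpers, and the thresholds in `passes_qc` are placeholders, not Pegasus defaults or its exact filter rule.

```python
from typing import Dict


def qc_metrics_for_cell(counts: Dict[str, int], mito_prefix: str = "MT-") -> Dict[str, float]:
    """Per-cell QC metrics from a {gene: count} mapping."""
    n_counts = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    percent_mito = 100.0 * mito / n_counts if n_counts > 0 else 0.0
    return {"n_genes": n_genes, "n_counts": n_counts, "percent_mito": percent_mito}


def passes_qc(metrics: Dict[str, float], min_genes: int = 500, max_genes: int = 6000,
              max_percent_mito: float = 20.0) -> bool:
    """Hypothetical filter rule; the thresholds are illustrative only."""
    return (min_genes <= metrics["n_genes"] < max_genes
            and metrics["percent_mito"] <= max_percent_mito)


cell = {"MT-CO1": 20, "CD3E": 30, "ACTB": 50}
m = qc_metrics_for_cell(cell)
print(m)                          # {'n_genes': 3, 'n_counts': 100, 'percent_mito': 20.0}
print(passes_qc(m, min_genes=2))  # True
```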

log_norm(data[, norm_count])

Normalize the data, then apply the natural logarithm.
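A minimal sketch of the normalize-then-log idea for one cell. It assumes each cell is rescaled so its counts sum to `norm_count` and that the logarithm is applied as log(1 + x) to keep zeros finite. The helper name and its default value are illustrative, not taken from Pegasus.

```python
import math
from typing import Dict


def log_norm_cell(counts: Dict[str, float], norm_count: float = 1e5) -> Dict[str, float]:
    """Scale a cell's counts to sum to norm_count, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c * norm_count / total) for gene, c in counts.items()}


print(log_norm_cell({"CD3E": 1, "ACTB": 3}, norm_count=4))
# CD3E -> log(2), ACTB -> log(4)
```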

highly_variable_features(data, consider_batch)

Highly variable features (HVF) selection.
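For intuition only, the sketch below swaps in a simple dispersion ranking (variance divided by mean) to pick highly variable genes. Pegasus's own HVF procedure, including how `consider_batch` is handled, is not reproduced here; `top_dispersed_genes` is a hypothetical helper working on a `{gene: [values per cell]}` dict.

```python
from statistics import mean, pvariance
from typing import Dict, List


def top_dispersed_genes(expr: Dict[str, List[float]], n_top: int) -> List[str]:
    """Rank genes by variance-to-mean ratio and return the n_top highest."""
    def dispersion(values: List[float]) -> float:
        m = mean(values)
        return pvariance(values, mu=m) / m if m > 0 else 0.0

    ranked = sorted(expr, key=lambda g: dispersion(expr[g]), reverse=True)
    return ranked[:n_top]


expr = {"flat": [5, 5, 5, 5], "spiky": [0, 0, 0, 20], "mild": [4, 6, 4, 6]}
print(top_dispersed_genes(expr, 2))  # ['spiky', 'mild']
```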

select_features(data[, features])

Subset the features and store the resulting matrix in dense format in data.uns with the ‘fmat_’ prefix.

pca(data[, n_components, features, …])

Perform Principal Component Analysis (PCA) on the data.
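To show what a principal component is, this stdlib sketch finds the first one by power iteration on the sample covariance matrix. It is meant for intuition on tiny inputs and is not how pg.pca computes its result; `first_principal_component` is a hypothetical helper.

```python
import math
import random
from typing import List


def first_principal_component(X: List[List[float]], n_iter: int = 100, seed: int = 0) -> List[float]:
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center columns
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in range(d)]  # positive random start
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # zero-variance data: no direction to find
            break
        v = [x / norm for x in w]
    # Eigenvectors are defined up to sign; make the largest entry positive.
    k = max(range(d), key=lambda i: abs(v[i]))
    return [-x for x in v] if v[k] < 0 else v


pc = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(pc)  # close to [1/sqrt(5), 2/sqrt(5)]
```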

Batch Correction

set_group_attribute(data, attribute_string)

Set group attributes used in batch correction.

correct_batch(data[, features])

Perform batch correction on the data.
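As an assumption about the general approach, the sketch below applies a simple location/scale (L/S) style adjustment to one gene: each batch is standardized and then mapped back onto the overall mean and standard deviation. The exact model pg.correct_batch fits may differ; `location_scale_adjust` is a hypothetical helper, and the batch labels stand in for the groups defined with set_group_attribute.

```python
from statistics import mean, pstdev
from typing import List


def location_scale_adjust(values: List[float], batches: List[str]) -> List[float]:
    """Adjust one gene's values so every batch shares the overall mean and std."""
    grand_mu, grand_sd = mean(values), pstdev(values)
    out = list(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        vals = [values[i] for i in idx]
        mu, sd = mean(vals), pstdev(vals)
        for i in idx:
            z = (values[i] - mu) / sd if sd > 0 else 0.0  # within-batch z-score
            out[i] = grand_mu + z * grand_sd
    return out


adjusted = location_scale_adjust([1, 2, 3, 11, 12, 13], ["a"] * 3 + ["b"] * 3)
print(adjusted)  # both batches now overlap around the overall mean of 7
```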

run_harmony(data[, rep, n_jobs, n_clusters, …])

Batch-correct PCs using Harmony.

Nearest Neighbors

neighbors(data[, K, rep, n_jobs, …])

Compute the k nearest neighbors and the affinity matrix, which are used by diffmap and graph-based community detection algorithms.
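A brute-force sketch of the k-nearest-neighbor step in Euclidean space. It only covers neighbor search: Pegasus also builds an affinity matrix from these distances, which this sketch omits, and it presumably uses a faster search on real data. `knn` is a hypothetical helper taking points as lists of coordinates (e.g. PCA coordinates).

```python
import heapq
import math
from typing import List, Tuple


def knn(points: List[List[float]], k: int) -> List[List[Tuple[int, float]]]:
    """For each point, its k nearest other points as (index, distance) pairs."""
    result = []
    for i, p in enumerate(points):
        # (distance, index) tuples so ties are broken by the smaller index.
        dists = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([(j, d) for d, j in heapq.nsmallest(k, dists)])
    return result


nn = knn([[0, 0], [1, 0], [0, 1], [10, 10]], k=2)
print(nn[0])  # [(1, 1.0), (2, 1.0)]
```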

calc_kBET(data, attr[, rep, K, alpha, …])

Calculate the kBET metric of the data with respect to a specific sample attribute (attr) and embedding (rep).

calc_kSIM(data, attr[, rep, K, min_rate, …])

Calculate the kSIM metric of the data with respect to a specific sample attribute (attr) and embedding (rep).
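A sketch of a kSIM-style score, assuming it is the fraction of each cell's K neighbors that share the cell's attribute label, plus the share of cells whose fraction reaches `min_rate`. Whether Pegasus counts the cell itself, and what exactly it returns, are not confirmed here; `ksim` is a hypothetical helper that takes precomputed neighbor indices.

```python
from typing import Hashable, List, Sequence, Tuple


def ksim(neighbors: List[Sequence[int]], labels: List[Hashable],
         min_rate: float = 0.9) -> Tuple[float, float]:
    """Return (mean same-label rate, fraction of cells with rate >= min_rate)."""
    rates = []
    for i, nbrs in enumerate(neighbors):
        same = sum(1 for j in nbrs if labels[j] == labels[i])
        rates.append(same / len(nbrs))
    mean_rate = sum(rates) / len(rates)
    accept_rate = sum(1 for r in rates if r >= min_rate) / len(rates)
    return mean_rate, accept_rate


print(ksim([[1, 2], [0, 2], [0, 1], [0, 1]], ["a", "a", "a", "b"]))  # (0.75, 0.75)
```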

Diffusion Map

diffmap(data[, n_components, rep, solver, …])

Calculate Diffusion Map.

reduce_diffmap_to_3d(data[, random_state])

Reduce the high-dimensional Diffusion Map matrix to three dimensions.

calc_pseudotime(data, roots)

Calculate pseudotime based on the Diffusion Map.
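A sketch of one way to derive pseudotime from diffusion components, assuming it is the distance to the nearest root cell in diffusion-map space, rescaled to [0, 1]. This is an assumption about the general idea, not Pegasus's confirmed formula; `pseudotime` is a hypothetical helper, with `roots` given as cell indices.

```python
import math
from typing import List


def pseudotime(diffmap: List[List[float]], roots: List[int]) -> List[float]:
    """Distance to the closest root in diffusion space, min-max scaled to [0, 1]."""
    dist = [min(math.dist(p, diffmap[r]) for r in roots) for p in diffmap]
    lo, hi = min(dist), max(dist)
    span = hi - lo
    return [(d - lo) / span if span > 0 else 0.0 for d in dist]


print(pseudotime([[0.0], [1.0], [3.0]], roots=[0]))  # approximately [0.0, 0.333, 1.0]
```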

infer_path(data, cluster, clust_id, path_name)

Infer the path of a cluster.

Clustering Algorithms

cluster(data[, algo, rep, resolution, …])

Cluster the data using the chosen algorithm.

louvain(data[, rep, resolution, …])

Cluster the cells using the Louvain algorithm.
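Louvain searches for a partition that maximizes modularity. The sketch below only scores a given partition with the standard resolution-weighted modularity Q = sum over communities c of [L_c / m - resolution * (d_c / 2m)^2], where L_c is the number of edges inside c, d_c the total degree of c, and m the edge count. How Pegasus passes `resolution` to its clustering backend is not shown; `modularity` is a hypothetical helper for an unweighted, undirected edge list.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[int, int]], community: Dict[int, Hashable],
               resolution: float = 1.0) -> float:
    """Resolution-weighted Newman-Girvan modularity of a node partition."""
    m = len(edges)
    degree: Dict[int, int] = defaultdict(int)
    internal: Dict[Hashable, int] = defaultdict(int)  # edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree: Dict[Hashable, int] = defaultdict(int)
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - resolution * (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Two disconnected triangles, each placed in its own community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
part = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, part))  # 0.5
```

Raising `resolution` penalizes large communities more heavily, which is why higher values tend to yield more, smaller clusters.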

leiden(data[, rep, resolution, n_iter, …])

Cluster the data using the Leiden algorithm.

spectral_louvain(data[, rep, resolution, …])

Cluster the data using the Spectral Louvain algorithm.

spectral_leiden(data[, rep, resolution, …])

Cluster the data using the Spectral Leiden algorithm.

Visualization Algorithms

tsne(data[, rep, n_jobs, n_components, …])

Calculate the tSNE embedding using the MulticoreTSNE package.

fitsne(data[, rep, n_jobs, n_components, …])

Calculate the FIt-SNE embedding using the fitsne package.

umap(data[, rep, n_components, n_neighbors, …])

Calculate the UMAP embedding using the umap-learn package.

fle(data[, file_name, n_jobs, rep, K, …])

Construct the Force-directed (FLE) graph using the ForceAtlas2 implementation, with forceatlas2-python as the Python wrapper.

net_tsne(data[, rep, n_jobs, n_components, …])

Calculate an approximated tSNE embedding, using a deep learning model to improve speed.

net_fitsne(data[, rep, n_jobs, …])

Calculate an approximated FIt-SNE embedding, using a deep learning model to improve speed.

net_umap(data[, rep, n_jobs, n_components, …])

Calculate an approximated UMAP embedding, using a deep learning model to improve speed.

net_fle(data[, file_name, n_jobs, rep, K, …])

Construct an approximated Force-directed (FLE) graph, using a deep learning model to improve speed.

Differential Expression Analysis

de_analysis(data, cluster[, condition, …])

Perform Differential Expression (DE) Analysis on data.
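As an illustration of one statistic often used in single-cell DE analysis, the sketch below computes the Mann-Whitney U statistic for one gene (cells in a cluster versus the rest) and the AUROC derived from it. Which tests and statistics de_analysis reports, and how it computes them, is not claimed here; `mann_whitney_u` and `auroc` are hypothetical brute-force helpers suitable only for small inputs.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U for sample x: count pairs with x_i > y_j, counting ties as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def auroc(x: Sequence[float], y: Sequence[float]) -> float:
    """Probability that a random x value exceeds a random y value (ties count half)."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


in_cluster = [3.0, 4.0, 5.0]  # expression of one gene inside the cluster
rest = [1.0, 2.0]             # expression in all other cells
print(mann_whitney_u(in_cluster, rest), auroc(in_cluster, rest))  # 6.0 1.0
```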

markers(data[, head, de_key, sort_by, alpha])

Extract DE results into a human-readable structure.

write_results_to_excel(results, output_file)

Write results into Excel workbook.

Marker Detection Based on Gradient Boosting Machine

find_markers(data, label_attr[, de_key, …])

Find markers using a gradient boosting method.

Annotate Clusters

infer_cell_types(data, markers, de_test[, …])

Infer putative cell types for each cluster using legacy markers.

annotate(data, name, based_on, anno_dict)

Add annotation to an AnnData object.

Plotting

Interactive Plots

embedding(adata, basis[, keys, cmap, …])

Generate an embedding plot.

composition_plot(adata, by, condition[, …])

Generate a composition plot, which shows the percentage of observations from every condition within each cluster (by).
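The numbers behind a composition plot can be sketched as per-cluster percentages of each condition. This stdlib version works on two hypothetical parallel label lists (cluster labels for `by`, condition labels for `condition`) and computes only the percentages; the plotting itself is left to Pegasus. `composition` is an illustrative helper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List


def composition(by: List[Hashable], condition: List[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Percentage of observations from each condition within each cluster."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster, cond in zip(by, condition):
        counts[cluster][cond] += 1
    return {cluster: {cond: 100.0 * n / sum(c.values()) for cond, n in c.items()}
            for cluster, c in counts.items()}


print(composition(["1", "1", "1", "2"], ["A", "A", "B", "B"]))
# cluster "1": about 66.7% A and 33.3% B; cluster "2": 100% B
```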

variable_feature_plot(adata, **kwds)

Generate a variable feature plot.

heatmap(adata, keys, by[, reduce_function, …])

Generate a heatmap.

dotplot(adata, keys, by[, reduce_function, …])

Generate a dot plot.

Quality Control Plots

violin(adata, keys[, by, width, cmap, cols, …])

Generate a violin plot.

scatter(adata, x, y[, color, size, dot_min, …])

Generate a scatter plot.

scatter_matrix(adata, keys[, color, use_raw])

Generate a scatter plot matrix.

Demultiplexing

estimate_background_probs(adt[, random_state])

For cell-hashing data, estimate antibody background probability using the EM algorithm.

demultiplex(data, adt[, min_signal, alpha, …])

Demultiplex cell-hashing data using the antibody background probabilities estimated by pg.estimate_background_probs.

Miscellaneous

calc_signature_score(data, signatures[, n_bins])

Calculate signature / gene module score.
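A sketch of a binned gene module score in the style popularized by Seurat's module score: genes are binned by average expression into `n_bins` bins, and a cell's score is the mean of the signature genes minus the mean of control genes from the same bins. Whether Pegasus follows exactly this scheme is an assumption; this version also uses every same-bin gene as a control rather than sampling, to stay deterministic. `signature_scores` is a hypothetical helper on a `{gene: [values per cell]}` dict.

```python
from statistics import mean
from typing import Dict, List


def signature_scores(expr: Dict[str, List[float]], signature: List[str],
                     n_bins: int = 50) -> List[float]:
    """Per-cell score: mean(signature genes) - mean(same-bin control genes)."""
    sig = [g for g in signature if g in expr]
    if not sig:
        raise ValueError("no signature genes found in expr")
    # Rank genes by average expression and cut the ranking into n_bins bins.
    ranked = sorted(expr, key=lambda g: mean(expr[g]))
    bin_of = {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}
    sig_bins = {bin_of[g] for g in sig}
    controls = [g for g in ranked if g not in sig and bin_of[g] in sig_bins]
    n_cells = len(next(iter(expr.values())))
    scores = []
    for c in range(n_cells):
        ctrl = mean(expr[g][c] for g in controls) if controls else 0.0
        scores.append(mean(expr[g][c] for g in sig) - ctrl)
    return scores


expr = {"S1": [4.0, 0.0], "C1": [1.0, 1.0], "C2": [3.0, 0.0], "LOW": [0.0, 0.0]}
print(signature_scores(expr, ["S1"], n_bins=2))  # [1.0, 0.0]
```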

search_genes(data, gene_list[, rec_key, measure])

Extract and display gene expression for each cluster from an AnnData object.
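The core of a per-cluster gene lookup can be sketched as mean expression of each requested gene within each cluster. This assumes a hypothetical `{gene: [values per cell]}` dict and a parallel list of cluster labels, and it ignores the `rec_key` and `measure` options; `mean_expression_by_cluster` is an illustrative helper, not a Pegasus function.

```python
from statistics import mean
from typing import Dict, Hashable, List


def mean_expression_by_cluster(expr: Dict[str, List[float]], labels: List[Hashable],
                               gene_list: List[str]) -> Dict[str, Dict[Hashable, float]]:
    """For each requested gene, its mean expression within each cluster."""
    clusters = sorted(set(labels), key=str)
    result = {}
    for gene in gene_list:
        values = expr[gene]
        result[gene] = {cl: mean(v for v, lab in zip(values, labels) if lab == cl)
                        for cl in clusters}
    return result


print(mean_expression_by_cluster({"CD3E": [1.0, 3.0, 0.0, 0.0]}, ["a", "a", "b", "b"], ["CD3E"]))
# {'CD3E': {'a': 2.0, 'b': 0.0}}
```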

search_de_genes(data, gene_list[, rec_key, …])

Extract and display differential expression analysis results of markers for each cluster.