API

Pegasus can also be used as a Python package. Import it with:

import pegasus as pg

Read and Write

`read_input`(input_file[, file_type, mode, …])	Load data into memory.
`write_output`(data, output_file[, file_type, …])	Write data back to disk.
`aggregate_matrices`(csv_file[, restrictions, …])	Aggregate channel-specific count matrices into one big count matrix.

`qc_metrics`(data[, select_singlets, …])	Generate Quality Control (QC) metrics regarding cell barcodes on the dataset.
`get_filter_stats`(data[, min_genes_before_filt])	Calculate filtration stats on cell barcodes.
`filter_data`(data[, focus_list])	Filter data based on qc_metrics calculated in `pg.qc_metrics`.
`identify_robust_genes`(data[, percent_cells])	Identify robust genes as candidates for HVG selection and remove genes that are not expressed in any cells.
`log_norm`(data[, norm_count, backup_matrix])	Normalize the data, then apply the natural logarithm.
`highly_variable_features`(data[, …])	Highly variable features (HVF) selection.
`select_features`(data[, features, …])	Subset the features and store the resulting matrix in dense format in data.uns with ‘_tmp_fmat_’ prefix, with the option of standardization and truncating based on max_value.
`pca`(data[, n_components, features, …])	Perform Principal Component Analysis (PCA) on the data.
`nmf`(data[, n_components, features, space, …])	Perform Nonnegative Matrix Factorization (NMF) on the data using the Frobenius norm.
`regress_out`(data, attrs[, rep])	Regress out effects due to specific observational attributes.
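
As a rough illustration of the idea behind `log_norm` above, the sketch below scales one cell's counts to a fixed total and then takes a logarithm. It is not Pegasus's implementation: the helper name `log_normalize`, the use of log(1 + x) to keep zero counts finite, and the default target of 1e5 are all assumptions made for this example.

```python
import math


def log_normalize(counts, norm_count=1e5):
    """Scale one cell's counts to sum to norm_count, then apply log(1 + x).

    Hypothetical helper for illustration only; log1p and the default
    norm_count are assumptions, not taken from the Pegasus source.
    """
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    scale = norm_count / total
    return [math.log1p(c * scale) for c in counts]


# Toy cell with 10 total counts, normalized to a total of 10 so the
# scale factor is 1 and the arithmetic is easy to follow.
cell = [0, 3, 1, 6]
normalized = log_normalize(cell, norm_count=10)
```

Zero counts remain zero after the transform, which is the usual reason for preferring log(1 + x) over a plain logarithm on sparse count data.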

`set_group_attribute`(data, attribute_string)	Set group attributes used in batch correction.
`correct_batch`(data[, features])	Batch correction on data using Location-Scale (L/S) Adjustment method.
`run_harmony`(data[, batch, rep, n_jobs, …])	Batch correction on PCs using Harmony.
`run_scanorama`(data[, batch, n_components, …])	Batch correction using Scanorama.
`integrative_nmf`(data[, batch, n_components, …])	Perform Integrative Nonnegative Matrix Factorization (iNMF) [Yang16] for data integration.

`neighbors`(data[, K, rep, n_jobs, …])	Compute k nearest neighbors and affinity matrix, which will be used for diffmap and graph-based community detection algorithms.
`get_neighbors`(data[, K, rep, n_jobs, …])	Find K nearest neighbors for each data point and return the indices and distances arrays.
`calc_kBET`(data, attr[, rep, K, alpha, …])	Calculate the kBET metric of the data regarding a specific sample attribute and embedding.
`calc_kSIM`(data, attr[, rep, K, min_rate, …])	Calculate the kSIM metric of the data regarding a specific sample attribute and embedding.
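
To make the "indices and distances arrays" returned by `get_neighbors` more concrete, here is a minimal brute-force sketch in plain Python. It only illustrates the shape of a k-nearest-neighbor result; the function `knn_brute_force` is hypothetical, uses exact Euclidean distance, and excludes each point from its own neighbor list, all of which are assumptions rather than a description of how Pegasus searches for neighbors.

```python
import math


def knn_brute_force(points, k):
    """Return (indices, distances) of the k nearest neighbors of each point.

    Exact O(n^2) search for illustration; a point is never listed as its
    own neighbor (an assumption made for this sketch).
    """
    indices, distances = [], []
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending.
        candidates = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = candidates[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
idx, dist = knn_brute_force(points, k=2)
```

Row `i` of `idx` holds the neighbor indices of point `i`, and the matching row of `dist` holds the corresponding distances in ascending order.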

`diffmap`(data[, n_components, rep, solver, …])	Calculate Diffusion Map.
`calc_pseudotime`(data, roots)	Calculate Pseudotime based on Diffusion Map.
`infer_path`(data, cluster, clust_id, path_name)	Inference on path of a cluster.

`cluster`(data[, algo, rep, resolution, …])	Cluster the data using the chosen algorithm.
`louvain`(data[, rep, resolution, n_clust, …])	Cluster the cells using the Louvain algorithm.
`leiden`(data[, rep, resolution, n_clust, …])	Cluster the data using the Leiden algorithm.
`spectral_louvain`(data[, rep, resolution, …])	Cluster the data using the Spectral Louvain algorithm.
`spectral_leiden`(data[, rep, resolution, …])	Cluster the data using the Spectral Leiden algorithm.

`tsne`(data[, rep, n_jobs, n_components, …])	Calculate t-SNE embedding of cells using the FIt-SNE package.
`umap`(data[, rep, n_components, n_neighbors, …])	Calculate UMAP embedding of cells.
`fle`(data[, file_name, n_jobs, rep, K, …])	Construct the Force-directed (FLE) graph.
`net_umap`(data[, rep, n_jobs, n_components, …])	Calculate Net-UMAP embedding of cells.
`net_fle`(data[, file_name, n_jobs, rep, K, …])	Construct Net-Force-directed (FLE) graph.
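
The functions listed so far can be chained into a basic workflow. The sketch below shows one plausible ordering, calling each function with only the required arguments shown in the signatures above. The wrapper `run_basic_pipeline`, its `pg` parameter, and the file names in the usage note are hypothetical; the `pg` parameter exists only so the sketch can be exercised with a stand-in object when Pegasus is not installed.

```python
def run_basic_pipeline(input_file, output_file, pg=None):
    """Run a minimal load -> QC -> embed -> cluster -> save sequence.

    Hypothetical wrapper for illustration. All Pegasus calls rely on
    their default settings, which may not suit a given dataset.
    """
    if pg is None:
        # Imported lazily so this sketch can be defined without Pegasus.
        import pegasus as pg

    data = pg.read_input(input_file)

    # Quality control and filtration of cell barcodes.
    pg.qc_metrics(data)
    pg.filter_data(data)

    # Gene selection and normalization.
    pg.identify_robust_genes(data)
    pg.log_norm(data)
    pg.highly_variable_features(data)

    # Dimension reduction, neighbor graph, clustering, and embedding.
    pg.pca(data)
    pg.neighbors(data)
    pg.louvain(data)
    pg.umap(data)

    pg.write_output(data, output_file)
    return data
```

A call might look like `run_basic_pipeline("input.zarr.zip", "result.zarr.zip")`, where both file names are placeholders.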

`infer_doublets`(data[, channel_attr, …])	Infer doublets by first calculating Scrublet-like [Wolock18] doublet scores and then smartly determining an appropriate doublet score cutoff [Li20-2].
`mark_doublets`(data[, demux_attr, dbl_clusts])	Convert doublet prediction into doublet annotations that Pegasus can recognize.

`calc_signature_score`(data, signatures[, …])	Calculate signature / gene module score.

`de_analysis`(data, cluster[, condition, …])	Perform Differential Expression (DE) Analysis on data.
`markers`(data[, head, de_key, alpha])	Extract DE results into a human readable structure.
`write_results_to_excel`(results, output_file)	Write DE analysis results into Excel workbook.
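
The three differential expression functions above are naturally used in sequence: run the test, extract the results, then export them. The sketch below strings them together under that assumption. The wrapper `export_markers` and its `pg` parameter are hypothetical, and the cluster attribute and output path in the usage note are placeholders.

```python
def export_markers(data, cluster, output_file, pg=None):
    """Run DE analysis on `cluster`, collect markers, and save them to Excel.

    Hypothetical wrapper for illustration; every Pegasus call uses its
    default settings apart from the arguments passed through here.
    """
    if pg is None:
        # Imported lazily so this sketch can be defined without Pegasus.
        import pegasus as pg

    # Compute DE statistics for the groups defined by `cluster`.
    pg.de_analysis(data, cluster=cluster)

    # Collect the DE results into a human-readable structure.
    results = pg.markers(data)

    # Save the extracted results as an Excel workbook.
    pg.write_results_to_excel(results, output_file)
    return results
```

For example, `export_markers(data, "louvain_labels", "de_results.xlsx")` would apply if the clustering labels are stored under an attribute of that name, which is an assumption here.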

`infer_cell_types`(data, markers[, de_test, …])	Infer putative cell types for each cluster using legacy markers.
`annotate`(data, name, based_on, anno_dict)	Add annotation to the data object as a categorical variable.

`scatter`(data[, attrs, basis, matkey, …])	Generate scatter plots for different attributes.
`scatter_groups`(data, attr, groupby[, basis, …])	Generate scatter plots of attribute ‘attr’ for each category in attribute ‘groupby’.
`compo_plot`(data, groupby, condition[, …])	Generate a composition plot, which shows the percentage of cells from each condition for every cluster.
`violin`(data, attrs, groupby[, hue, matkey, …])	Generate a stacked violin plot.
`heatmap`(data, attrs, groupby[, matkey, …])	Generate a heatmap.
`dotplot`(data, genes, groupby[, …])	Generate a dot plot.
`dendrogram`(data, groupby[, rep, genes, …])	Generate a dendrogram on hierarchical clustering result.
`hvfplot`(data[, top_n, panel_size, …])	Generate highly variable feature plot.
`qcviolin`(data, plot_type[, …])	Plot quality control statistics (before filtration vs.
`ridgeplot`(data, features[, donor_attr, …])	Generate ridge plots; up to 8 features can be shown in one figure.
`volcano`(data, cluster_id[, de_key, de_test, …])	Generate Volcano plots (-log10 p value vs.
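
The quantity that `compo_plot` is described as showing, the percentage of cells from each condition within every cluster, can be computed by hand for a small example. The sketch below does so in plain Python; the function `composition_by_cluster` and the toy labels are hypothetical and are not part of Pegasus.

```python
from collections import Counter, defaultdict


def composition_by_cluster(clusters, conditions):
    """Return {cluster: {condition: percentage of that cluster's cells}}.

    Hypothetical helper for illustration; the inputs are two equal-length
    sequences holding one label per cell.
    """
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1

    result = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        result[cluster] = {
            condition: 100.0 * n / total for condition, n in counter.items()
        }
    return result


# Six toy cells: four in cluster "1" and two in cluster "2".
clusters = ["1", "1", "1", "2", "2", "1"]
conditions = ["ctrl", "ctrl", "treated", "treated", "treated", "ctrl"]
composition = composition_by_cluster(clusters, conditions)
```

Within each cluster the percentages sum to 100, which is what makes a stacked composition plot readable across clusters of different sizes.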

`estimate_background_probs`(hashing_data[, …])	For cell-hashing data, estimate antibody background probability using KMeans algorithm.
`demultiplex`(rna_data, hashing_data[, …])	Demultiplexing cell/nucleus-hashing data, using the estimated antibody background probability calculated in `demuxEM.estimate_background_probs`.
`attach_demux_results`(input_rna_file, rna_data)	Write demultiplexing results into raw gene expression matrix.

`search_genes`(data, gene_list[, rec_key, measure])	Extract and display gene expressions for each cluster.
`search_de_genes`(data, gene_list[, rec_key, …])	Extract and display differential expression analysis results of markers for each cluster.
`find_outlier_clusters`(data, cluster_attr, …)	Use the Mann-Whitney U (MWU) test to detect whether any cluster is an outlier with respect to one of the QC attributes: n_genes, n_counts, percent_mito.
`find_markers`(data, label_attr[, de_key, …])	Find markers using gradient boosting method.