Also see the release notes of PegasusIO.
1.5.0 March 9, 2022
Spatial data analysis:
pegasus.read_input function to load 10x Visium data: set
pegasus.spatial function to generate spatial plot for 10x Visium data.
Add Spatial Analysis Tutorial in Tutorials.
Pseudobulk analysis: see summary
pegasus.pseudobulk function to generate pseudobulk matrix.
pegasus.deseq2 function to perform pseudobulk differential expression (DE) analysis, which is a Python wrapper of DESeq2.
Requires rpy2 and the original DESeq2 R package installed.
pegasus.pseudo.volcano functions for processing pseudobulk DE results.
Add Pseudobulk Analysis Tutorial in Tutorials.
pegasus.fgsea function to perform Gene Set Enrichment Analysis (GSEA) on DE results and plotting, which is a Python wrapper of fgsea.
Requires rpy2 and the original fgsea R package installed.
correct_batch, which implements the L/S adjustment batch correction method, is obsolete. We recommend using
run_harmony instead, which is also the default of
pegasus.highly_variable_features allows specifying a custom attribute key for batches (
batch option), and thus removes the
consider_batch option. To select HVGs without considering batch effects, simply use the default, or equivalently use
l2 for L2 (default),
ip for inner product, and
cosine for cosine similarity.
The kNN graph returned by
pegasus.neighbors function is now stored in
obsm field of the data object, no longer in
uns field. Moreover, the kNN affinity matrix is stored in
pegasus.write_output function to work with Zarr v2.11.0+.
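For readers unfamiliar with the three distance options named above (l2, ip, and cosine), the following is a minimal plain-Python sketch of what each measure computes for a pair of vectors. It is an illustration only, not Pegasus code; the function names are hypothetical and chosen for this example.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def inner_product(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Inner product divided by the product of the two vector norms."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return inner_product(u, v) / (norm_u * norm_v)


# Two toy vectors standing in for cells in a reduced-dimension space.
u = [3.0, 4.0]
v = [4.0, 3.0]

print(l2_distance(u, v))
print(inner_product(u, v))
print(cosine_similarity(u, v))
```

Note that L2 is a distance (smaller means closer), while inner product and cosine are similarities (larger means closer), so a nearest-neighbor search has to order candidates accordingly.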
1.4.5 January 24, 2022
Make several dependencies optional to accommodate different use cases.
Adjust to umap-learn v0.5.2 interface.
1.4.4 October 22, 2021
Use PegasusIO v0.4.0+ for data manipulation.
calculate_z_score function to calculate standardized z-scored count matrix.
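As background for the z-score item above, the sketch below shows one common way to standardize a matrix: each column is centered on its mean and divided by its standard deviation. This is not the calculate_z_score implementation. The helper name zscore_columns is hypothetical, and the layout (rows as cells, columns as genes), the use of the population standard deviation, and the handling of constant columns are all assumptions made for this example.

```python
import statistics


def zscore_columns(matrix):
    """Standardize each column to mean 0 and (population) standard deviation 1.

    Assumes rows are cells and columns are genes. A column with zero
    variance is mapped to all zeros to avoid division by zero.
    """
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in matrix
    ]


# Three cells by two genes; the second gene is constant across cells.
counts = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
z = zscore_columns(counts)
for row in z:
    print(row)
```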
1.4.3 July 25, 2021
run_harmony function to use GPU for computation.
1.4.2 July 19, 2021
Bug fix for
1.4.1 July 17, 2021
Add NMF-related options to
Add word cloud graph plotting feature to
pegasus aggregate_matrix command now allows sample-specific filtration with parameters set in the input CSV-format sample sheet.
Update doublet detection method:
1.4.0 June 24, 2021
Change the parameter defaults of function
qc_metrics: Now all defaults are
None, meaning not performing any filtration on cell barcodes.
In Annotate Clusters API functions:
Improve human immune cell markers and auto cell type assignment for human immune cells. (
Update mouse brain cell markers (
annotate function now adds annotation as a categorical variable and sorts categories in natural order.
find_outlier_clusters function to detect if any cluster is an outlier regarding one of the QC attributes (n_genes, n_counts, percent_mito) using MWU test.
In Plotting API functions:
scatter function now plots all cells if
fix_corners option to fix the four corners when only a subset of cells is shown.
Fix a bug in
Fix bugs in functions
Support umap-learn v0.5+. (
Update doublet detection algorithm. (
Improve messages reporting the execution time spent in each step. (Pegasus command line tool)
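The find_outlier_clusters item above relies on the Mann-Whitney U (MWU) test. As a rough illustration of the statistic behind that test, the sketch below computes U for one sample against another using average ranks for ties. It computes only the statistic, not a p-value, and it is not the Pegasus implementation; the function names and the toy data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic of sample x versus sample y (rank-sum formulation)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Toy n_genes values: one cluster versus the remaining cells.
cluster = [200, 250, 300]
rest = [1200, 1500, 1800, 2100]
print(mann_whitney_u(cluster, rest))
```

A U value near 0 or near len(x) * len(y) indicates that one sample sits almost entirely below or above the other, which is the kind of separation an outlier check looks for.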
1.3.0 February 2, 2021
Make PCA more reproducible. No need to keep options for robust PCA calculation:
pca function, remove argument
infer_doublets function, remove argument
In pegasus cluster command, remove option
Add control on number of parallel threads for OpenMP/BLAS.
Now n_jobs = -1 means using all physical CPU cores instead of logical CPU cores.
reduce_diffmap_to_3d. In pegasus cluster command, remove option
Enhance compo_plot and dotplot functions’ usability.
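To make the n_jobs change above concrete, here is a small sketch of how a setting of -1 could be resolved to a thread count. The helper resolve_n_jobs is hypothetical and is not part of the Pegasus API. The Python standard library only reports logical CPUs through os.cpu_count(), so the physical core count is passed in as an argument here; in practice it could come from a third-party call such as psutil.cpu_count(logical=False).

```python
import os


def resolve_n_jobs(n_jobs, physical_cores):
    """Hypothetical helper: turn an n_jobs setting into a thread count.

    n_jobs == -1 means "use all physical cores"; any positive integer
    is taken as an explicit thread count.
    """
    if n_jobs == -1:
        return physical_cores
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


# os.cpu_count() counts logical CPUs, which on machines with simultaneous
# multithreading is typically larger than the number of physical cores.
print("logical CPUs:", os.cpu_count())

# Assumed value for illustration: a machine with 4 physical cores.
print("threads for n_jobs=-1:", resolve_n_jobs(-1, physical_cores=4))
print("threads for n_jobs=2:", resolve_n_jobs(2, physical_cores=4))
```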
1.2.0 December 25, 2020
Add multimodal support on RNA and CITE-Seq data back:
--citeseq-umap-exclude in pegasus cluster command.
Add automated doublet cutoff inference to
infer_doublets function in API. ([Li20-2])
Expose doublet detection to command-line tool:
--dbl-cluster-attr in pegasus cluster command.
Add doublet detection tutorial.
Allow multiple marker files used in cell type annotation:
annotate function in API;
--markers option in pegasus annotate_cluster command.
Rename pc_regress_out function in API to
Update the regress out tutorial.
1.1.0 December 7, 2020
Improve doublet detection in Scrublet-like way using automatic threshold selection strategy:
mark_doublets. Remove Scrublet from dependencies, and remove
Enhance performance of log-normalization (
log_norm) and signature score calculation (
pegasus cluster command, add
--genome option to specify reference genome name for input data of
Update Regress out tutorial.
Improve plotting functions:
1.0.0 September 22, 2020
zarr file format to handle data, which has better I/O performance in general.
Data are manipulated in Multi-modal structure in memory.
Support focus analysis on Unimodal data, and appending other Unimodal data to it. (
Calculate signature / gene module scores. (calc_signature_score)
Principal-Component-level regress out. (pc_regress_out)
Allow DE analysis with sample attribute as condition. (Set
condition argument in de_analysis)
Use static plots to show results (see Plotting):
Provide static plots: composition plot, embedding plot (e.g. tSNE, UMAP, FLE, etc.), dot plot, feature plot, and volcano plot;
Add more gene-specific plots: dendrogram, heatmap, violin plot, quality-control violin, HVF plot.
No longer support
h5sc file format, which was the output format of
aggregate_matrix command in Pegasus version
In cell quality-control, default percent of mitochondrial genes is changed from 10.0 to 20.0. (
percent_mito argument in qc_metrics;
Move gene quality-control out of
filter_data function to be a separate step. (identify_robust_genes)
DE analysis now uses MWU test by default, not t test. (de_analysis)
infer_cell_types uses MWU test as the default
Speed up MWU test in DE analysis, which is inspired by Presto.
Integrate Fisher’s exact test via Cython in DE analysis to improve speed.
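The change of the default mitochondrial threshold from 10.0 to 20.0 noted above can be illustrated with a small sketch. It assumes the percentage is the share of a cell's total counts that comes from mitochondrial genes, that such genes are recognized by an "MT-" name prefix, and that cells strictly below the threshold are kept. All three are assumptions for this example, and the helper percent_mito below is a hypothetical function, not the Pegasus one.

```python
def percent_mito(gene_names, counts, prefix="MT-"):
    """Percent of a cell's counts that come from genes whose name starts with prefix."""
    total = sum(counts)
    if total == 0:
        return 0.0
    mito = sum(c for g, c in zip(gene_names, counts) if g.startswith(prefix))
    return 100.0 * mito / total


genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
cells = {
    "cell_a": [5, 5, 60, 30],    # 10 of 100 counts are mitochondrial
    "cell_b": [20, 10, 40, 30],  # 30 of 100 counts are mitochondrial
}

threshold = 20.0  # the new default mentioned in the notes above
kept = [name for name, c in cells.items() if percent_mito(genes, c) < threshold]
print(kept)
```

Under the previous default of 10.0 and the same strict comparison, cell_a would sit exactly on the boundary and be dropped as well, which shows how the looser default retains more cells.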
0.17.2 June 26, 2020
Make Pegasus compatible with umap-learn v0.4+.
Use louvain 0.7+ for Louvain clustering.
0.17.1 April 6, 2020
Improve pegasus command-line tool log.
Add human lung markers.
Improve log-normalization speed.
Provide robust version of PCA calculation as an option.
Add signature score calculation API.
0.17.0 March 10, 2020
Support anndata 0.7 and pandas 1.0.
loom format output writing function.
Bug fix on
mtx format output writing function.
Update human immune cell markers.
0.16.11 February 28, 2020
--subset-singlets options to ‘cluster’ command.
loom file with user-specified batch key and black list.
0.16.9 February 17, 2020
h5ad file with user-specified batch key.
0.16.8 January 30, 2020
Allow input annotated
0.16.7 January 28, 2020
mtx files of more filename formats.
0.16.5 January 23, 2020
Add Harmony algorithm for data integration.
0.16.3 December 17, 2019
Add support for loading mtx files generated from BUStools.
0.16.2 December 8, 2019
Fix bug in ‘subcluster’ command.
0.16.1 December 4, 2019
Fix one bug in clustering pipeline.
0.16.0 December 3, 2019
Change options in ‘aggregate_matrix’ command: remove ‘--google-cloud’, add ‘--default-reference’.
Fix bug in ‘--annotation’ option of ‘annotate_cluster’ command.
Fix bug in ‘net_fle’ function with 3-dimension coordinates.
Use fisher package version 0.1.9 or above, as the modifications in our forked fisher-modified package have been merged into it.
0.15.0 October 2, 2019
Rename package to PegasusPy, with module name pegasus.
0.14.0 September 17, 2019
Provide Python API for interactive analysis.
0.10.0 January 31, 2019
Added ‘find_markers’ command to find markers using LightGBM.
Improved file loading speed and enabled the parsing of channels from barcode strings for cellranger aggregated h5 files.
0.9.0 January 17, 2019
In ‘cluster’ command, changed ‘--output-seurat-compatible’ to ‘--make-output-seurat-compatible’. Do not generate output_name.seurat.h5ad. Instead, output_name.h5ad should be directly convertible to a Seurat object. In the Seurat object, the raw.data slot refers to the filtered count data, the data slot refers to the log-normalized expression data, and scale.data refers to the variable-gene-selected, scaled data.
In ‘cluster’ command, added ‘--min-umis’ and ‘--max-umis’ options to filter cells based on UMI counts.
In ‘cluster’ command, ‘--output-filtration-results’ option does not require a spreadsheet name anymore. In addition, added more statistics such as median number of genes per cell in the spreadsheet.
In ‘cluster’ command, added ‘--plot-filtration-results’ and ‘--plot-filtration-figsize’ to support plotting filtration results. Improved documentation on ‘cluster’ command outputs.
Added ‘parquet’ command to transfer h5ad file into a parquet file for web-based interactive visualization.
0.8.0 November 26, 2018
Added support for checking index collision for CITE-Seq/hashing experiments.
0.7.0 October 26, 2018
Added support for CITE-Seq analysis.
0.6.0 October 23, 2018
Renamed scrtools to scCloud.
Added demuxEM module for cell/nuclei-hashing.
0.5.0 August 21, 2018
Fixed a problem related to AnnData.
Added support for BigQuery.
0.4.0 August 2, 2018
Added mouse brain markers.
Allow aggregate matrix to take ‘Sample’ as attribute.
0.3.0 June 26, 2018
scrtools supports fast preprocessing, batch correction, dimension reduction, graph-based clustering, diffusion maps, force-directed layouts, differential expression analysis, cluster annotation, and plotting.