Release Notes

Note

Also see the release notes of PegasusIO.

Version 1.5

1.5.0 March 9, 2022

New Features

  • Spatial data analysis:

    • Enable pegasus.read_input function to load 10x Visium data: set file_type="visium" option.

    • Add pegasus.spatial function to generate spatial plot for 10x Visium data.

    • Add Spatial Analysis Tutorial in Tutorials.

  • Pseudobulk analysis: see summary

    • Add pegasus.pseudobulk function to generate pseudobulk matrix.

    • Add pegasus.deseq2 function to perform pseudobulk differential expression (DE) analysis, which is a Python wrapper of DESeq2.

      • Requires rpy2 and the original DESeq2 R package installed.

    • Add pegasus.pseudo.markers, pegasus.pseudo.write_results_to_excel and pegasus.pseudo.volcano functions for processing pseudobulk DE results.

    • Add Pseudobulk Analysis Tutorial in Tutorials.

  • Add pegasus.fgsea function to perform Gene Set Enrichment Analysis (GSEA) on DE results and plotting, which is a Python wrapper of fgsea.

    • Requires rpy2 and the original fgsea R package installed.
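As a rough illustration of what a pseudobulk matrix is, the sketch below sums per-cell counts within each sample. It is not the pegasus.pseudobulk implementation: the function name sum_counts_by_sample, the use of plain summation as the aggregation, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def sum_counts_by_sample(counts, sample_labels):
    """Sum per-cell gene counts within each sample (hypothetical helper).

    counts: list of per-cell count vectors (one list of ints per cell).
    sample_labels: sample name for each cell, same length as counts.
    Returns a dict mapping sample name -> summed count vector.
    """
    if len(counts) != len(sample_labels):
        raise ValueError("counts and sample_labels must have the same length")
    n_genes = len(counts[0]) if counts else 0
    pseudobulk = defaultdict(lambda: [0] * n_genes)
    for cell, sample in zip(counts, sample_labels):
        row = pseudobulk[sample]
        for g, value in enumerate(cell):
            row[g] += value
    return dict(pseudobulk)


# Toy data: 4 cells x 3 genes, drawn from two samples.
cells = [[1, 0, 2], [0, 3, 1], [4, 0, 0], [1, 1, 1]]
labels = ["s1", "s1", "s2", "s2"]
result = sum_counts_by_sample(cells, labels)
```

Each sample ends up with one row of summed counts, which is the kind of matrix a bulk DE method such as DESeq2 expects as input.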

API Changes

  • Function correct_batch, which implements the L/S adjustment batch correction method, is obsolete. We recommend using run_harmony instead, which is also the default of --correct-batch-effect option in pegasus cluster command.

  • pegasus.highly_variable_features now accepts a custom attribute key for batches (batch option), so the consider_batch option is removed. To select HVGs without considering batch effects, use the default, or equivalently set batch=None.

  • Add dist option to pegasus.neighbors function to allow distance metrics other than L2. (Contribution by hoondy in PR 233)

    • Available options: l2 for L2 (default), ip for inner product, and cosine for cosine similarity.

  • The kNN graph returned by pegasus.neighbors function is now stored in obsm field of the data object, no longer in uns field. Moreover, the kNN affinity matrix is stored in obsp field.
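For readers unfamiliar with the three dist values, the following sketch computes each quantity for a pair of vectors in plain Python. It only illustrates the definitions; it says nothing about how pegasus.neighbors evaluates them internally, and the helper names and the METRICS table are hypothetical.

```python
import math


def l2_distance(x, y):
    """Euclidean (L2) distance: smaller means closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def inner_product(x, y):
    """Inner product: larger means more similar."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Inner product of the two vectors after scaling each to unit length."""
    norm_x = math.sqrt(inner_product(x, x))
    norm_y = math.sqrt(inner_product(y, y))
    return inner_product(x, y) / (norm_x * norm_y)


# Keys mirror the option values listed above; the mapping itself is illustrative.
METRICS = {"l2": l2_distance, "ip": inner_product, "cosine": cosine_similarity}

u, v = [3.0, 4.0], [4.0, 3.0]
scores = {name: fn(u, v) for name, fn in METRICS.items()}
```

Note that L2 is a distance while the other two are similarities, so the direction of "nearest" differs between them.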

Improvements

  • Adjust pegasus.write_output function to work with Zarr v2.11.0+.

Version 1.4

1.4.5 January 24, 2022

  • Make several dependencies optional to accommodate different use cases.

  • Adjust to umap-learn v0.5.2 interface.

1.4.4 October 22, 2021

  • Use PegasusIO v0.4.0+ for data manipulation.

  • Add calculate_z_score function to compute the z-score standardized count matrix.
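Z-score standardization rescales each feature to mean 0 and standard deviation 1. The sketch below shows the idea on a small cell-by-gene matrix; it is not the calculate_z_score implementation. The function name z_score_columns, the use of the population standard deviation, and the handling of constant columns are assumptions.

```python
import statistics


def z_score_columns(matrix):
    """Standardize each column (gene) to mean 0 and standard deviation 1."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # population standard deviation (assumption)
        # A constant column has zero spread; map it to zeros to avoid dividing by zero.
        scaled_columns.append([0.0 if std == 0 else (v - mean) / std for v in col])
    # Transpose back to the original cell-by-gene layout.
    return [[scaled_columns[j][i] for j in range(n_cols)] for i in range(len(matrix))]


# Toy matrix: 3 cells x 2 genes; the second gene is constant.
data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
scaled = z_score_columns(data)
```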

1.4.3 July 25, 2021

  • Allow run_harmony function to use GPU for computation.

1.4.2 July 19, 2021

  • Bug fix for --output-h5ad and --citeseq options in pegasus cluster command.

1.4.1 July 17, 2021

  • Add NMF-related options to pegasus cluster command.

  • Add word cloud graph plotting feature to pegasus plot command.

  • pegasus aggregate_matrix command now allows sample-specific filtration with parameters set in the input CSV-format sample sheet.

  • Update doublet detection method: infer_doublets and mark_doublets functions.

  • Bug fix.

1.4.0 June 24, 2021

  • Add nmf and integrative_nmf functions to compute NMF and iNMF using nmf-torch package; integrative_nmf supports quantile normalization proposed in the LIGER papers ([Welch19], [Gao21]).

  • Change the parameter defaults of function qc_metrics: Now all defaults are None, meaning not performing any filtration on cell barcodes.

  • In Annotate Clusters API functions:

    • Improve human immune cell markers and auto cell type assignment for human immune cells. (infer_cell_types function)

    • Update mouse brain cell markers (infer_cell_types function)

    • annotate function now adds annotation as a categorical variable and sorts categories in natural order.

  • Add find_outlier_clusters function to detect if any cluster is an outlier regarding one of the qc attributes (n_genes, n_counts, percent_mito) using MWU test.

  • In Plotting API functions:

    • scatter function now plots all cells if attrs == None; add fix_corners option to fix the four corners when only a subset of cells is shown.

    • Fix a bug in heatmap plotting function.

  • Fix bugs in functions spectral_leiden and spectral_louvain.

  • Improvements:

    • Support umap-learn v0.5+. (umap and net_umap functions)

    • Update doublet detection algorithm. (infer_doublets function)

    • Improve the messages reporting execution time spent in each step. (Pegasus command line tool)
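The 1.4.0 notes above mention that annotate sorts categories in natural order. Under one common definition of natural order, digit runs are compared as numbers rather than character by character. The sketch below shows that definition with the standard library; the natural_key helper and the sample labels are hypothetical, and Pegasus's own ordering rule may differ in details.

```python
import re


def natural_key(label):
    """Split a label into text and integer chunks so that '10' sorts after '2'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", label)]


categories = ["T cell 10", "T cell 2", "B cell", "T cell 1"]
ordered = sorted(categories, key=natural_key)    # numeric-aware ordering
lexicographic = sorted(categories)               # plain string ordering, for contrast
```

With plain string sorting, "T cell 10" lands before "T cell 2"; the natural key places it last.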

Version 1.3

1.3.0 February 2, 2021

  • Make PCA more reproducible. No need to keep options for robust PCA calculation:

    • In pca function, remove argument robust.

    • In infer_doublets function, remove argument robust.

    • In pegasus cluster command, remove option --pca-robust.

  • Add control on number of parallel threads for OpenMP/BLAS.

    • n_jobs = -1 now means using all physical CPU cores instead of logical CPU cores.

  • Remove function reduce_diffmap_to_3d. In pegasus cluster command, remove option --diffmap-to-3d.

  • Enhance compo_plot and dotplot functions’ usability.

  • Bug fix.
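The 1.3.0 change to n_jobs = -1 relies on the difference between physical and logical CPU cores: with simultaneous multithreading, one physical core can appear as two or more logical cores. The sketch below shows one way to query both counts. Using the optional psutil package for the physical count is an assumption of this example, not a statement about how Pegasus counts cores, and count_cpu_cores is a hypothetical helper.

```python
import os


def count_cpu_cores():
    """Return (logical, physical) core counts; physical is None if unknown."""
    logical = os.cpu_count()
    try:
        import psutil  # optional third-party package (assumption)
        physical = psutil.cpu_count(logical=False)
    except ImportError:
        physical = None
    return logical, physical


logical, physical = count_cpu_cores()
```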

Version 1.2

1.2.0 December 25, 2020

  • tSNE support:

    • tsne function in API: Use FIt-SNE for tSNE embedding calculation. No longer support MulticoreTSNE.

    • Determine learning_rate argument in tsne more dynamically. ([Belkina19], [Kobak19])

    • By default, use PCA embedding for initialization in tsne. ([Kobak19])

    • Remove net_tsne and fitsne functions from API.

    • Remove --net-tsne and --fitsne options from pegasus cluster command.

  • Restore multimodal support for RNA and CITE-Seq data: --citeseq, --citeseq-umap, and --citeseq-umap-exclude in pegasus cluster command.

  • Doublet detection:

    • Add automated doublet cutoff inference to infer_doublets function in API. ([Li20-2])

    • Expose doublet detection to command-line tool: --infer-doublets, --expected-doublet-rate, and --dbl-cluster-attr in pegasus cluster command.

    • Add doublet detection tutorial.

  • Allow multiple marker files used in cell type annotation: annotate function in API; --markers option in pegasus annotate_cluster command.

  • Rename pc_regress_out function in API to regress_out.

  • Update the regress out tutorial.

  • Bug fix.

Version 1.1

1.1.0 December 7, 2020

  • Improve doublet detection in a Scrublet-like way using an automatic threshold selection strategy: infer_doublets and mark_doublets. Remove Scrublet from the dependencies, and remove the run_scrublet function.

  • Enhance performance of log-normalization (log_norm) and signature score calculation (calc_signature_score).

  • In pegasus cluster command, add --genome option to specify reference genome name for input data of dge, csv, or loom format.

  • Update Regress out tutorial.

  • Add ridgeplot.

  • Improve plotting functions: heatmap and dendrogram.

  • Bug fix.
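For context on the log-normalization step mentioned in 1.1.0, the sketch below scales one cell's counts to a fixed total and then applies log(1 + x). It is a minimal illustration rather than the log_norm implementation; the function name log_normalize, the scale factor of 1e5, and the natural logarithm are assumptions.

```python
import math


def log_normalize(counts, scale=1e5):
    """Scale one cell's counts to a fixed total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of triggering a division by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]


cell = [1, 0, 3]
normalized = log_normalize(cell)
```

Zero counts stay at zero after the transform, so sparsity in the count matrix is preserved.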

Version 1.0

1.0.0 September 22, 2020

  • New features:

    • Use zarr file format to handle data, which generally has better I/O performance.

    • Multi-modality support:

      • Data are manipulated in Multi-modal structure in memory.

      • Support focus analysis on Unimodal data, and appending other Unimodal data to it. (--focus and --append options in cluster command)

    • Calculate signature / gene module scores. (calc_signature_score)

    • Doublet detection based on Scrublet: run_scrublet, infer_doublets, and mark_doublets.

    • Principal-Component-level regress out. (pc_regress_out)

    • Batch correction using Scanorama. (run_scanorama)

    • Allow DE analysis with sample attribute as condition. (Set condition argument in de_analysis)

    • Use static plots to show results (see Plotting):

      • Provide static plots: composition plot, embedding plot (e.g. tSNE, UMAP, FLE, etc.), dot plot, feature plot, and volcano plot;

      • Add more gene-specific plots: dendrogram, heatmap, violin plot, quality-control violin, HVF plot.

  • Deprecations:

    • No longer support h5sc file format, which was the output format of aggregate_matrix command in Pegasus version 0.x.

    • Remove net_fitsne function.

  • API changes:

    • In cell quality-control, default percent of mitochondrial genes is changed from 10.0 to 20.0. (percent_mito argument in qc_metrics; --percent-mito option in cluster command)

    • Move gene quality-control out of filter_data function to be a separate step. (identify_robust_genes)

    • DE analysis now uses MWU test by default, not t test. (de_analysis)

    • infer_cell_types uses MWU test as the default de_test.

  • Performance improvement:

    • Speed up MWU test in DE analysis, which is inspired by Presto.

    • Integrate Fisher’s exact test via Cython in DE analysis to improve speed.

  • Other highlights:

Version 0.x

0.17.2 June 26, 2020

  • Make Pegasus compatible with umap-learn v0.4+.

  • Use louvain 0.7+ for Louvain clustering.

  • Update tutorial.

0.17.1 April 6, 2020

  • Improve pegasus command-line tool log.

  • Add human lung markers.

  • Improve log-normalization speed.

  • Provide robust version of PCA calculation as an option.

  • Add signature score calculation API.

  • Fix bugs.

0.17.0 March 10, 2020

  • Support anndata 0.7 and pandas 1.0.

  • Better loom format output writing function.

  • Bug fix on mtx format output writing function.

  • Update human immune cell markers.

  • Improve pegasus scp_output command.

0.16.11 February 28, 2020

  • Add --remap-singlets and --subset-singlets options to ‘cluster’ command.

  • Allow reading loom file with user-specified batch key and black list.

0.16.9 February 17, 2020

Allow reading h5ad file with user-specified batch key.

0.16.8 January 30, 2020

Allow input annotated loom file.

0.16.7 January 28, 2020

Allow input mtx files of more filename formats.

0.16.5 January 23, 2020

Add Harmony algorithm for data integration.

0.16.3 December 17, 2019

Add support for loading mtx files generated from BUStools.

0.16.2 December 8, 2019

Fix bug in ‘subcluster’ command.

0.16.1 December 4, 2019

Fix one bug in clustering pipeline.

0.16.0 December 3, 2019

  • Change options in ‘aggregate_matrix’ command: remove ‘--google-cloud’, add ‘--default-reference’.

  • Fix bug in ‘--annotation’ option of ‘annotate_cluster’ command.

  • Fix bug in ‘net_fle’ function with 3-dimension coordinates.

  • Use fisher package version 0.1.9 or above, as the modifications in our forked fisher-modified package have been merged into it.

0.15.0 October 2, 2019

Rename package to PegasusPy, with module name pegasus.

0.14.0 September 17, 2019

Provide Python API for interactive analysis.

0.10.0 January 31, 2019

Added ‘find_markers’ command to find markers using LightGBM.

Improved file loading speed and enabled the parsing of channels from barcode strings for cellranger aggregated h5 files.

0.9.0 January 17, 2019

In ‘cluster’ command, changed ‘--output-seurat-compatible’ to ‘--make-output-seurat-compatible’. output_name.seurat.h5ad is no longer generated. Instead, output_name.h5ad can be converted to a Seurat object directly. In the Seurat object, the raw.data slot refers to the filtered count data, the data slot refers to the log-normalized expression data, and the scale.data slot refers to the variable-gene-selected, scaled data.

In ‘cluster’ command, added ‘--min-umis’ and ‘--max-umis’ options to filter cells based on UMI counts.
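The kind of filtering these two options describe can be sketched as follows. This is a standalone illustration, not the ‘cluster’ command's code: the function filter_cells_by_umis is hypothetical, and treating both bounds as inclusive is an assumption of this example.

```python
def filter_cells_by_umis(umi_counts, min_umis=None, max_umis=None):
    """Return indices of cells whose total UMI count lies in [min_umis, max_umis].

    A bound left as None is not applied. Both bounds are inclusive here (assumption).
    """
    kept = []
    for idx, n_umis in enumerate(umi_counts):
        if min_umis is not None and n_umis < min_umis:
            continue
        if max_umis is not None and n_umis > max_umis:
            continue
        kept.append(idx)
    return kept


# Total UMI counts for four cells.
totals = [50, 500, 5000, 50000]
kept = filter_cells_by_umis(totals, min_umis=100, max_umis=10000)
```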

In ‘cluster’ command, the ‘--output-filtration-results’ option no longer requires a spreadsheet name. In addition, added more statistics to the spreadsheet, such as the median number of genes per cell.

In ‘cluster’ command, added ‘--plot-filtration-results’ and ‘--plot-filtration-figsize’ to support plotting filtration results. Improved documentation on ‘cluster’ command outputs.

Added ‘parquet’ command to transfer h5ad file into a parquet file for web-based interactive visualization.

0.8.0 November 26, 2018

Added support for checking index collision for CITE-Seq/hashing experiments.

0.7.0 October 26, 2018

Added support for CITE-Seq analysis.

0.6.0 October 23, 2018

Renamed scrtools to scCloud.

Added demuxEM module for cell/nuclei-hashing.

0.5.0 August 21, 2018

Fixed a problem related to AnnData.

Added support for BigQuery.

0.4.0 August 2, 2018

Added mouse brain markers.

Allow aggregate matrix to take ‘Sample’ as attribute.

0.3.0 June 26, 2018

scrtools supports fast preprocessing, batch correction, dimension reduction, graph-based clustering, diffusion maps, force-directed layouts, differential expression analysis, cluster annotation, and plotting.