Release Notes
Note
Also see the release notes of PegasusIO.
Version 1.2
1.2.0 December 25, 2020
tSNE support:
  tsne function in API: Use FIt-SNE for tSNE embedding calculation. No longer support MulticoreTSNE.
  Determine learning_rate argument in tsne more dynamically. ([Belkina19], [Kobak19])
  By default, use PCA embedding for initialization in tsne. ([Kobak19])
  Remove net_tsne and fitsne functions from API.
  Remove --net-tsne and --fitsne options from pegasus cluster command.
Add multimodal support on RNA and CITE-Seq data back: --citeseq, --citeseq-umap, and --citeseq-umap-exclude in pegasus cluster command.
Doublet detection:
  Add automated doublet cutoff inference to infer_doublets function in API. ([Li20-2])
  Expose doublet detection to command-line tool: --infer-doublets, --expected-doublet-rate, and --dbl-cluster-attr in pegasus cluster command.
  Add doublet detection tutorial.
Allow multiple marker files used in cell type annotation: annotate function in API; --markers option in pegasus annotate_cluster command.
Rename pc_regress_out function in API to regress_out.
Update the regress out tutorial.
Bug fix.
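The dynamic learning_rate choice for tsne noted in 1.2.0 can be illustrated with the heuristic discussed in [Belkina19] and [Kobak19], where the learning rate grows with the number of cells. The following is a minimal sketch, assuming the rule max(n_cells / early_exaggeration, 200). The function name auto_learning_rate and its default values are hypothetical and are not taken from the Pegasus source.

```python
def auto_learning_rate(n_cells, early_exaggeration=12.0, floor=200.0):
    """Illustrative heuristic: scale the tSNE learning rate with data size.

    Assumption (not confirmed against Pegasus): the rate is
    n_cells / early_exaggeration, but never lower than `floor`.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    if early_exaggeration <= 0:
        raise ValueError("early_exaggeration must be positive")
    return max(n_cells / early_exaggeration, floor)


# Small data sets fall back to the floor; large ones scale with n_cells.
small = auto_learning_rate(1200)     # 1200 / 12 = 100, below the floor
large = auto_learning_rate(120000)   # 120000 / 12 = 10000
```

Under these assumptions a fixed learning rate is only used for small data sets, while large data sets get a proportionally larger rate.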
Version 1.1
1.1.0 December 7, 2020
Improve doublet detection in Scrublet-like way using automatic threshold selection strategy: infer_doublets, and mark_doublets. Remove Scrublet from dependency, and remove run_scrublet function.
Enhance performance of log-normalization (log_norm) and signature score calculation (calc_signature_score).
In pegasus cluster command, add --genome option to specify reference genome name for input data of dge, csv, or loom format.
Update Regress out tutorial.
Add ridgeplot.
Improve plotting functions: heatmap, and dendrogram.
Bug fix.
Version 1.0
1.0.0 September 22, 2020
New features:
  Use zarr file format to handle data, which has a better I/O performance in general.
  Multi-modality support:
    Data are manipulated in Multi-modal structure in memory.
    Support focus analysis on Unimodal data, and appending other Unimodal data to it. (--focus and --append options in cluster command)
  Calculate signature / gene module scores. (calc_signature_score)
  Doublet detection based on Scrublet: run_scrublet, infer_doublets, and mark_doublets.
  Principal-Component-level regress out. (pc_regress_out)
  Batch correction using Scanorama. (run_scanorama)
  Allow DE analysis with sample attribute as condition. (Set condition argument in de_analysis)
  Use static plots to show results (see Plotting):
    Provide static plots: composition plot, embedding plot (e.g. tSNE, UMAP, FLE), dot plot, feature plot, and volcano plot;
    Add more gene-specific plots: dendrogram, heatmap, violin plot, quality-control violin, HVF plot.
Deprecations:
  No longer support h5sc file format, which was the output format of aggregate_matrix command in Pegasus version 0.x.
  Remove net_fitsne function.
API changes:
  In cell quality-control, default percent of mitochondrial genes is changed from 10.0 to 20.0. (percent_mito argument in qc_metrics; --percent-mito option in cluster command)
  Move gene quality-control out of filter_data function to be a separate step. (identify_robust_genes)
  DE analysis now uses MWU test by default, not t test. (de_analysis)
  infer_cell_types uses MWU test as the default de_test.
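The effect of raising the default mitochondrial cutoff from 10.0 to 20.0 can be sketched in plain Python. This is an illustration only: the helper names percent_mito_per_cell and keep_cells are hypothetical, the counts are made-up toy values, and the sketch assumes a cell is kept when its mitochondrial percentage is strictly below the cutoff.

```python
def percent_mito_per_cell(total_counts, mito_counts):
    """Percent of each cell's counts that come from mitochondrial genes."""
    return [
        100.0 * mito / total if total > 0 else 0.0
        for total, mito in zip(total_counts, mito_counts)
    ]


def keep_cells(percents, cutoff=20.0):
    """Assumed rule: keep a cell only if its percentage is below the cutoff."""
    return [p < cutoff for p in percents]


# Toy data: four cells with total and mitochondrial UMI counts.
total = [1000, 2000, 500, 800]
mito = [50, 300, 120, 160]

percents = percent_mito_per_cell(total, mito)   # 5%, 15%, 24%, 20%
kept_old = keep_cells(percents, cutoff=10.0)    # previous default
kept_new = keep_cells(percents, cutoff=20.0)    # default since 1.0.0
```

In this toy example the second cell (15% mitochondrial counts) is filtered out under the old default but retained under the new one.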
Performance improvement:
  Speed up MWU test in DE analysis, which is inspired by Presto.
  Integrate Fisher’s exact test via Cython in DE analysis to improve speed.
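For readers unfamiliar with the MWU (Mann-Whitney U) test that DE analysis now uses by default, the statistic itself can be sketched in a few lines of plain Python. This is a textbook rank-sum computation with average ranks for ties. It does not reproduce the optimizations used by Pegasus or Presto, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) computed from the rank sum of the pooled sample."""
    n_x, n_y = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n_x])
    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Expression of one gene in two groups of cells (toy values).
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_with_ties = mann_whitney_u([1, 2, 2], [2, 3])
```

Because the statistic depends only on ranks, it is less sensitive to outliers and to the skewed distributions typical of expression data than a t test.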
Other highlights:
  Make I/O and count matrices aggregation a dedicated package PegasusIO.
  Tutorials:
    Update Analysis tutorial;
    Add 3 more tutorials: one on plotting library, one on batch correction and data integration, and one on regress out.
Version 0.x
0.17.2 June 26, 2020
Make Pegasus compatible with umap-learn v0.4+.
Use louvain 0.7+ for Louvain clustering.
Update tutorial.
0.17.1 April 6, 2020
Improve pegasus command-line tool log.
Add human lung markers.
Improve log-normalization speed.
Provide robust version of PCA calculation as an option.
Add signature score calculation API.
Fix bugs.
0.17.0 March 10, 2020
Support anndata 0.7 and pandas 1.0.
Better loom format output writing function.
Bug fix on mtx format output writing function.
Update human immune cell markers.
Improve pegasus scp_output command.
0.16.11 February 28, 2020
Add --remap-singlets and --subset-singlets options to ‘cluster’ command.
Allow reading loom file with user-specified batch key and black list.
0.16.9 February 17, 2020
Allow reading h5ad file with user-specified batch key.
0.16.8 January 30, 2020
Allow input annotated loom file.
0.16.7 January 28, 2020
Allow input mtx files of more filename formats.
0.16.5 January 23, 2020
Add Harmony algorithm for data integration.
0.16.3 December 17, 2019
Add support for loading mtx files generated from BUStools.
0.16.2 December 8, 2019
Fix bug in ‘subcluster’ command.
0.16.1 December 4, 2019
Fix one bug in clustering pipeline.
0.16.0 December 3, 2019
Change options in ‘aggregate_matrix’ command: remove ‘--google-cloud’, add ‘--default-reference’.
Fix bug in ‘--annotation’ option of ‘annotate_cluster’ command.
Fix bug in ‘net_fle’ function with 3-dimension coordinates.
Use fisher package version 0.1.9 or above, as the modifications in our forked fisher-modified package have been merged into it.
0.15.0 October 2, 2019
Rename package to PegasusPy, with module name pegasus.
0.14.0 September 17, 2019
Provide Python API for interactive analysis.
0.10.0 January 31, 2019
Added ‘find_markers’ command to find markers using LightGBM.
Improved file loading speed and enabled the parsing of channels from barcode strings for cellranger aggregated h5 files.
0.9.0 January 17, 2019
In ‘cluster’ command, changed ‘--output-seurat-compatible’ to ‘--make-output-seurat-compatible’. Do not generate output_name.seurat.h5ad. Instead, output_name.h5ad should be directly convertible to a Seurat object. In the Seurat object, raw.data slot refers to the filtered count data, data slot refers to the log-normalized expression data, and scale.data refers to the variable-gene-selected, scaled data.
In ‘cluster’ command, added ‘--min-umis’ and ‘--max-umis’ options to filter cells based on UMI counts.
In ‘cluster’ command, ‘--output-filtration-results’ option does not require a spreadsheet name anymore. In addition, added more statistics such as median number of genes per cell in the spreadsheet.
In ‘cluster’ command, added ‘--plot-filtration-results’ and ‘--plot-filtration-figsize’ to support plotting filtration results. Improved documentation on ‘cluster’ command outputs.
Added ‘parquet’ command to convert an h5ad file into a parquet file for web-based interactive visualization.
0.8.0 November 26, 2018
Added support for checking index collision for CITE-Seq/hashing experiments.
0.7.0 October 26, 2018
Added support for CITE-Seq analysis.
0.4.0 August 2, 2018
Added mouse brain markers.
Allow aggregate matrix to take ‘Sample’ as attribute.
0.3.0 June 26, 2018
scrtools supports fast preprocessing, batch correction, dimension reduction, graph-based clustering, diffusion maps, force-directed layouts, differential expression analysis, cluster annotation, and plotting.