Release Notes
Note
Also see the release notes of PegasusIO.
Version 1.9
1.9.1 March 16, 2024
New Feature
Add write_fgsea_results_to_excel function (see documentation).
Improvement
For dotplot function, add show_only_expressed parameter to decide whether the color intensity of dots is based on cells expressing the genes to show or on all cells. By default, show_only_expressed=True. (PR 292)
Allow scatter and spatial functions to take a list of vmin and vmax values when plotting multiple features.
For scatter, spatial, dotplot and violin functions, when some features are not in the data, emit a warning message and continue with the features existing in the data.
In spatial function, add nrows and ncols parameters to organize subplots.
In calculate_z_score function, enforce the input count matrix to be dense or scipy.sparse.csr_matrix.
In plot_gsea function, add label_fontsize parameter to allow changing the label font size in GSEA plots.
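The two coloring modes of show_only_expressed can be illustrated with a small sketch. It is hypothetical and not Pegasus's code: dot_stats is an invented helper, and treating a value > 0 as "expressed" is an assumption. Dot size reflects the fraction of cells expressing a gene, while the color averages over either the expressing cells only or all cells in the group.

```python
from statistics import mean


def dot_stats(values, show_only_expressed=True):
    """Hypothetical helper (not part of the Pegasus API).

    Return (fraction_expressing, color_value) for one gene in one group of cells.
    values: expression of the gene in every cell of the group.
    A cell counts as "expressing" when its value is > 0 (an assumption).
    """
    if not values:
        return 0.0, 0.0
    expressed = [v for v in values if v > 0]
    frac = len(expressed) / len(values)  # controls dot size
    if show_only_expressed:
        # Average over expressing cells only; 0.0 when no cell expresses the gene.
        color = mean(expressed) if expressed else 0.0
    else:
        # Average over all cells, so non-expressing cells pull the value down.
        color = mean(values)
    return frac, color


cells = [0.0, 0.0, 2.0, 4.0]
print(dot_stats(cells, show_only_expressed=True))   # (0.5, 3.0)
print(dot_stats(cells, show_only_expressed=False))  # (0.5, 1.5)
```

With show_only_expressed=True, a gene detected in few cells but at high levels still gets an intense color; with False, its color is diluted by the non-expressing cells.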
1.9.0 January 19, 2024
New Feature and Improvement
calculate_z_score works with sparse count matrices. [PR 276 Thanks to Jayaram Kancherla]
Plotting functions (scatter, dotplot, violin, heatmap) now give warnings on genes/attributes not existing in the data, and skip them in the plots.
Improve heatmap:
Add show_sample_name parameter for cases of pseudo-bulk data, nanoString DSP data, etc.
Use SciPy's linkage (scipy.cluster.hierarchy.linkage) for dendrograms to use its optimal ordering feature for better results (see groupby_optimal_ordering parameter).
Update human lung and mouse immune markers used by infer_cell_types function.
run_harmony can accept multiple attributes as the batch key, by providing a list of attribute names to its batch parameter.
Expose online_batch_size parameter in nmf and integrative_nmf functions.
Version 1.8
1.8.1 August 23, 2023
Bug fix in cell marker JSON files for infer_cell_types function.
1.8.0 July 21, 2023
New Feature and Improvement
Update human_immune and human_lung marker sets.
Add mouse_liver marker set.
Add split_one_cluster function to subcluster one cluster into a specified number of subclusters.
Update neighbors function to set use_cache=False by default, and adjust K to min(K, int(sqrt(n_samples))). [PR 272]
In infer_doublets function, argument manual_correction now accepts a float number as a user-specified cutoff threshold. [PR 275]
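The K adjustment in neighbors is a one-line rule. A minimal sketch of that formula; effective_k is a hypothetical helper, not part of the Pegasus API.

```python
import math


def effective_k(k, n_samples):
    """Hypothetical helper: cap the requested number of neighbors K at int(sqrt(n_samples))."""
    return min(k, int(math.sqrt(n_samples)))


print(effective_k(100, 2500))    # 50: sqrt(2500) = 50 caps K
print(effective_k(30, 10_000))   # 30: sqrt(10000) = 100, K is already smaller
print(effective_k(100, 50))      # 7: int(sqrt(50)) = int(7.07...) = 7
```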
Bug Fix
Fix divide-by-zero issue in integrative_nmf function. [PR 258]
Compatibility with Pandas v2.0. [PR 261]
Allow infer_doublets to use any count matrix with key name specified by users. [PR 268 Thanks to Donghoon Lee]
Version 1.7
1.7.1 July 29, 2022
1.7.0 July 5, 2022
New Features
Add pegasus.elbowplot function to generate an elbow plot, with an automated suggestion on the number of PCs to be selected based on random matrix theory ([Johnstone 2001] and [Shekhar 2022]).
Add arcsinh_transform function for arcsinh transformation on the count matrix.
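The random-matrix-theory idea behind the PC suggestion can be sketched as follows. This is a simplified illustration, not the exact procedure of pegasus.elbowplot: it assumes standardized features with unit noise variance and counts the PCs whose variance exceeds the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2, the leading-order location of the largest eigenvalue expected from pure noise in an n x p matrix (the quantity studied in [Johnstone 2001]). Function names are hypothetical.

```python
import math


def mp_upper_edge(n_cells, n_features, noise_var=1.0):
    """Hypothetical helper: Marchenko-Pastur upper edge for a pure-noise covariance spectrum."""
    gamma = n_features / n_cells
    return noise_var * (1.0 + math.sqrt(gamma)) ** 2


def suggest_n_pcs(pc_variances, n_cells, n_features, noise_var=1.0):
    """Hypothetical helper: count PCs whose variance (eigenvalue) exceeds the noise edge."""
    edge = mp_upper_edge(n_cells, n_features, noise_var)
    return sum(1 for v in pc_variances if v > edge)


# 10,000 cells x 2,000 standardized features: edge = (1 + sqrt(0.2))**2, about 2.094
variances = [40.0, 12.5, 6.1, 3.0, 2.2, 2.0, 1.9]
print(suggest_n_pcs(variances, n_cells=10_000, n_features=2_000))  # 5
```

Real data rarely has exactly unit noise variance, which is one reason a practical implementation needs more care than this threshold alone.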
Improvement
Function nearest_neighbors has an additional argument n_comps to allow using only part of the components from the source embedding for calculating the nearest neighbor graph.
Add n_comps argument for run_harmony, tsne, umap, and fle (argument name is rep_ncomps) functions to allow selecting part of the components from the source embedding.
Function scatter can plot multiple components and bases.
Version 1.6
1.6.0 April 16, 2022
New Features
Add support for scVI-tools:
Function pegasus.run_scvi, which is a wrapper of scVI for data integration.
Add a dedicated section for scVI method in Pegasus batch correction tutorial.
Functions pegasus.train_scarches_scanvi and pegasus.predict_scarches_scanvi, which wrap scArches for transfer learning on single-cell data labels.
Version 1.5
1.5.0 March 9, 2022
New Features
Spatial data analysis:
Enable pegasus.read_input function to load 10x Visium data: set file_type="visium" option.
Add pegasus.spatial function to generate spatial plots for 10x Visium data.
Add Spatial Analysis Tutorial in Tutorials.
Pseudobulk analysis: see summary
Add pegasus.pseudobulk function to generate pseudobulk matrix.
Add pegasus.deseq2 function to perform pseudobulk differential expression (DE) analysis, which is a Python wrapper of DESeq2. Requires rpy2 and the original DESeq2 R package to be installed.
Add pegasus.pseudo.markers, pegasus.pseudo.write_results_to_excel and pegasus.pseudo.volcano functions for processing pseudobulk DE results.
Add Pseudobulk Analysis Tutorial in Tutorials.
Add pegasus.fgsea function to perform Gene Set Enrichment Analysis (GSEA) on DE results and plotting, which is a Python wrapper of fgsea. Requires rpy2 and the original fgsea R package to be installed.
API Changes
Function correct_batch, which implements the L/S adjustment batch correction method, is obsolete. We recommend using run_harmony instead, which is also the default of --correct-batch-effect option in pegasus cluster command.
pegasus.highly_variable_features allows specifying a custom attribute key for batches (batch option), and thus the consider_batch option is removed. To select HVGs without considering batch effects, simply use the default, or equivalently use batch=None option.
Add dist option to pegasus.neighbors function to allow using distances other than L2. (Contribution by hoondy in PR 233) Available options: l2 for L2 (default), ip for inner product, and cosine for cosine similarity.
The kNN graph returned by pegasus.neighbors function is now stored in obsm field of the data object, no longer in uns field. Moreover, the kNN affinity matrix is stored in obsp field.
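To make the three dist options concrete, here is a brute-force sketch of kNN search under each dissimilarity. The formulas follow hnswlib's conventions for its l2, ip and cosine spaces (squared Euclidean distance, 1 - inner product, 1 - cosine similarity); whether pegasus.neighbors computes them exactly this way is an assumption, and distance and knn are hypothetical helpers.

```python
import math


def distance(a, b, dist="l2"):
    """Hypothetical helper: dissimilarity between two vectors; smaller means closer.

    Assumed conventions (as in hnswlib): squared Euclidean for "l2",
    1 - <a, b> for "ip", and 1 - cosine similarity for "cosine".
    """
    dot = sum(x * y for x, y in zip(a, b))
    if dist == "l2":
        return sum((x - y) ** 2 for x, y in zip(a, b))
    if dist == "ip":
        return 1.0 - dot
    if dist == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm
    raise ValueError(f"unknown dist: {dist!r}")


def knn(points, query_idx, k, dist="l2"):
    """Hypothetical helper: brute-force k nearest neighbors of points[query_idx], excluding itself."""
    q = points[query_idx]
    others = [(distance(q, p, dist), i) for i, p in enumerate(points) if i != query_idx]
    others.sort()
    return [i for _, i in others[:k]]


pts = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(knn(pts, 0, 2, "l2"))      # [3, 2]
print(knn(pts, 0, 2, "cosine"))  # [1, 3]
```

Point 1 lies in the same direction as the query but far away, so it is the nearest neighbor under cosine and the farthest under l2. The ip option is not a metric: vectors with large norms look close to everything, so it is mainly suited to embeddings that are already normalized.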
Improvements
Adjust pegasus.write_output function to work with Zarr v2.11.0+.
Version 1.4
1.4.5 January 24, 2022
Make several dependencies optional to meet with different use cases.
Adjust to umap-learn v0.5.2 interface.
1.4.4 October 22, 2021
Use PegasusIO v0.4.0+ for data manipulation.
Add calculate_z_score function to calculate standardized z-scored count matrix.
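A sketch of how a z-scored matrix can be computed from a sparse CSR matrix (the data, indices, indptr triple that scipy.sparse.csr_matrix uses) without densifying it to get the per-gene statistics. This is an illustration only, not Pegasus's calculate_z_score. The population standard deviation (ddof=0) and mapping constant genes to 0 are assumptions, and zscore_csr is a hypothetical helper.

```python
import math


def zscore_csr(data, indices, indptr, n_cols):
    """Hypothetical helper: column-wise (per-gene) z-scores of a CSR matrix with cells as rows.

    Assumes canonical CSR (no duplicate entries within a row). Per-gene sums and
    sums of squares are accumulated from stored nonzeros only, so the sparse input
    is never densified to get the statistics. The output is dense, because a zero
    entry becomes -mean / std after standardization.
    """
    n_rows = len(indptr) - 1
    col_sum = [0.0] * n_cols
    col_sq = [0.0] * n_cols
    for v, j in zip(data, indices):
        col_sum[j] += v
        col_sq[j] += v * v
    means = [s / n_rows for s in col_sum]
    # Population variance E[x^2] - E[x]^2 (ddof=0 is an assumption), clipped at 0
    # to absorb floating-point round-off.
    stds = [math.sqrt(max(sq / n_rows - m * m, 0.0)) for sq, m in zip(col_sq, means)]

    out = []
    for i in range(n_rows):
        row = [0.0] * n_cols
        for k in range(indptr[i], indptr[i + 1]):
            row[indices[k]] = data[k]
        # Constant genes (std == 0) are mapped to 0 instead of dividing by zero.
        out.append([(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)])
    return out


# Dense view of the example matrix (4 cells x 3 genes):
# [[2, 0, 5],
#  [0, 4, 5],
#  [2, 0, 5],
#  [0, 4, 5]]
data = [2.0, 5.0, 4.0, 5.0, 2.0, 5.0, 4.0, 5.0]
indices = [0, 2, 1, 2, 0, 2, 1, 2]
indptr = [0, 2, 4, 6, 8]
for row in zscore_csr(data, indices, indptr, n_cols=3):
    print(row)  # alternates [1.0, -1.0, 0.0] and [-1.0, 1.0, 0.0]
```

The dense output is also why the z-scored matrix costs far more memory than the sparse counts it comes from.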
1.4.3 July 25, 2021
Allow run_harmony function to use GPU for computation.
1.4.2 July 19, 2021
Bug fix for --output-h5ad and --citeseq options in pegasus cluster command.
1.4.1 July 17, 2021
Add NMF-related options to pegasus cluster command.
Add word cloud graph plotting feature to pegasus plot command.
pegasus aggregate_matrix command now allows sample-specific filtration with parameters set in the input CSV-format sample sheet.
Update doublet detection method: infer_doublets and mark_doublets functions.
Bug fix.
1.4.0 June 24, 2021
Add nmf and integrative_nmf functions to compute NMF and iNMF using nmf-torch package; integrative_nmf supports quantile normalization proposed in the LIGER papers ([Welch19], [Gao21]).
Change the parameter defaults of function qc_metrics: now all defaults are None, meaning no filtration is performed on cell barcodes.
In Annotate Clusters API functions:
Improve human immune cell markers and auto cell type assignment for human immune cells. (infer_cell_types function)
Update mouse brain cell markers. (infer_cell_types function)
annotate function now adds annotation as a categorical variable and sorts categories in natural order.
Add find_outlier_clusters function to detect whether any cluster is an outlier with respect to one of the QC attributes (n_genes, n_counts, percent_mito) using MWU test.
In Plotting API functions:
scatter function now plots all cells if attrs == None; add fix_corners option to fix the four corners when only a subset of cells is shown.
Fix a bug in heatmap plotting function.
Fix bugs in functions spectral_leiden and spectral_louvain.
Improvements:
Support umap-learn v0.5+. (umap and net_umap functions)
Update doublet detection algorithm. (infer_doublets function)
Improve messages reporting the execution time spent in each step. (Pegasus command line tool)
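Natural ordering sorts embedded numbers by value, so a cluster named "T cell 10" comes after "T cell 2" rather than before it. One common way to implement it is a split sort key, sketched below. natural_key is a hypothetical helper, the case-insensitive comparison of text chunks is a design choice of this sketch, and the release notes do not say how annotate implements the ordering.

```python
import re


def natural_key(label):
    """Hypothetical helper: sort key that compares digit runs by numeric value ("2" < "10")."""
    # re.split with a capturing group alternates text and digit chunks, so
    # odd positions are always digit runs and compare as ints; text chunks
    # are lowercased for a case-insensitive comparison.
    parts = re.split(r"(\d+)", label)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


labels = ["T cell 10", "B cell", "T cell 2", "T cell 1"]
print(sorted(labels))                   # ['B cell', 'T cell 1', 'T cell 10', 'T cell 2']
print(sorted(labels, key=natural_key))  # ['B cell', 'T cell 1', 'T cell 2', 'T cell 10']
```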
Version 1.3
1.3.0 February 2, 2021
Make PCA more reproducible. No need to keep options for robust PCA calculation:
In pca function, remove argument robust.
In infer_doublets function, remove argument robust.
In pegasus cluster command, remove option --pca-robust.
Add control on the number of parallel threads for OpenMP/BLAS.
Now n_jobs = -1 refers to using all physical CPU cores instead of logical CPU cores.
Remove function reduce_diffmap_to_3d. In pegasus cluster command, remove option --diffmap-to-3d.
Enhance compo_plot and dotplot functions' usability.
Bug fix.
Version 1.2
1.2.0 December 25, 2020
tSNE support:
tsne function in API: use FIt-SNE for tSNE embedding calculation. MulticoreTSNE is no longer supported.
Determine learning_rate argument in tsne more dynamically. ([Belkina19], [Kobak19])
By default, use PCA embedding for initialization in tsne. ([Kobak19])
Remove net_tsne and fitsne functions from API.
Remove --net-tsne and --fitsne options from pegasus cluster command.
Restore multimodal support for RNA and CITE-Seq data: --citeseq, --citeseq-umap, and --citeseq-umap-exclude options in pegasus cluster command.
Doublet detection:
Add automated doublet cutoff inference to infer_doublets function in API. ([Li20-2])
Expose doublet detection to command-line tool: --infer-doublets, --expected-doublet-rate, and --dbl-cluster-attr options in pegasus cluster command.
Add doublet detection tutorial.
Allow multiple marker files in cell type annotation: annotate function in API; --markers option in pegasus annotate_cluster command.
Rename pc_regress_out function in API to regress_out.
Update the regress out tutorial.
Bug fix.
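The dynamic tSNE learning rate mentioned for 1.2.0 scales with the number of cells. A minimal sketch of the heuristic from [Belkina19] and [Kobak19], learning rate = n / early exaggeration with a lower bound. The exact rule and defaults used by tsne are not stated in these notes, so the floor of 200 and the exaggeration factor of 12 are assumptions, and tsne_learning_rate is a hypothetical helper.

```python
def tsne_learning_rate(n_cells, early_exaggeration=12.0, floor=200.0):
    """Hypothetical helper: n / early_exaggeration, but never below floor (both defaults assumed)."""
    return max(floor, n_cells / early_exaggeration)


print(tsne_learning_rate(1_200))    # 200.0 (1200 / 12 = 100 is below the floor)
print(tsne_learning_rate(120_000))  # 10000.0
```

A fixed small learning rate lets large datasets converge poorly within the usual number of iterations, which is the motivation the cited papers give for scaling it with n.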
Version 1.1
1.1.0 December 7, 2020
Improve doublet detection in a Scrublet-like way using an automatic threshold selection strategy: infer_doublets and mark_doublets. Remove Scrublet from dependencies, and remove run_scrublet function.
Enhance performance of log-normalization (log_norm) and signature score calculation (calc_signature_score).
In pegasus cluster command, add --genome option to specify reference genome name for input data of dge, csv, or loom format.
Update Regress out tutorial.
Add ridgeplot.
Improve plotting functions: heatmap and dendrogram.
Bug fix.
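For reference, log-normalization scales each cell's counts to a common total and then applies log(1 + x). A minimal pure-Python sketch, not Pegasus's optimized log_norm; the scale factor norm_count=1e5 is an assumption about the default, and this log_norm is a hypothetical stand-in.

```python
import math


def log_norm(counts, norm_count=1e5):
    """Hypothetical stand-in: log-normalize a cells x genes count matrix given as a list of rows.

    Each cell is scaled so its counts sum to norm_count (default assumed), then log1p is applied.
    """
    out = []
    for row in counts:
        total = sum(row)
        if total == 0:
            out.append([0.0] * len(row))  # empty cell: nothing to scale
            continue
        out.append([math.log1p(x / total * norm_count) for x in row])
    return out


# Small norm_count so the numbers are easy to check: 1 -> log(2), 3 -> log(4).
print(log_norm([[1, 3], [0, 0]], norm_count=4))
```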
Version 1.0
1.0.0 September 22, 2020
New features:
Use zarr file format to handle data, which has better I/O performance in general.
Multi-modality support:
Data are manipulated in a multi-modal structure in memory.
Support focus analysis on unimodal data, and appending other unimodal data to it. (--focus and --append options in cluster command)
Calculate signature / gene module scores. (calc_signature_score)
Doublet detection based on Scrublet: run_scrublet, infer_doublets, and mark_doublets.
Principal-Component-level regress out. (pc_regress_out)
Batch correction using Scanorama. (run_scanorama)
Allow DE analysis with sample attribute as condition. (Set condition argument in de_analysis)
Use static plots to show results (see Plotting):
Provide static plots: composition plot, embedding plot (e.g. tSNE, UMAP, FLE, etc.), dot plot, feature plot, and volcano plot;
Add more gene-specific plots: dendrogram, heatmap, violin plot, quality-control violin, HVF plot.
Deprecations:
No longer support h5sc file format, which was the output format of aggregate_matrix command in Pegasus version 0.x.
Remove net_fitsne function.
API changes:
In cell quality-control, default percent of mitochondrial genes is changed from 10.0 to 20.0. (percent_mito argument in qc_metrics; --percent-mito option in cluster command)
Move gene quality-control out of filter_data function to be a separate step. (identify_robust_genes)
DE analysis now uses MWU test by default, not t test. (de_analysis)
infer_cell_types uses MWU test as the default de_test.
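The percent_mito cell filter can be sketched as follows: for each cell, sum the counts of mitochondrial genes, divide by the cell's total counts, and keep cells at or below the threshold (20.0 under the new default). The "MT-" prefix (human gene symbols; mouse uses "mt-"), the inclusive boundary, and the helper names are assumptions, not Pegasus's qc_metrics code.

```python
def percent_mito(gene_names, row, mito_prefix="MT-"):
    """Hypothetical helper: percentage of a cell's counts that come from mitochondrial genes."""
    total = sum(row)
    if total == 0:
        return 0.0
    mito = sum(x for g, x in zip(gene_names, row) if g.startswith(mito_prefix))
    return 100.0 * mito / total


def passes_mito_filter(gene_names, counts, max_percent_mito=20.0, mito_prefix="MT-"):
    """Hypothetical helper: boolean mask over cells; keeping cells at or below the cutoff is assumed."""
    return [percent_mito(gene_names, row, mito_prefix) <= max_percent_mito for row in counts]


genes = ["MT-CO1", "GAPDH", "ACTB"]
counts = [[10, 30, 60],   # 10% mitochondrial
          [40, 30, 30]]   # 40% mitochondrial
print(passes_mito_filter(genes, counts))  # [True, False]
```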
Performance improvement:
Speed up MWU test in DE analysis, which is inspired by Presto.
Integrate Fisher’s exact test via Cython in DE analysis to improve speed.
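The MWU statistic used in DE analysis can be computed from the ranks of the pooled values, with tied values sharing their average rank; dividing U by n1 * n2 gives the AUROC. The sketch below is a plain reference implementation, not Pegasus's Presto-inspired fast path. One source of speedup in Presto-style implementations is that all zeros in a sparse matrix tie, so only the nonzero entries need explicit ranking. Function names are hypothetical.

```python
def rank_with_ties(values):
    """Hypothetical helper: 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mwu_u_and_auroc(group, rest):
    """Hypothetical helper: Mann-Whitney U for `group` vs `rest`, plus AUROC = U / (n1 * n2)."""
    ranks = rank_with_ties(list(group) + list(rest))
    n1, n2 = len(group), len(rest)
    r1 = sum(ranks[:n1])  # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2.0
    return u1, u1 / (n1 * n2)


# Expression of one gene in a cluster vs. all other cells.
print(mwu_u_and_auroc([3, 5, 0], [0, 0, 1, 2]))  # (9.0, 0.75)
```

U counts, over all cross-group pairs, how often the cluster value is larger (ties count 0.5), so an AUROC above 0.5 means the gene tends to be higher in the cluster.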
Other highlights:
Make I/O and count matrices aggregation a dedicated package PegasusIO.
Tutorials:
Update Analysis tutorial;
Add 3 more tutorials: one on plotting library, one on batch correction and data integration, and one on regress out.
Version 0.x
0.17.2 June 26, 2020
Make Pegasus compatible with umap-learn v0.4+.
Use louvain 0.7+ for Louvain clustering.
Update tutorial.
0.17.1 April 6, 2020
Improve pegasus command-line tool log.
Add human lung markers.
Improve log-normalization speed.
Provide robust version of PCA calculation as an option.
Add signature score calculation API.
Fix bugs.
0.17.0 March 10, 2020
Support anndata 0.7 and pandas 1.0.
Better loom format output writing function.
Bug fix on mtx format output writing function.
Update human immune cell markers.
Improve pegasus scp_output command.
0.16.11 February 28, 2020
Add --remap-singlets and --subset-singlets options to ‘cluster’ command.
Allow reading loom file with user-specified batch key and black list.
0.16.9 February 17, 2020
Allow reading h5ad file with user-specified batch key.
0.16.8 January 30, 2020
Allow input annotated loom file.
0.16.7 January 28, 2020
Allow input mtx files of more filename formats.
0.16.5 January 23, 2020
Add Harmony algorithm for data integration.
0.16.3 December 17, 2019
Add support for loading mtx files generated from BUStools.
0.16.2 December 8, 2019
Fix bug in ‘subcluster’ command.
0.16.1 December 4, 2019
Fix one bug in clustering pipeline.
0.16.0 December 3, 2019
Change options in ‘aggregate_matrix’ command: remove ‘--google-cloud’, add ‘--default-reference’.
Fix bug in ‘--annotation’ option of ‘annotate_cluster’ command.
Fix bug in ‘net_fle’ function with 3-dimension coordinates.
Use fisher package version 0.1.9 or above, as modifications in our forked fisher-modified package have been merged into it.
0.15.0 October 2, 2019
Rename package to PegasusPy, with module name pegasus.
0.14.0 September 17, 2019
Provide Python API for interactive analysis.
0.10.0 January 31, 2019
Added ‘find_markers’ command to find markers using LightGBM.
Improved file loading speed and enabled the parsing of channels from barcode strings for cellranger aggregated h5 files.
0.9.0 January 17, 2019
In ‘cluster’ command, changed ‘--output-seurat-compatible’ to ‘--make-output-seurat-compatible’. Do not generate output_name.seurat.h5ad. Instead, output_name.h5ad should be able to convert to a Seurat object directly. In the Seurat object, raw.data slot refers to the filtered count data, data slot refers to the log-normalized expression data, and scale.data refers to the variable-gene-selected, scaled data.
In ‘cluster’ command, added ‘--min-umis’ and ‘--max-umis’ options to filter cells based on UMI counts.
In ‘cluster’ command, ‘--output-filtration-results’ option does not require a spreadsheet name anymore. In addition, added more statistics such as median number of genes per cell in the spreadsheet.
In ‘cluster’ command, added ‘--plot-filtration-results’ and ‘--plot-filtration-figsize’ to support plotting filtration results. Improved documentation on ‘cluster’ command outputs.
Added ‘parquet’ command to transfer h5ad file into a parquet file for web-based interactive visualization.
0.8.0 November 26, 2018
Added support for checking index collision for CITE-Seq/hashing experiments.
0.7.0 October 26, 2018
Added support for CITE-Seq analysis.
0.4.0 August 2, 2018
Added mouse brain markers.
Allow aggregate matrix to take ‘Sample’ as attribute.
0.3.0 June 26, 2018
scrtools supports fast preprocessing, batch correction, dimension reduction, graph-based clustering, diffusion maps, force-directed layouts, differential expression analysis, cluster annotation, and plotting.