pegasus.aggregate_matrices

pegasus.aggregate_matrices(csv_file, what_to_return='AnnData', restrictions=[], attributes=[], default_ref=None, select_singlets=False, ngene=None, concat_matrices=False)

Aggregate channel-specific count matrices into one big count matrix.

This function takes a csv_file as input. The file must contain at least two columns: Sample, the sample name, and Location, the file that contains the count matrices (e.g. filtered_gene_bc_matrices_h5.h5). Matrices from the same genome are merged together. Depending on what_to_return, the merged result is either written to a pegasus-formatted HDF5 file or returned as an AnnData or MemData object.
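As an illustration of the expected input, the following sketch builds a minimal sample sheet with Python's standard csv module. Only the Sample and Location columns are required by the description above. The sample names, the paths, and the "10x" platform value are hypothetical, and the extra columns (Source, Platform, Donor) simply mirror the attribute names used in the example call on this page.

```python
import csv
import io

# Hypothetical channels. Sample and Location are the two required columns;
# Source, Platform and Donor are optional columns that could later be used
# with the restrictions and attributes parameters.
rows = [
    {
        "Sample": "sample_1",
        "Location": "/path/to/sample_1/filtered_gene_bc_matrices_h5.h5",
        "Source": "pbmc",
        "Platform": "10x",
        "Donor": "1",
    },
    {
        "Sample": "sample_2",
        "Location": "/path/to/sample_2/filtered_gene_bc_matrices_h5.h5",
        "Source": "pbmc",
        "Platform": "10x",
        "Donor": "2",
    },
]


def write_sample_sheet(handle, rows):
    """Write the rows as CSV with a header taken from the first row's keys."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_sample_sheet(buffer, rows)
sheet_text = buffer.getvalue()
```

To produce a file that can be passed as csv_file, the same helper can be called with a handle from open('example.csv', 'w', newline='').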

Parameters
  • csv_file (str) – The CSV file containing information about each channel.

  • what_to_return (str, optional (default: ‘AnnData’)) – If this value is equal to ‘AnnData’ or ‘MemData’, an AnnData or MemData object is returned. Otherwise, results are written to the file ‘what_to_return.h5sc’ and None is returned.

  • restrictions (list[str], optional (default: [])) – A list of restrictions used to select channels. Each restriction takes the format name:value,…,value or name:~value,…,value, where ~ means “not”.

  • attributes (list[str], optional (default: [])) – A list of attributes that need to be incorporated into the output count matrix.

  • default_ref (str, optional (default: None)) – Default reference name to use. If a sample count matrix is in DGE, mtx, csv or tsv format and there is no Reference column in csv_file, default_ref is used as the reference.

  • select_singlets (bool, optional (default: False)) – If the data are demultiplexed, turning on this option makes pegasus include only barcodes that are predicted to be singlets.

  • ngene (int, optional (default: None)) – The minimum number of expressed genes required to keep a barcode.

  • concat_matrices (bool, optional (default: False)) – Whether to concatenate multiple matrices. If True, only one AnnData object is returned; otherwise, a list of AnnData objects might be returned.
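The restriction format described for the restrictions parameter can be sketched in plain Python. This is not pegasus's own implementation: parse_restriction and channel_passes are hypothetical helpers, the channel records are made up, and the sketch assumes that a channel must satisfy every restriction in the list to be selected.

```python
def parse_restriction(restriction):
    """Split 'name:value,...,value' or 'name:~value,...,value'.

    Returns (name, negate, values), where negate is True when the
    value list is prefixed with '~'.
    """
    name, _, value_part = restriction.partition(":")
    negate = value_part.startswith("~")
    if negate:
        value_part = value_part[1:]
    return name, negate, set(value_part.split(","))


def channel_passes(channel, restrictions):
    """Return True if the channel satisfies every restriction.

    Assumption: restrictions are combined with logical AND.
    """
    for restriction in restrictions:
        name, negate, values = parse_restriction(restriction)
        matches = channel[name] in values
        # A plain restriction requires a match; a '~' restriction forbids one.
        if matches == negate:
            return False
    return True


# Hypothetical channel records, as they might be read from the CSV file.
channels = [
    {"Sample": "sample_1", "Source": "pbmc", "Donor": "1"},
    {"Sample": "sample_2", "Source": "pbmc", "Donor": "2"},
    {"Sample": "sample_3", "Source": "bone_marrow", "Donor": "1"},
]

selected = [
    c["Sample"] for c in channels if channel_passes(c, ["Source:pbmc", "Donor:1"])
]
not_donor_1 = [c["Sample"] for c in channels if channel_passes(c, ["Donor:~1"])]
```

Under these assumptions, selected contains only sample_1, and not_donor_1 contains only sample_2.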

Returns

None, an AnnData object, or a MemData object.

Return type

None or AnnData or MemData

Examples

>>> import pegasus as pg
>>> pg.aggregate_matrices('example.csv', 'example_10x.h5', ['Source:pbmc', 'Donor:1'], ['Source', 'Platform', 'Donor'])