pegasus.read_input

pegasus.read_input(input_file, genome=None, return_type='AnnData', concat_matrices=False, h5ad_mode='a', ngene=None, select_singlets=False, channel_attr=None, chunk_size=None, black_list=[])

Load data into memory.

This function loads input data into memory. Supported inputs include 10x Genomics v2 and v3 formats (HDF5 or MTX), HCA DCP MTX and CSV formats, Drop-seq DGE format, and generic CSV format. The parameter descriptions below also cover h5ad, loom and tsv inputs.
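
Because a single entry point accepts many formats, a loader of this kind has to choose a reader for each path. The sketch below shows one way to infer the format from the file name. The function `guess_format` and its suffix table are hypothetical illustrations, not pegasus's actual detection rules, and directory inputs (such as 10x MTX folders) are deliberately out of scope.

```python
from typing import Optional

# Hypothetical suffix-to-format table; pegasus's real detection rules may differ.
# Longer, more specific suffixes are listed before shorter ones.
_SUFFIX_FORMATS = [
    (".h5ad", "h5ad"),
    (".h5", "10x"),
    (".loom", "loom"),
    (".mtx.gz", "mtx"),
    (".mtx", "mtx"),
    (".dge.txt.gz", "dge"),
    (".csv.gz", "csv"),
    (".csv", "csv"),
    (".tsv.gz", "tsv"),
    (".tsv", "tsv"),
]


def guess_format(path: str) -> Optional[str]:
    """Return a format label for `path` based on its suffix, or None if unknown."""
    lowered = path.lower()  # make the suffix match case-insensitive
    for suffix, fmt in _SUFFIX_FORMATS:
        if lowered.endswith(suffix):
            return fmt
    return None
```

Checking suffixes in a fixed, specificity-ordered list keeps the rule explicit; a real loader would typically also let the caller override the guess.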

Parameters
  • input_file (str) – Input file name.

  • genome (str, optional (default: None)) – A string containing comma-separated genome names. pegasus reads all matrices whose genome matches one of these names; if genome is None, all matrices are considered. For formats such as loom, mtx, dge, csv and tsv, genome instead supplies the genome name to assign to the data. In this case, if genome is None, ‘’ (the empty string) is used as the genome name, except for the mtx format.

  • return_type (str, optional (default: ‘AnnData’)) – Type of the returned object; either ‘MemData’ or ‘AnnData’.

  • concat_matrices (bool, optional (default: False)) – If the input file contains multiple matrices, setting this option to True concatenates them into one AnnData object; otherwise, a list of AnnData objects is returned.

  • h5ad_mode (str, optional (default: ‘a’)) – Backed mode used to load the data when the input is in h5ad format. Valid modes are ‘a’, ‘r’ and ‘r+’, where ‘a’ loads the whole matrix into memory.

  • ngene (int, optional (default: None)) – Minimum number of genes a barcode must have to be kept. By default, all barcodes are kept.

  • select_singlets (bool, optional (default: False)) – If True, keep only DemuxEM-predicted singlets when loading data.

  • channel_attr (str, optional (default: None)) – Name of the attribute that distinguishes samples. If set, a ‘Channel’ column is populated from this attribute.

  • chunk_size (int, optional (default: None)) – Chunk size used when reading dense matrices into a sparse representation.

  • black_list (List[str], optional (default: [])) – Attributes listed here are popped out (removed) from the loaded data.
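
The rules for genome, ngene and black_list can be illustrated on plain Python data. This is a minimal sketch under stated assumptions: the helpers `select_genomes`, `filter_barcodes` and `drop_attributes` are hypothetical, operate on simple lists and dicts instead of real count matrices, and only mirror the behavior described above, not pegasus's internal implementation. "Minimum number of genes" is read as "at least ngene genes".

```python
from typing import Dict, List, Optional


def select_genomes(available: List[str], genome: Optional[str]) -> List[str]:
    """Hypothetical helper: None keeps every matrix; otherwise keep matrices
    whose genome name appears in the comma-separated `genome` string."""
    if genome is None:
        return list(available)
    wanted = {name.strip() for name in genome.split(",") if name.strip()}
    return [name for name in available if name in wanted]


def filter_barcodes(genes_per_barcode: Dict[str, int], ngene: Optional[int]) -> Dict[str, int]:
    """Hypothetical helper: keep barcodes with at least `ngene` genes; None keeps all."""
    if ngene is None:
        return dict(genes_per_barcode)
    return {bc: n for bc, n in genes_per_barcode.items() if n >= ngene}


def drop_attributes(attrs: Dict[str, object], black_list: List[str]) -> Dict[str, object]:
    """Hypothetical helper: return a copy of `attrs` with black-listed keys popped out."""
    kept = dict(attrs)  # copy so the caller's dict is left untouched
    for key in black_list:
        kept.pop(key, None)  # silently skip keys that are not present
    return kept
```

None of these helpers mutates its list arguments, so a shared mutable default such as the `[]` in the signature would remain harmless under this design.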

Returns

A MemData object, an AnnData object, or a list of AnnData objects containing the count matrices.

Return type

MemData, AnnData, or a list of AnnData objects
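
Because the result may be a single object or a list depending on return_type and concat_matrices, calling code often normalizes it before iterating. A minimal sketch, assuming only that the function returns either one object or a Python list; the helper name `as_list` is hypothetical.

```python
from typing import Any, List


def as_list(result: Any) -> List[Any]:
    """Hypothetical helper: wrap a single returned object in a list and
    pass an existing list through unchanged."""
    if isinstance(result, list):
        return result
    return [result]
```

With this helper, a loop such as `for adata in as_list(pg.read_input('example_10x.h5'))` handles both cases the same way.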

Examples

>>> import pegasus as pg
>>> adata = pg.read_input('example_10x.h5', genome='mm10')
>>> adata = pg.read_input('example.h5ad', h5ad_mode='r+')
>>> adata = pg.read_input('example_ADT.csv')