import pegasus as pg

data = pg.read_input("nestorawa_forcellcycle_expressionMatrix.txt")
data

2020-12-02 00:06:30,984 - pegasusio.readwrite - INFO - tsv file 'nestorawa_forcellcycle_expressionMatrix.txt' is loaded.
2020-12-02 00:06:30,985 - pegasusio.readwrite - INFO - Function 'read_input' finished in 3.00s.

MultimodalData object with 1 UnimodalData: 'unknown-rna'
    It currently binds to UnimodalData object unknown-rna

UnimodalData object with n_obs x n_vars = 773 x 24193
    Genome: unknown; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 
    var: 
    obsm: 
    varm: 
    uns: 'genome', 'modality'


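# Thresholds are set wide enough that no cell is filtered out
pg.qc_metrics(data, min_genes=0, max_genes=1e5)
pg.filter_data(data)
pg.identify_robust_genes(data)
# Normalize each cell to 1e4 total counts, then log-transform
pg.log_norm(data, norm_count=1e4)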

2020-12-02 00:06:31,125 - pegasusio.qc_utils - INFO - After filtration, 773 out of 773 cell barcodes are kept in UnimodalData object unknown-rna.
2020-12-02 00:06:31,259 - pegasus.tools.preprocessing - INFO - After filtration, 24158/24193 genes are kept. Among 24158 genes, 24158 genes are robust.
2020-12-02 00:06:31,374 - pegasus.tools.preprocessing - INFO - Function 'log_norm' finished in 0.11s.


pg.calc_signature_score(data, 'cell_cycle_human')

2020-12-02 00:06:31,424 - pegasus.tools.signature_score - INFO - Loaded signatures from GMT file /Users/yy939/GitHub/pegasus/pegasus/data_files/cell_cycle_human.gmt.
2020-12-02 00:06:31,427 - pegasus.tools.signature_score - INFO - Signature G1/S: 42 out of 43 genes are used in signature score calculation.
2020-12-02 00:06:31,447 - pegasus.tools.signature_score - INFO - Signature G2/M: 52 out of 54 genes are used in signature score calculation.
2020-12-02 00:06:31,475 - pegasus.tools.signature_score - INFO - Function 'calc_signature_score' finished in 0.09s.


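# Collect every gene listed in the GMT file; each line is
# <signature name> <tab> <description> <tab> <gene 1> <tab> <gene 2> ...
cell_cycle_genes = []
with open("cell_cycle_human.gmt", 'r') as f:
    for line in f:
        cell_cycle_genes += line.strip().split('\t')[2:]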
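
The loop above flattens all signatures into one gene list. When the genes need to stay grouped by signature, a small parser along the following lines may help. This is a minimal sketch: `parse_gmt` is a hypothetical helper, not part of pegasus, and the two example lines are made-up stand-ins for the contents of `cell_cycle_human.gmt`.

```python
def parse_gmt(lines):
    """Map each signature name to its gene list (hypothetical helper).

    Assumes the usual GMT layout: name, description, then one gene per
    tab-separated field. Lines with fewer than three fields are skipped.
    """
    signatures = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        # Drop empty fields that trailing tabs would otherwise produce
        signatures[fields[0]] = [gene for gene in fields[2:] if gene]
    return signatures


# Illustrative input only; the real file lists many more genes per signature
example_lines = [
    "G1/S\tillustrative description\tMCM5\tPCNA\tTYMS\n",
    "G2/M\tillustrative description\tHMGB2\tCDK1\n",
]
signatures = parse_gmt(example_lines)
all_genes = [gene for genes in signatures.values() for gene in genes]
```

With a real file, the same function could be called as `parse_gmt(open("cell_cycle_human.gmt"))`, and `all_genes` would then match the flat list built above.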


data.obs['predicted_phase'].value_counts()

G0      468
G1/S    157
G2/M    148
Name: predicted_phase, dtype: int64


# Run PCA on the cell-cycle genes only, then attach that embedding to the full dataset
data_cc_genes = data[:, cell_cycle_genes].copy()
pg.pca(data_cc_genes)
data.obsm['X_pca'] = data_cc_genes.obsm['X_pca']

2020-12-02 00:06:31,626 - pegasus.tools.preprocessing - INFO - Function 'pca' finished in 0.07s.


pg.scatter(data, attrs='predicted_phase', basis='pca', dpi=130)


# Regress the two signature scores out of the PCs; the returned key names the adjusted embedding
pca_key = pg.pc_regress_out(data, attrs=['G1/S', 'G2/M'])

2020-12-02 00:06:31,935 - pegasus.tools.preprocessing - INFO - Function 'pc_regress_out' finished in 0.13s.
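
To make the idea of regressing covariates out of principal components concrete, here is a minimal NumPy sketch on synthetic data. It fits an ordinary least-squares model of each PC on the covariates (plus an intercept) and keeps the residuals. This is an assumption about the general technique, not a description of how `pg.pc_regress_out` is implemented; `regress_out` and all the arrays below are hypothetical.

```python
import numpy as np


def regress_out(X_pca, covariates):
    """Return the residuals of each PC column after OLS on the covariates.

    Hypothetical helper: an intercept column is added so that the
    residuals are uncorrelated with every covariate.
    """
    n_cells = X_pca.shape[0]
    design = np.column_stack([np.ones(n_cells), covariates])
    coef, *_ = np.linalg.lstsq(design, X_pca, rcond=None)
    return X_pca - design @ coef


rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 2))   # stand-ins for the G1/S and G2/M scores
pcs = rng.normal(size=(200, 5))      # stand-ins for PC coordinates
pcs[:, 0] += 3.0 * scores[:, 0]      # inject a signature effect into PC1
pcs[:, 1] += 2.0 * scores[:, 1]      # and into PC2

residuals = regress_out(pcs, scores)

# Largest absolute correlation between any residual column and any score
max_abs_corr = max(
    abs(np.corrcoef(residuals[:, j], scores[:, k])[0, 1])
    for j in range(residuals.shape[1])
    for k in range(scores.shape[1])
)
```

After this step `max_abs_corr` is zero up to floating-point error, which is the linear sense in which the signature effect has been removed.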


pg.scatter(data, attrs=['predicted_phase'], basis=pca_key, dpi=130)


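import numpy as np
from sklearn.decomposition import PCA

# Re-run PCA on the regressed-out coordinates, keeping every component
X = data.obsm['X_' + pca_key]
pca = PCA(n_components=X.shape[1], random_state=0, svd_solver='full')
X_pca_new = pca.fit_transform(X)
# Store the result as a C-contiguous array under a new basis name for plotting
data.obsm['X_pca_new'] = np.ascontiguousarray(X_pca_new)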

pg.scatter(data, attrs=['predicted_phase'], basis='pca_new', dpi=130)
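
One reason to run PCA again after regression is that residual columns are, in general, no longer uncorrelated or ordered by variance. The sketch below illustrates that property with plain NumPy on synthetic data, using an SVD of the centered matrix in place of `sklearn.decomposition.PCA`. All arrays here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic coordinates with correlated columns, standing in for regressed PCs
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

# PCA via SVD: center the columns, then rotate onto the right singular vectors
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_rot = U * S  # same as X_centered @ Vt.T

# Covariance of the rotated coordinates is diagonal with decreasing entries
cov = np.cov(X_rot, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
variances = np.diag(cov)
```

Here `off_diag` is zero up to rounding and `variances` is sorted from largest to smallest, so the first two columns of `X_rot` capture the most remaining variance when plotted.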

Regress Out Tutorial

Cell-Cycle Scores

Cell Cycle Effects

Regress Out Cell Cycle Effects

More on Regress Out Result

Summary