Interpretable, Scalable, and Transferrable Functional Projection of Large-Scale Transcriptome Data Using Constrained Matrix Decomposition

Cited by: 3
Authors
Panchy, Nicholas [1 ]
Watanabe, Kazuhide [2 ]
Hong, Tian [1 ,3 ]
Affiliations
[1] Univ Tennessee, Dept Biochem & Cellular & Mol Biol, Knoxville, TN 37996 USA
[2] RIKEN Ctr Integrat Med Sci, Yokohama, Kanagawa, Japan
[3] Natl Inst Math & Biol Synth, Knoxville, TN 37996 USA
Funding
National Institutes of Health (USA)
Keywords
dimensionality reduction; gene set analysis; EMT; single-cell 'omics; RNA-sequencing data; survival
DOI
10.3389/fgene.2021.719099
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Large-scale transcriptome data, such as single-cell RNA-sequencing data, have provided unprecedented resources for studying biological processes at the systems level. Numerous dimensionality reduction methods have been developed to visualize and analyze these transcriptome data. In addition, several existing methods allow inference of functional variations among samples using gene sets with known biological functions. However, it remains challenging to analyze transcriptomes with reduced dimensions that are interpretable in terms of the dimensions' directionalities, that are transferrable to new data, and that directly expose the contribution or association of individual genes. In this study, we used gene set non-negative principal component analysis (gsPCA) and non-negative matrix factorization (gsNMF) to analyze large-scale transcriptome datasets. We found that these methods provide low-dimensional information about the progression of biological processes in a quantitative manner, and that their performance is comparable to existing functional variation analysis methods in terms of distinguishing multiple cell states and samples from multiple conditions. Remarkably, upon training with a subset of data, these methods allow predictions of locations in the functional space using data from experimental conditions that are not exposed to the models. Specifically, our models predicted the extent of progression and reversion for cells in the epithelial-mesenchymal transition (EMT) continuum. These methods revealed a conserved EMT program among multiple types of single cells and tumor samples. Finally, we demonstrate that this approach is broadly applicable to data and gene sets beyond EMT, and we provide several recommendations on the choice between the two linear methods and the optimal algorithmic parameters.
Our methods show that simple constrained matrix decomposition can produce low-dimensional information in a functionally interpretable and transferrable space, and can be widely useful for analyzing large-scale transcriptome data.
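The train-then-transfer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain multiplicative-update NMF written in NumPy, a synthetic six-gene "gene set" (three epithelial-like and three mesenchymal-like genes), and hypothetical helper names (`simulate`, `nmf_fit`, `nmf_project`). The factorization is learned on training samples only, and held-out samples are then placed in the same space by keeping the gene loadings fixed.

```python
import numpy as np

def simulate(t, rng):
    """Synthetic expression (samples x genes) for progression values t in [0, 1].
    Assumption: 3 'epithelial' genes fall and 3 'mesenchymal' genes rise with t."""
    e = np.array([3.0, 2.0, 1.0])
    m = np.array([1.0, 2.0, 3.0])
    X = np.hstack([np.outer(1.0 - t, e), np.outer(t, m)])
    return X + 0.01 * rng.random(X.shape)  # small non-negative noise

def nmf_fit(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative X as W @ H (W, H >= 0) by multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_project(X_new, H, n_iter=500, seed=1, eps=1e-9):
    """Place unseen samples in the learned space: H stays fixed, only W is solved."""
    rng = np.random.default_rng(seed)
    W = rng.random((X_new.shape[0], H.shape[0])) + eps
    for _ in range(n_iter):
        W *= (X_new @ H.T) / (W @ H @ H.T + eps)
    return W

rng = np.random.default_rng(42)
t_train = np.linspace(0.0, 1.0, 20)   # conditions seen during training
t_new = np.array([0.1, 0.5, 0.9])     # conditions never shown to the model

X_train = simulate(t_train, rng)
X_new = simulate(t_new, rng)

W_train, H = nmf_fit(X_train, k=2)    # H holds non-negative gene loadings
W_new = nmf_project(X_new, H)         # coordinates of the held-out samples

# Relative reconstruction error of the held-out samples
rel_err = np.linalg.norm(X_new - W_new @ H) / np.linalg.norm(X_new)

# Component that loads most heavily on the 'mesenchymal' genes (columns 3-5)
m_comp = int(np.argmax(H[:, 3:].sum(axis=1)))
# Hypothetical progression score: share of each sample's weight on that component
score = W_new[:, m_comp] / W_new.sum(axis=1)
print(rel_err, score)
```

Because both factors are constrained to be non-negative, each row of `H` can be read directly as per-gene contributions, and each column of `W` has a fixed direction. This is the interpretability property the abstract emphasizes. The score defined here is only an illustrative choice for this toy data.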
Pages: 16