DeepMF: deciphering the latent patterns in omics profiles with a deep learning method

Cited by: 16
Authors
Chen, Lingxi [1]
Xu, Jiao [1]
Li, Shuai Cheng [1]
Affiliations
[1] City Univ Hong Kong, Kowloon Tong, 83 Tat Chee Ave, Hong Kong, Peoples R China
Keywords
Matrix factorization; Dimension reduction; Deep learning; Omics profile; Cancer subtype; Mutational processes; Signatures; Package
DOI
10.1186/s12859-019-3291-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: With recent advances in high-throughput technologies, matrix factorization techniques are increasingly used to map quantitative omics profiling matrices into a low-dimensional embedding space, in the hope of uncovering insights into the underlying biological processes. However, current matrix factorization tools handle noisy data and missing entries poorly, and both are common in real-life data. Results: Here, we propose DeepMF, a deep neural network-based factorization model. DeepMF disentangles the association between molecular feature-associated and sample-associated latent matrices, and is tolerant to noisy and missing values. It showed effective cancer subtype discovery on mRNA, miRNA, and protein profiles of medulloblastoma, leukemia, breast cancer, and small-blue-round-cell cancer, achieving the highest clustering accuracy of 76%, 100%, 92%, and 100%, respectively. When analyzing data sets with 70% missing entries, DeepMF gave the best recovery capacity, with silhouette values of 0.47, 0.6, 0.28, and 0.44, outperforming other state-of-the-art MF tools on the Medulloblastoma, Leukemia, TCGA BRCA, and SRBCT cancer data sets. Its embedding strength, as measured by clustering accuracy, is 88%, 100%, 84%, and 96% on these data sets, improving on the 76%, 100%, 78%, and 87% of the current best methods. Conclusion: DeepMF demonstrated robust denoising, imputation, and embedding ability. It can help uncover underlying biological processes, for example through cancer subtype discovery. Our implementation of DeepMF can be found at https://github.com/paprikachan/DeepMF.
Pages: 13
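The abstract describes factorizing an omics matrix into a sample-associated and a feature-associated latent matrix while tolerating missing entries. As a rough illustration of that general idea only, the sketch below fits a plain low-rank factorization by gradient descent on the observed entries and uses it to impute the rest. It is not the DeepMF network: the function name `masked_mf`, the rank, learning rate, and synthetic toy data are all assumptions made for this example.

```python
import numpy as np

def masked_mf(X, mask, rank=2, lr=0.01, epochs=2000, seed=0):
    """Fit X ~ U @ V.T using only entries where mask is True (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    U = rng.normal(scale=0.1, size=(n_samples, rank))   # sample-side latent matrix
    V = rng.normal(scale=0.1, size=(n_features, rank))  # feature-side latent matrix
    X0 = np.where(mask, X, 0.0)  # missing entries are never read by the loss
    for _ in range(epochs):
        # Residual on observed entries only; missing entries contribute zero gradient.
        R = np.where(mask, U @ V.T - X0, 0.0)
        # Simultaneous gradient step on both factors (squared-error loss).
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
    return U, V

# Synthetic rank-2 "profile" matrix with about 30% of entries hidden (assumed toy data).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2)) @ rng.normal(size=(20, 2)).T
mask = rng.random(X.shape) > 0.3

U, V = masked_mf(X, mask)
X_hat = U @ V.T  # reconstruction, including imputed values for hidden entries

observed_rmse = np.sqrt(np.mean((X_hat - X)[mask] ** 2))
missing_rmse = np.sqrt(np.mean((X_hat - X)[~mask] ** 2))
print(f"RMSE on observed entries: {observed_rmse:.3f}")
print(f"RMSE on hidden entries:   {missing_rmse:.3f}")
```

The rows of `U` play the role of a low-dimensional sample embedding that could then be clustered, which is the kind of downstream use the abstract evaluates with clustering accuracy and silhouette values.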