Missing data in multi-omics integration: Recent advances through artificial intelligence

Cited by: 59
Authors
Flores, Javier E. [1]
Claborne, Daniel M. [2]
Weller, Zachary D. [2]
Webb-Robertson, Bobbie-Jo M. [1]
Waters, Katrina M. [1]
Bramer, Lisa M. [1]
Affiliations
[1] Earth & Biol Sci Directorate, Biol Sci Div, Pacific Northwest Natl Lab, Richland, WA 99354 USA
[2] Natl Secur Directorate, Artificial Intelligence & Data Analyt Div, Pacific Northwest Natl Lab, Richland, WA USA
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023 / Volume 6
Keywords
data integration; missing data; multi-omics; multi-view; artificial intelligence; machine learning; neural networks; Bayesian; IMPUTATION; CHALLENGES; BIOMARKER; VALUES; IDENTIFICATION; ALGORITHMS; LIKELIHOOD; PROTEOMICS; DISCOVERY; SETS;
DOI
10.3389/frai.2023.1098308
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Pages: 15