Human Multi-omics Data Pre-processing for Predictive Purposes Using Machine Learning: A Case Study in Childhood Obesity

Times cited: 2
Authors
Torres-Martos, Alvaro [1]
Anguita-Ruiz, Augusto [1, 2, 3, 4]
Bustos-Aibar, Mireia [1]
Camara-Sanchez, Sofia [1]
Alcala, Rafael [5]
Aguilera, Concepcion M. [1, 2, 3]
Alcala-Fdez, Jesus [5]
Affiliations
[1] Univ Granada, Sch Pharm, Dept Biochem & Mol Biol 2, Granada 18071, Spain
[2] Univ Granada, Complejo Hosp Univ Granada, Inst Invest Biosanitaria IBSGRANADA, Inst Nutr & Food Technol Jose Mataix,Ctr Biomed R, Avda Conocimiento S-N 18016, Granada 18012, Spain
[3] Inst Salud Carlos III, CIBEROBN, CIBER Physiopathol Obes & Nutr, Madrid 28029, Spain
[4] Barcelona Inst Global Hlth ISGlobal, Doctor Aiguader 88, Barcelona 08003, Spain
[5] Univ Granada, Andalusian Res Inst Data Sci & Computat Intellige, Dept Comp Sci & Artificial Intelligence, Granada 18071, Spain
Source
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, PT II | 2022
Keywords
Multi-omics; Data pre-processing; Machine learning; eXplainable Artificial Intelligence; GENOME-WIDE ASSOCIATION; LOCI; IDENTIFICATION; IMPUTATION
DOI
10.1007/978-3-031-07802-6_31
Chinese Library Classification
TP39 [Applications of computers]
Discipline classification codes
081203; 0835
Abstract
Machine Learning applications in the medical field using omics data are countless and promising, notably the possibility of creating long-term predictive models for highly prevalent diseases. Nevertheless, to take advantage of the virtues of omics data and machine learning tools, we first need to perform adequate data pre-processing and to take certain considerations into account when constructing the models. The present paper is an example of how to address the main challenges encountered when constructing machine learning predictive models with human multi-omics data. Topics covered in this work include a description of the main particularities of each omics data layer, the most appropriate pre-processing approaches for each source, and a collection of good practices and tips for applying machine learning to this kind of data for predictive purposes. Using real data examples (blood samples), we illustrate how some of the key issues in this kind of research are addressed (technical noise, biological heterogeneity, class imbalance, high dimensionality, and the presence of missing values, among others). Additionally, we lay the groundwork for future work by putting forward several proposals to improve the models and justifying their need in light of the insights obtained.
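The abstract lists missing values and class imbalance among the issues handled, but does not say which methods the authors apply. As a generic illustration only (not the paper's pipeline), the sketch below shows two common choices: dropping features whose fraction of missing values exceeds a threshold, imputing the remaining gaps with the per-feature median, and computing "balanced" class weights n / (k * n_c), the same formula scikit-learn uses for class_weight="balanced". The function names, the 0.2 default threshold, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # one sample; None marks a missing measurement


def filter_and_impute(
    rows: Sequence[Row], max_missing_frac: float = 0.2
) -> Tuple[List[List[float]], List[int]]:
    """Drop features missing in more than max_missing_frac of the samples,
    then fill the remaining gaps with each kept feature's median.

    The threshold is an illustrative assumption. Requiring it to be < 1
    guarantees every kept feature has at least one observed value.
    Returns (imputed rows, indices of the kept features).
    """
    if not 0 <= max_missing_frac < 1:
        raise ValueError("max_missing_frac must be in [0, 1)")
    n_samples = len(rows)
    n_features = len(rows[0])
    kept = [
        j for j in range(n_features)
        if sum(r[j] is None for r in rows) / n_samples <= max_missing_frac
    ]
    # Median of the observed (non-missing) values of each kept feature
    medians = {j: median(r[j] for r in rows if r[j] is not None) for j in kept}
    imputed = [
        [r[j] if r[j] is not None else medians[j] for j in kept]
        for r in rows
    ]
    return imputed, kept


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """w_c = n_samples / (n_classes * n_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples x 3 features, 3 controls and 1 case
    X = [[1.0, None, 3.0], [2.0, None, None], [3.0, 5.0, 6.0], [None, None, 9.0]]
    y = [0, 0, 0, 1]
    X_clean, kept = filter_and_impute(X, max_missing_frac=0.25)
    print(kept)     # [0, 2]
    print(X_clean)  # [[1.0, 3.0], [2.0, 6.0], [3.0, 6.0], [2.0, 9.0]]
    print(balanced_class_weights(y))  # {0: 0.6666666666666666, 1: 2.0}
```

In a predictive setting, the kept features, the medians, and the class weights would normally be computed on the training split only, so that no information from the test samples leaks into pre-processing.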
Pages: 359-374
Number of pages: 16
Related papers
50 in total
  • [1] Prediction of metabolic risk in childhood obesity using machine learning models with multi-omics data
    Torres-Martos, A.
    Anguita-Ruiz, A.
    Bustos-Aibar, M.
    Alcala, R.
    Alcala-Fdez, J.
    Aguilera, C. M.
    ANNALS OF NUTRITION AND METABOLISM, 2022, 78 (SUPPL 3) : 22 - 22
  • [2] Predicting childhood allergy using machine learning methods on multi-omics data
    van Breugel, Merlijn
    Qi, Cancan
    Jiang, Yale
    Pedersen, Casper-Emil Tingskov
    Pethoukhov, Ilya
    Vonk, Judith
    Gehring, Ulrike
    Berg, Marijn
    Bugel, Marnix
    Capraij, Orestes
    Forno, Erick
    Morin, Andreanne
    Eliasen, Anders Ulrik
    Xu, Zhongli
    Van Den Berge, Maarten
    Nawijn, Martijn
    Li, Yang
    Chen, Wei
    Bont, Louis
    Bonnelykke, Klaus
    Celedon, Juan
    Koppelman, Gerard
    Xu, Cheng-Jian
    EUROPEAN RESPIRATORY JOURNAL, 2021, 58
  • [3] Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity
    Torres-Martos, Alvaro
    Bustos-Aibar, Mireia
    Ramirez-Mena, Alberto
    Camara-Sanchez, Sofia
    Anguita-Ruiz, Augusto
    Alcala, Rafael
    Aguilera, Concepcion M.
    Alcala-Fdez, Jesus
    GENES, 2023, 14 (02)
  • [4] DATA PRE-PROCESSING APPROACHES IN PREDICTIVE MACHINE LEARNING OBSERVATIONAL STUDIES
    Friedman, H. S.
    Navaratnam, P.
    Kakehi, S.
    Ray, S.
    Hill, N.
    Kim, I.
    Gricar, J.
    VALUE IN HEALTH, 2023, 26 (06) : S284 - S284
  • [5] Machine learning for the analysis of multi-omics data
    Sun, Yanni
    METHODS, 2021, 189 : 1 - 2
  • [6] Prediction of Composite Clinical Outcomes for Childhood Neuroblastoma Using Multi-Omics Data and Machine Learning
    Wang, Panru
    Zhang, Junying
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2025, 26 (01)
  • [7] Using machine learning approaches for multi-omics data analysis: A review
    Reel, Parminder S.
    Reel, Smarti
    Pearson, Ewan
    Trucco, Emanuele
    Jefferson, Emily
    BIOTECHNOLOGY ADVANCES, 2021, 49
  • [8] Methodology for Good Machine Learning with Multi-Omics Data
    Coroller, Thibaud
    Sahiner, Berkman
    Amatya, Anup
    Gossmann, Alexej
    Karagiannis, Konstantinos
    Moloney, Conor
    Samala, Ravi K.
    Santana-Quintero, Luis
    Solovieff, Nadia
    Wang, Craig
    Amiri-Kordestani, Laleh
    Cao, Qian
    Cha, Kenny H.
    Charlab, Rosane
    Cross, Frank H.
    Hu, Tingting
    Huang, Ruihao
    Kraft, Jeffrey
    Krusche, Peter
    Li, Yutong
    Li, Zheng
    Mazo, Ilya
    Paul, Rahul
    Schnakenberg, Susan
    Serra, Paolo
    Smith, Sean
    Song, Chi
    Su, Fei
    Tiwari, Mohit
    Vechery, Colin
    Xiong, Xin
    Zarate, Juan Pablo
    Zhu, Hao
    Chakravartty, Arunava
    Liu, Qi
    Ohlssen, David
    Petrick, Nicholas
    Schneider, Julie A.
    Walderhaug, Mark
    Zuber, Emmanuel
    CLINICAL PHARMACOLOGY & THERAPEUTICS, 2024, 115 (04) : 745 - 757
  • [9] Machine learning for multi-omics data integration in cancer
    Cai, Zhaoxiang
    Poulos, Rebecca C.
    Liu, Jia
    Zhong, Qing
    ISCIENCE, 2022, 25 (02)
  • [10] A Multi-purpose Data Pre-processing Framework using Machine Learning for Enterprise Data Models
    Ramana, Venkata B.
    Narsimha, G.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (03) : 646 - 656