Data Integration Challenges for Machine Learning in Precision Medicine

Times cited: 52
Authors
Martinez-Garcia, Mireya [1]
Hernandez-Lemus, Enrique [2,3]
Affiliations
[1] Natl Inst Cardiol Ignacio Chavez, Clin Res Div, Mexico City, DF, Mexico
[2] Natl Inst Gen Med INMEGEN, Computat Gen Div, Mexico City, DF, Mexico
[3] Univ Nacl Autonoma Mexico, Ctr Complex Sci, Mexico City, DF, Mexico
Keywords
precision medicine; machine learning; data integration; meta-data mining; computational intelligence; BIG DATA ANALYTICS; MICROARRAY EXPERIMENT MIAME; HEALTH-CARE; UK BIOBANK; PERSONALIZED MEDICINE; GENE ONTOLOGY; ARTIFICIAL-INTELLIGENCE; MINIMUM INFORMATION; METADATA CHECKLIST; VARIABLE SELECTION
DOI
10.3389/fmed.2021.784455
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification codes
1002; 100201
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, enabling the development of individualized, context-dependent diagnostic and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at predicting personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to efficiently manage, visualize, and integrate large datasets combining structured and unstructured formats. This must be done under different levels of confidentiality constraints, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in designing successful approaches to medical data analytics under the demanding performance requirements of personalized medicine, which are also subject to time, computational power, and bioethical constraints. Here, we review some of these constraints and discuss possible avenues to overcome current challenges.
Pages: 21
Cited references: 334