Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities

Cited by: 327
Authors
Zitnik, Marinka [1]
Nguyen, Francis [2, 3]
Wang, Bo [4]
Leskovec, Jure [1, 5]
Goldenberg, Anna [6, 7, 8]
Hoffman, Michael M. [2, 3, 7, 8]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Toronto, Dept Med Biophys, Toronto, ON, Canada
[3] Princess Margaret Canc Ctr, Toronto, ON, Canada
[4] Hikvis Res Inst, Santa Clara, CA USA
[5] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
[6] SickKids Res Inst, Genet & Genome Biol, Toronto, ON, Canada
[7] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[8] Vector Inst, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; US National Science Foundation
Keywords
Computational biology; Personalized medicine; Systems biology; Heterogeneous data; Machine learning; DRUG-DRUG INTERACTION; GENOME-WIDE ASSOCIATION; DNA METHYLATION; DATA FUSION; TRANSCRIPTION FACTORS; CHROMATIN-STATE; CHIP-SEQ; PROBABILISTIC FUNCTIONS; MULTICELLULAR FUNCTION; HETEROGENEOUS NETWORK
DOI
10.1016/j.inffus.2018.09.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Pages: 71-91
Page count: 21