Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities

Cited by: 327
Authors
Zitnik, Marinka [1]
Nguyen, Francis [2, 3]
Wang, Bo [4]
Leskovec, Jure [1, 5]
Goldenberg, Anna [6, 7, 8]
Hoffman, Michael M. [2, 3, 7, 8]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Toronto, Dept Med Biophys, Toronto, ON, Canada
[3] Princess Margaret Canc Ctr, Toronto, ON, Canada
[4] Hikvis Res Inst, Santa Clara, CA USA
[5] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
[6] SickKids Res Inst, Genet & Genome Biol, Toronto, ON, Canada
[7] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[8] Vector Inst, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; US National Science Foundation
Keywords
Computational biology; Personalized medicine; Systems biology; Heterogeneous data; Machine learning; DRUG-DRUG INTERACTION; GENOME-WIDE ASSOCIATION; DNA METHYLATION; DATA FUSION; TRANSCRIPTION FACTORS; CHROMATIN-STATE; CHIP-SEQ; PROBABILISTIC FUNCTIONS; MULTICELLULAR FUNCTION; HETEROGENEOUS NETWORK;
DOI
10.1016/j.inffus.2018.09.012
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Pages: 71-91
Page count: 21
Related papers
50 in total
  • [1] Machine learning in postgenomic biology and personalized medicine
    Ray, Animesh
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2022, 12 (02)
  • [2] Systems biology and data-driven machine learning-based models in personalized cardiovascular medicine
    Hueso, Miguel
    Rotllan, Noemi
    Escola-Gil, Joan Carles
    Vellido, Alfredo
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2023, 10
  • [3] Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine
    Arslan, Emre
    Schulz, Jonathan
    Rai, Kunal
    BIOCHIMICA ET BIOPHYSICA ACTA-REVIEWS ON CANCER, 2021, 1876 (02)
  • [4] Opportunities and obstacles for deep learning in biology and medicine
    Ching, Travers
    Himmelstein, Daniel S.
    Beaulieu-Jones, Brett K.
    Kalinin, Alexandr A.
    Do, Brian T.
    Way, Gregory P.
    Ferrero, Enrico
    Agapow, Paul-Michael
    Zietz, Michael
    Hoffman, Michael M.
    Xie, Wei
    Rosen, Gail L.
    Lengerich, Benjamin J.
    Israeli, Johnny
    Lanchantin, Jack
    Woloszynek, Stephen
    Carpenter, Anne E.
    Shrikumar, Avanti
    Xu, Jinbo
    Cofer, Evan M.
    Lavender, Christopher A.
    Turaga, Srinivas C.
    Alexandari, Amr M.
    Lu, Zhiyong
    Harris, David J.
    DeCaprio, Dave
    Qi, Yanjun
    Kundaje, Anshul
    Peng, Yifan
    Wiley, Laura K.
    Segler, Marwin H. S.
    Boca, Simina M.
    Swamidass, S. Joshua
    Huang, Austin
    Gitter, Anthony
    Greene, Casey S.
    JOURNAL OF THE ROYAL SOCIETY INTERFACE, 2018, 15 (141)
  • [5] Principles and Practice of Explainable Machine Learning
    Belle, Vaishak
    Papantonis, Ioannis
    FRONTIERS IN BIG DATA, 2021, 4
  • [6] Data Learning: Integrating Data Assimilation and Machine Learning
    Buizza, Caterina
    Casas, Cesar Quilodran
    Nadler, Philip
    Mack, Julian
    Marrone, Stefano
    Titus, Zainab
    Le Cornec, Clemence
    Heylen, Evelyn
    Dur, Tolga
    Ruiz, Luis Baca
    Heaney, Claire
    Lopez, Julio Amador Diaz
    Kumar, K. S. Sesh
    Arcucci, Rossella
    JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 58
  • [7] Theory and Practice of Integrating Machine Learning and Conventional Statistics in Medical Data Analysis
    Dhillon, Sarinder Kaur
    Ganggayah, Mogana Darshini
    Sinnadurai, Siamala
    Lio, Pietro
    Taib, Nur Aishah
    DIAGNOSTICS, 2022, 12 (10)
  • [8] Data Integration Challenges for Machine Learning in Precision Medicine
    Martinez-Garcia, Mireya
    Hernandez-Lemus, Enrique
    FRONTIERS IN MEDICINE, 2022, 8
  • [9] Integrating Machine Learning Models into the Linux Kernel: Opportunities and Challenges
    Gallego-Madrid, Jorge
    Bru-Santa, Irene
    Sanchez-Iborra, Ramon
    Skarmeta, Antonio
    MOBILE INTERNET SECURITY, MOBISEC 2023, 2024, 2095 : 209 - 219
  • [10] Integrating Machine Learning into Supply Chain Management: Challenges and Opportunities
    Falkner, Dominik
    Boegl, Michael
    Gattinger, Anna
    Stainko, Roman
    Zenisek, Jan
    Affenzeller, Michael
    5TH INTERNATIONAL CONFERENCE ON INDUSTRY 4.0 AND SMART MANUFACTURING, ISM 2023, 2024, 232 : 1779 - 1788