Amalur: The Convergence of Data Integration and Machine Learning

Cited by: 0
Authors
Li, Ziyu [1]
Sun, Wenbo [1]
Zhan, Danning [1]
Kang, Yan [2]
Chen, Lydia [3, 4]
Bozzon, Alessandro [1]
Hai, Rihan [1]
Affiliations
[1] Delft Univ Technol, Dept Software Technol, NL-2628 CD Delft, Netherlands
[2] WeBank, Shenzhen 518052, Peoples R China
[3] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
[4] Delft Univ Technol, NL-2628 CD Delft, Netherlands
Funding
Dutch Research Council
Keywords
Metadata; Data integration; Training; Federated learning; Data privacy; Soft sensors; Training data; Machine learning
DOI
10.1109/TKDE.2024.3357389
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy constraints, data often cannot leave the premises of data silos; hence model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern machine learning systems. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness, efficiency, and privacy of ML models. Towards this direction, we analyze ML training and inference over data silos. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning, and federated learning.
Pages: 7353-7367
Page count: 15