Amalur: The Convergence of Data Integration and Machine Learning

Cited by: 0
Authors
Li, Ziyu [1]
Sun, Wenbo [1]
Zhan, Danning [1]
Kang, Yan [2]
Chen, Lydia [3, 4]
Bozzon, Alessandro [1]
Hai, Rihan [1]
Affiliations
[1] Delft Univ Technol, Dept Software Technol, NL-2628 CD Delft, Netherlands
[2] WeBank, Shenzhen 518052, Peoples R China
[3] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
[4] Delft Univ Technol, NL-2628 CD Delft, Netherlands
Funding
Dutch Research Council (NWO)
Keywords
Metadata; Data integration; Training; Federated learning; Data privacy; Soft sensors; Training data; Machine learning
DOI
10.1109/TKDE.2024.3357389
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demands substantial manual work and computational resources. Under data privacy constraints, data often cannot leave the premises of its silo; hence model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern ML systems. We explore how metadata obtained from DI processes can be used to improve the effectiveness, efficiency, and privacy of ML models. To this end, we analyze ML training and inference over data silos. Bringing DI and ML together, we highlight new research opportunities in systems, representations, factorized learning, and federated learning.
Pages: 7353-7367
Number of pages: 15