Formal semantics and high performance in declarative machine learning using Datalog

Cited by: 4
Authors
Wang, Jin [1]
Wu, Jiacheng [2]
Li, Mingda [1]
Gu, Jiaqi [1]
Das, Ariyam [1]
Zaniolo, Carlo [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Tsinghua Univ, Beijing, Peoples R China
Keywords
Datalog; Declarative machine learning; Apache Spark; Scalability; COMPRESSED LINEAR ALGEBRA; SCALING-UP; ANALYTICS; OPTIMIZATION; AGGREGATION; SOCIALITE; SYSTEMS; POWER
DOI
10.1007/s00778-021-00665-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.
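To make the abstract's central idea concrete, the following is a minimal sketch of semi-naive fixpoint evaluation with a `min` aggregate applied inside the recursion, using shortest paths as the example. It is an illustration only: the function name `shortest_paths`, the rule syntax in the comments, and the in-memory dictionaries are assumptions made for this sketch, not the paper's system, its Datalog dialect, or its Spark implementation. The sketch also assumes non-negative edge costs so that the fixpoint is reached.

```python
from collections import defaultdict

# Illustrative recursive rules with an aggregate in the head
# (hypothetical syntax, not taken from the paper):
#   path(X, Y, min<C>) <- edge(X, Y, C).
#   path(X, Y, min<C>) <- path(X, Z, C1), edge(Z, Y, C2), C = C1 + C2.

def shortest_paths(edges):
    """Semi-naive evaluation of the rules above.

    edges: list of (src, dst, cost) tuples with non-negative costs (assumed).
    Returns a dict mapping (src, dst) to the minimum path cost.
    """
    inf = float("inf")

    # Index edges by source node so each join step only scans matching edges.
    out_edges = defaultdict(list)
    for src, dst, cost in edges:
        out_edges[src].append((dst, cost))

    # Base rule: the delta starts as the cheapest direct edge per (src, dst).
    delta = {}
    for src, dst, cost in edges:
        if cost < delta.get((src, dst), inf):
            delta[(src, dst)] = cost
    best = dict(delta)

    # Recursive rule: join only the facts that changed in the last round
    # (the delta) with edge, and keep a derived fact only if it improves
    # the current minimum. Applying min during the iteration, rather than
    # after it, is what prunes non-minimal paths early.
    while delta:
        new_delta = {}
        for (src, mid), cost1 in delta.items():
            for dst, cost2 in out_edges[mid]:
                cand = cost1 + cost2
                key = (src, dst)
                if cand < best.get(key, inf) and cand < new_delta.get(key, inf):
                    new_delta[key] = cand
        best.update(new_delta)
        delta = new_delta

    return best
```

The loop stops when a round derives no improved fact, which is the fixpoint. Because only the previous round's improvements are joined again, facts that did not change are never re-derived.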
Pages: 859-881
Page count: 23
Source
The VLDB Journal, 2021, 30: 859-881