Formal semantics and high performance in declarative machine learning using Datalog

Cited by: 4
Authors
Wang, Jin [1]
Wu, Jiacheng [2]
Li, Mingda [1]
Gu, Jiaqi [1]
Das, Ariyam [1]
Zaniolo, Carlo [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Tsinghua Univ, Beijing, Peoples R China
Keywords
Datalog; Declarative machine learning; Apache Spark; Scalability; COMPRESSED LINEAR ALGEBRA; SCALING-UP; ANALYTICS; OPTIMIZATION; AGGREGATION; SOCIALITE; SYSTEMS; POWER
DOI
10.1007/s00778-021-00665-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.
Pages: 859-881
Number of pages: 23
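
Illustration
The abstract refers to aggregates inside recursive Datalog rules and to semi-naive fixpoint computation in their presence. The following is a minimal Python sketch of that general idea, using all-pairs shortest paths with a min aggregate as the example. It is not the authors' system, and the rule syntax in the docstring is only illustrative. The function name `shortest_paths`, the `Edge` type, and the sample graph are hypothetical, and the sketch assumes non-negative edge weights so that the fixpoint is reached.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, weight); hypothetical type


def shortest_paths(edges: List[Edge]) -> Dict[Tuple[str, str], float]:
    """Semi-naive evaluation of a recursive program with a min aggregate.

    Illustrative rules (syntax is an assumption, not the paper's):
        path(X, Y, min<D>) <- edge(X, Y, D).
        path(X, Z, min<D>) <- path(X, Y, D1), edge(Y, Z, D2), D = D1 + D2.

    Assumes non-negative weights; a negative cycle would never converge.
    """
    # Index edges by source node so the join path(X, Y), edge(Y, Z) is cheap.
    out: Dict[str, List[Tuple[str, float]]] = {}
    for src, dst, w in edges:
        out.setdefault(src, []).append((dst, w))

    # Base rule: the cheapest direct edge per (source, target) pair.
    delta: Dict[Tuple[str, str], float] = {}
    for src, dst, w in edges:
        if w < delta.get((src, dst), float("inf")):
            delta[(src, dst)] = w
    best = dict(delta)

    # Recursive rule: join only the facts that changed in the last round
    # (the delta), and keep a derived fact only if it improves the minimum.
    while delta:
        new_delta: Dict[Tuple[str, str], float] = {}
        for (x, y), d1 in delta.items():
            for z, d2 in out.get(y, []):
                d = d1 + d2
                current = min(best.get((x, z), float("inf")),
                              new_delta.get((x, z), float("inf")))
                if d < current:
                    new_delta[(x, z)] = d
        best.update(new_delta)
        delta = new_delta
    return best


if __name__ == "__main__":
    sample = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "a", 1.0)]
    for pair, dist in sorted(shortest_paths(sample).items()):
        print(pair, dist)
```

Applying the min aggregate during the recursion, rather than after computing every path, is what lets the loop terminate on cyclic graphs: a round that improves no minimum produces an empty delta and evaluation stops.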