Accelerating machine learning queries with linear algebra query processing

Times cited: 0
Authors
Sun, Wenbo [1]
Katsifodimos, Asterios [1]
Hai, Rihan [1]
Affiliations
[1] Delft Univ Technol, Fac EEMCS, NL-2628 ZE Delft, Netherlands
Keywords
Database; Query optimization; Machine learning; Operator fusion
DOI
10.1007/s10619-024-07451-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to use ML models to generate predictive results that support business decision-making. As the two primary components of traditional predictive pipelines, data processing and model prediction often operate in separate execution environments, leading to redundant engineering and computation. Additionally, the diverging mathematical foundations of data processing and machine learning hinder cross-optimization across these two components, so potential opportunities to expedite predictive pipelines are overlooked. In this paper, we propose an operator fusion method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method leverages linear algebra computation properties to merge operators in machine learning predictions and data processing, accelerating predictive pipelines by up to 317x. We perform a complexity analysis to deliver quantitative insights into the advantages of operator fusion, considering various data and model dimensions. Furthermore, we extensively evaluate linear algebra query processing and operator fusion using the widely used Star Schema and TPC-DI benchmarks. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.
Pages: 41
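
The abstract describes evaluating relational queries with linear algebra and fusing them with model operators. As a rough illustration of that idea, and not the authors' implementation, the sketch below writes a key/foreign-key join as a product with a 0/1 matrix and then uses associativity of matrix multiplication to apply a linear model before the join instead of after it. The choice of associativity as the fused property, the toy tables, the weights, and the helper names `matmul` and `join_matrix` are all assumptions made for this example; it runs on the CPU in plain Python rather than on a GPU.

```python
# Illustrative sketch only: a key/foreign-key join written as a matrix product,
# then fused with a linear model. All names and data here are hypothetical.

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def join_matrix(fact_keys, dim_keys):
    """Build a 0/1 matrix J with J[i][j] = 1 iff fact row i references dimension row j."""
    return [[1 if fk == dk else 0 for dk in dim_keys] for fk in fact_keys]

# Dimension table: one feature row per key.
dim_keys = [10, 20, 30]
dim_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Fact table: foreign keys into the dimension table.
fact_keys = [20, 10, 20, 30]

# Linear model weights as a column vector (assumed for illustration).
weights = [[0.5], [-1.0]]

J = join_matrix(fact_keys, dim_keys)

# Unfused: materialize the join result, then apply the model to every joined row.
joined = matmul(J, dim_features)        # 4 x 2
unfused = matmul(joined, weights)       # 4 x 1

# Fused: score the small dimension table once, then route scores through the join.
scores = matmul(dim_features, weights)  # 3 x 1
fused = matmul(J, scores)               # 4 x 1
```

Both orders give the same predictions because (J · X) · w = J · (X · w). In the fused order the model touches three dimension rows instead of four joined rows; the gap would widen as the fact table grows relative to the dimension table.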