Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study

Cited by: 2
Authors
Abbasi, Maryam [1]
Bernardo, Marco V. [2, 3]
Vaz, Paulo [3, 4]
Silva, Jose [3, 4]
Martins, Pedro [3, 4]
Affiliations
[1] Polytech Coimbra, Appl Res Inst, P-3045093 Coimbra, Portugal
[2] Inst Telecomunicacoes, P-6201001 Covilha, Portugal
[3] Polytech Viseu, Dept Informat, P-3504510 Viseu, Portugal
[4] Polytech Viseu, Res Ctr Digital Serv CISeD, P-3504510 Viseu, Portugal
Keywords
machine learning integration; database optimization; query performance; dynamic workload management; PostgreSQL; real-time system tuning
DOI
10.3390/info15090574
CLC number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. 
This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, heterogeneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments.
Pages: 25