Effortless Locality on Data Systems Using Relational Fabric

Cited by: 0

Authors:

Papon, Tarikul Islam [1]

Mun, Ju Hyoung [1]

Karatsenidis, Konstantinos [1]

Roozkhosh, Shahin [1]

Hoornaert, Denis [2]

Sanaullah, Ahmed [3]

Drepper, Ulrich [3]

Mancuso, Renato [1]

Athanassoulis, Manos [1]

Affiliations:

[1] Boston Univ, Boston, MA 02215 USA

[2] Tech Univ Munich, D-80333 Munich, Germany

[3] Red Hat, Raleigh, NC 27601 USA

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 12

Funding:

National Science Foundation (USA);

Keywords:

Layout; Fabrics; Hardware; Data systems; Query processing; Memory management; Costs; HTAP; data layout; FPGA; near-data processing; FEATURE-SELECTION; ATTRIBUTE REDUCTION; ROUGH-SET;

DOI:

10.1109/TKDE.2024.3386827

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a significant impact on the entire data system architecture. The multiple-decade-long journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts have been proposed to reap the benefits of both worlds by proposing systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: "What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?" To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (that has been accelerated due to the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager that needs to maintain and update a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy allowing for better hardware utilization and, ultimately, better performance. 
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
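
As a rough software illustration of the row-to-column-group transformation the abstract describes, the sketch below packs only the requested columns of a row-major buffer into a dense output. It is not the authors' hardware design or API: the fixed-width layout of four int32 columns, the function name `project_columns`, and the sample data are all assumptions made for this example.

```python
import struct

# Hypothetical fixed-width row layout (an assumption for this sketch):
# four little-endian int32 columns per row.
ROW_FORMAT = "<4i"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 16 bytes per row
COLUMN_WIDTH = 4                        # bytes per int32 column


def project_columns(row_store: bytes, columns: list[int]) -> bytes:
    """Pack only the requested columns of every row into a dense buffer.

    This emulates in software what a near-data vertical partitioner
    would do before data travels up the memory hierarchy.
    """
    out = bytearray()
    for row_start in range(0, len(row_store), ROW_SIZE):
        for col in columns:
            offset = row_start + col * COLUMN_WIDTH
            out += row_store[offset:offset + COLUMN_WIDTH]
    return bytes(out)


# Illustrative data only: three rows stored contiguously in row-major order.
rows = [(1, 10, 100, 1000), (2, 20, 200, 2000), (3, 30, 300, 3000)]
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Request the column group {0, 2}; only 24 of the 48 stored bytes are shipped.
group = project_columns(row_store, [0, 2])
print(struct.unpack("<6i", group))  # (1, 100, 2, 200, 3, 300)
```

The base data stays in a single row-oriented layout, and the consumer sees a packed column group, which is the single-layout property the abstract lists under (A) and (C).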

Pages: 7410-7422

Page count: 13
