Effortless Locality on Data Systems Using Relational Fabric

Cited by: 0
Authors
Papon, Tarikul Islam [1]
Mun, Ju Hyoung [1]
Karatsenidis, Konstantinos [1]
Roozkhosh, Shahin [1]
Hoornaert, Denis [2]
Sanaullah, Ahmed [3]
Drepper, Ulrich [3]
Mancuso, Renato [1]
Athanassoulis, Manos [1]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Tech Univ Munich, D-80333 Munich, Germany
[3] Red Hat, Raleigh, NC 27601 USA
Funding
U.S. National Science Foundation;
Keywords
Layout; Fabrics; Hardware; Data systems; Query processing; Memory management; Costs; HTAP; data layout; FPGA; near-data processing; FEATURE-SELECTION; ATTRIBUTE REDUCTION; ROUGH-SET;
DOI
10.1109/TKDE.2024.3386827
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former favors transactional workloads, while the latter is better suited to analytical queries. This decision has a significant impact on the entire data system architecture. The decades-long evolution of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several systems try to reap the benefits of both worlds by maintaining multiple copies of the data (in different physical layouts) and converting them into the desired layout as required. Because of data duplication, the additional bookkeeping, and the cost of converting data between layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: "What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?" To achieve this functionality, we capitalize on the renewed trend of hardware specialization (accelerated by the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems: (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager, which needs to maintain and update only a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance.
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
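The row-to-column-group conversion described in the abstract can be illustrated with a small software emulation. This is only a sketch under stated assumptions: the schema, the `pack_rows` and `project` helpers, and the byte layout are hypothetical and are not the Relational Fabric API or its hardware design. It shows a single row-major base buffer from which only the requested columns are copied into a densely packed buffer, so less data would travel up the memory hierarchy.

```python
import struct

# Hypothetical fixed-width row schema (name, struct format code); not from the paper.
SCHEMA = [("id", "i"), ("price", "d"), ("qty", "i"), ("flag", "b")]
ROW_FORMAT = "<" + "".join(code for _, code in SCHEMA)  # "<": little-endian, no padding
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def pack_rows(rows):
    """Serialize tuples into one contiguous row-major buffer (the single base layout)."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def project(buffer, wanted):
    """Emulate the row-to-column-group conversion in software.

    Returns a densely packed buffer holding only the `wanted` columns of
    every row, plus the struct format needed to decode that buffer.
    """
    # Compute the byte offset and size of every field inside a row.
    offsets, pos = {}, 0
    for name, code in SCHEMA:
        size = struct.calcsize("<" + code)
        offsets[name] = (pos, size, code)
        pos += size

    out = bytearray()
    for start in range(0, len(buffer), ROW_SIZE):
        for name in wanted:
            off, size, _ = offsets[name]
            out += buffer[start + off:start + off + size]

    group_format = "<" + "".join(offsets[name][2] for name in wanted)
    return bytes(out), group_format


rows = [(1, 9.5, 3, 0), (2, 12.0, 1, 1), (3, 7.25, 8, 0)]
base = pack_rows(rows)
group, fmt = project(base, ["id", "qty"])
decoded = list(struct.iter_unpack(fmt, group))
print(decoded)                 # [(1, 3), (2, 1), (3, 8)]
print(len(base), len(group))   # 51 24
```

In this toy example the base layout occupies 51 bytes while the projected column group occupies 24, which mirrors the abstract's point (C) about avoiding unnecessary data movement. In the approach the abstract describes, this transformation is performed near the data by hardware rather than by a software loop.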
Pages: 7410-7422
Page count: 13