Effortless Locality on Data Systems Using Relational Fabric

Authors
Papon, Tarikul Islam [1]
Mun, Ju Hyoung [1]
Karatsenidis, Konstantinos [1]
Roozkhosh, Shahin [1]
Hoornaert, Denis [2]
Sanaullah, Ahmed [3]
Drepper, Ulrich [3]
Mancuso, Renato [1]
Athanassoulis, Manos [1]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Tech Univ Munich, D-80333 Munich, Germany
[3] Red Hat, Raleigh, NC 27601 USA
Funding
National Science Foundation (USA)
Keywords
Layout; Fabrics; Hardware; Data systems; Query processing; Memory management; Costs; HTAP; data layout; FPGA; near-data processing; FEATURE-SELECTION; ATTRIBUTE REDUCTION; ROUGH-SET;
DOI
10.1109/TKDE.2024.3386827
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a significant impact on the entire data system architecture. The multiple-decade-long journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts try to reap the benefits of both worlds with systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: "What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?" To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (which has been accelerated by the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager, which needs to maintain and update only a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance.
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
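The row-to-column-group conversion described in the abstract can be pictured with a small software sketch. This is only an illustration of what the transformation produces, not the Relational Fabric hardware or its API: the row schema, `ROW_FORMAT`, `pack_rows`, and `project_column_group` are all hypothetical names chosen for this example, and the work that the paper places near memory or storage is done here by an ordinary loop.

```python
import struct

# Hypothetical row schema (an assumption for illustration only):
# id: int32, price: float64, qty: int32, flag: int8, packed without padding.
ROW_FORMAT = "<idib"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 4 + 8 + 4 + 1 = 17 bytes per row


def pack_rows(rows):
    """Serialize tuples into one row-major byte buffer (the single base layout)."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def project_column_group(buffer, columns):
    """Return a densely packed buffer holding only the requested columns.

    Software stand-in for a near-data vertical partitioner: the consumer
    receives just the bytes of the selected column group, row by row.
    """
    # Compute (offset, size) of every field inside a row.
    offsets = []
    position = 0
    for code in ROW_FORMAT[1:]:
        size = struct.calcsize("<" + code)
        offsets.append((position, size))
        position += size

    out = bytearray()
    for row_start in range(0, len(buffer), ROW_SIZE):
        for column in columns:
            offset, size = offsets[column]
            out += buffer[row_start + offset:row_start + offset + size]
    return bytes(out)


rows = [(1, 9.5, 3, 0), (2, 20.0, 7, 1)]
row_buffer = pack_rows(rows)

# An analytical query touching only id and qty receives 8 bytes per row
# instead of the full 17-byte row.
group = project_column_group(row_buffer, [0, 2])
ids_and_qtys = struct.unpack("<iiii", group)
```

In this sketch the base data stays in a single row layout, and each query names the column group it needs, which mirrors points (A) and (C) of the abstract at a conceptual level.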
Pages: 7410-7422
Number of pages: 13