Effortless Locality on Data Systems Using Relational Fabric

Citations: 0
Authors
Papon, Tarikul Islam [1]
Mun, Ju Hyoung [1]
Karatsenidis, Konstantinos [1]
Roozkhosh, Shahin [1]
Hoornaert, Denis [2]
Sanaullah, Ahmed [3]
Drepper, Ulrich [3]
Mancuso, Renato [1]
Athanassoulis, Manos [1]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Tech Univ Munich, D-80333 Munich, Germany
[3] Red Hat, Raleigh, NC 27601 USA
Funding
US National Science Foundation;
Keywords
Layout; Fabrics; Hardware; Data systems; Query processing; Memory management; Costs; HTAP; data layout; FPGA; near-data processing; FEATURE-SELECTION; ATTRIBUTE REDUCTION; ROUGH-SET;
DOI
10.1109/TKDE.2024.3386827
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a significant impact on the entire data system architecture. The multiple-decade-long journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts have been proposed to reap the benefits of both worlds by proposing systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: "What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?" To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (that has been accelerated due to the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager that needs to maintain and update a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy allowing for better hardware utilization and, ultimately, better performance. 
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
Pages: 7410-7422
Number of pages: 13
Related papers
50 in total
  • [41] Xlight, An Efficient Relational Schema To Store And Query XML Data
    Zafari, Hasan
    Hasani, Keramat
    Shiri, M. Ebrahim
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA STORAGE AND DATA ENGINEERING (DSDE 2010), 2010, : 254 - 257
  • [42] DIFF: a relational interface for large-scale data explanation
    Firas Abuzaid
    Peter Kraft
    Sahaana Suri
    Edward Gan
    Eric Xu
    Atul Shenoy
    Asvin Ananthanarayan
    John Sheu
    Erik Meijer
    Xi Wu
    Jeff Naughton
    Peter Bailis
    Matei Zaharia
    The VLDB Journal, 2021, 30 : 45 - 70
  • [43] DIFF: a relational interface for large-scale data explanation
    Abuzaid, Firas
    Kraft, Peter
    Suri, Sahaana
    Gan, Edward
    Xu, Eric
    Shenoy, Atul
    Ananthanarayan, Asvin
    Sheu, John
    Meijer, Erik
    Wu, Xi
    Naughton, Jeff
    Bailis, Peter
    Zaharia, Matei
    VLDB JOURNAL, 2021, 30 (01) : 45 - 70
  • [44] A parallel algorithm for data cleansing in incomplete information systems using MapReduce
    Chen, Fei
    Jiang, Lin
    2014 TENTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2014, : 273 - 277
  • [45] SLID: Exploiting Spatial Locality in Input Data as a Computational Reuse Method for Efficient CNN
    Alantali, Fatmah
    Halawani, Yasmin
    Mohammad, Baker
    Al-Qutayri, Mahmoud
    IEEE ACCESS, 2021, 9 : 57179 - 57187
  • [46] SHRDIS: A Semantic-based Heterogeneous Relational Data Integration System
    Wang, Jinpeng
    Zhang, Yafei
    Lu, Jianjiang
    Miao, Zhuang
    NANOTECHNOLOGY AND COMPUTER ENGINEERING, 2010, 121-122 : 335 - 340
  • [47] A Study on Mechanization, Data, and Insurance Challenges for Agriculture Based on Hyperledger Fabric
    Hu, Zhihua
    Kim, Bongjun
    Jeong, Junho
    IEEE ACCESS, 2024, 12 : 144855 - 144869
  • [48] Feature Selection for Complex Systems Monitoring: an Application using Data Fusion
    Le Moal, Gwenole
    Moraru, George
    Veron, Philippe
    Rabate, Patrice
    Douilly, Marc
    2012 2ND INTERNATIONAL CONFERENCE ON COMMUNICATIONS, COMPUTING AND CONTROL APPLICATIONS (CCCA), 2012,
  • [49] Data classification using rough set and bioinspired computing in healthcare applications-an extensive review
    Kumari, Nancy
    Acharjya, D. P.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (09) : 13479 - 13505
  • [50] Similarity Group-by Operators for Multi-Dimensional Relational Data
    Tang, Mingjie
    Tahboub, Ruby Y.
    Aref, Walid G.
    Atallah, Mikhail J.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    Silva, Yasin N.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (02) : 510 - 523