ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation

Cited by: 29
Authors
Kara, Kaan [1]
Eguro, Ken [2]
Zhang, Ce [1]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Syst Grp, Zurich, Switzerland
[2] Microsoft Res, Redmond, WA USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 12 / No. 04
Keywords
REAL-TIME; SCALE;
DOI
10.14778/3297753.3297756
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integration of ML into a DBMS is challenging for reasons varying from differences in execution model to data layout requirements. In this paper, we assume a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent-based methods working natively on columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm providing linear throughput scalability with the number of cores while preserving convergence quality, up to 14 cores in our experiments. Existing column-oriented DBMSs rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU-based solution suffers. Thus, in the paper we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS with its important features preserved (e.g., data compression) that offers high-performance machine learning capabilities.
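To illustrate why coordinate descent fits a columnar layout, the following is a minimal single-threaded sketch of stochastic coordinate descent for ridge regression, where each update reads exactly one stored column. It is not the paper's partitioned, cache-efficient algorithm or its FPGA engine. The function name `scd_ridge`, its parameters, and the toy data are assumptions made for illustration only.

```python
import random


def scd_ridge(columns, y, lam=0.0, epochs=50, seed=0):
    """Hypothetical sketch: stochastic coordinate descent on column-major data.

    Minimizes (1 / (2n)) * ||X w - y||^2 + (lam / 2) * ||w||^2, where
    columns[j] holds feature j for all n samples (one stored column).
    """
    n = len(y)
    d = len(columns)
    w = [0.0] * d
    # residual = X w - y; with w = 0 this is simply -y.
    residual = [-yi for yi in y]
    # Per-column curvature (1/n) * ||x_j||^2, computed once up front.
    sq_norms = [sum(v * v for v in col) / n for col in columns]
    rng = random.Random(seed)

    for _ in range(epochs):
        order = list(range(d))
        rng.shuffle(order)  # visit coordinates in a random order each epoch
        for j in order:
            col = columns[j]
            denom = sq_norms[j] + lam
            if denom == 0.0:
                continue  # all-zero column with no regularization: nothing to update
            # Partial derivative with respect to w[j]; touches only column j.
            grad = sum(c * r for c, r in zip(col, residual)) / n + lam * w[j]
            delta = -grad / denom  # exact minimizer along coordinate j
            w[j] += delta
            # Keep the residual consistent with the updated weight.
            for i, c in enumerate(col):
                residual[i] += delta * c
    return w


# Toy column-major data generated from known weights [2, 3] (illustrative only).
columns = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
y = [2.0 * a + 3.0 * b for a, b in zip(*columns)]
w = scd_ridge(columns, y, epochs=200)
```

Because each step scans a single contiguous column plus the residual vector, no row-wise reconstruction of the table is needed, which is the property the abstract refers to as working natively on columnar format.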
Pages: 348-361
Page count: 14