Compressed linear algebra for large-scale machine learning

Cited: 15
Authors
Elgohary, Ahmed [2 ]
Boehm, Matthias [1 ]
Haas, Peter J. [1 ]
Reiss, Frederick R. [1 ]
Reinwald, Berthold [1 ]
Affiliations
[1] IBM Res Almaden, San Jose, CA 95120 USA
[2] Univ Maryland, College Pk, MD 20742 USA
Source
VLDB JOURNAL | 2018 / Vol. 27 / No. 05
Keywords
Machine learning; Large-scale; Declarative; Linear algebra; Lossless compression; DATABASE; FACTORIZATION; DB2;
DOI
10.1007/s00778-017-0478-1
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Therefore, inspired by database compression and sparse matrix formats, we initiate work on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements.
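To make the abstract's core idea concrete, below is a minimal illustrative sketch in Python, not the paper's SystemML implementation: the function names and encoding details are simplified assumptions in the spirit of the paper's offset-list column encodings. Each column is compressed by grouping row offsets per distinct value, and the product y = Xv is then evaluated directly on that compressed form, with one multiply per distinct value per column instead of one per nonzero entry.

# Illustrative sketch only: a toy value-based column compression and a
# matrix-vector product executed directly on the compressed columns.
# This mirrors the spirit of CLA's offset-list encoding, not its actual code.
from collections import defaultdict

def compress_column(col):
    # Group row offsets by distinct nonzero value (offset-list style).
    offsets = defaultdict(list)
    for i, val in enumerate(col):
        if val != 0.0:
            offsets[val].append(i)
    return offsets

def compress_matrix(X):
    # Compress a row-major matrix column by column.
    ncols = len(X[0])
    cols = [compress_column([row[j] for row in X]) for j in range(ncols)]
    return cols, len(X)

def matvec_compressed(cols, nrows, v):
    # y = X @ v without decompressing: each distinct value is scaled by
    # v[j] once and scattered to the rows where it occurs.
    y = [0.0] * nrows
    for j, col in enumerate(cols):
        for val, rows in col.items():
            contrib = val * v[j]
            for i in rows:
                y[i] += contrib
    return y

# Columns with few distinct values compress well, and the result matches
# the uncompressed product:
X = [[1.0, 2.0],
     [1.0, 0.0],
     [1.0, 2.0]]
cols, nrows = compress_matrix(X)
print(matvec_compressed(cols, nrows, [10.0, 1.0]))  # [12.0, 10.0, 12.0]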
Pages: 719 - 744
Page count: 26
Related Papers
50 records in total
  • [1] Compressed Linear Algebra for Large-Scale Machine Learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (12) : 960 - 971
  • [2] Compressed linear algebra for large-scale machine learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    THE VLDB JOURNAL, 2018, 27 : 719 - 744
  • [3] Compressed Linear Algebra for Declarative Large-Scale Machine Learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    COMMUNICATIONS OF THE ACM, 2019, 62 (05) : 83 - 91
  • [4] Scaling Machine Learning via Compressed Linear Algebra
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    SIGMOD RECORD, 2017, 46 (01) : 42 - 49
  • [5] Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra
    Ives, Zachary G.
    SIGMOD RECORD, 2017, 46 (01) : 41 - 41
  • [6] A Survey on Large-Scale Machine Learning
    Wang, Meng
    Fu, Weijie
    He, Xiangnan
    Hao, Shijie
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2574 - 2594
  • [7] Linear algebra software for large-scale accelerated multicore computing
    Abdelfatah, A.
    Anzt, H.
    Dongarra, J.
    Gates, M.
    Haidar, A.
    Kurzak, J.
    Luszczek, P.
    Tomov, S.
    Yamazaki, I.
    YarKhan, A.
    ACTA NUMERICA, 2016, 25 : 1 - 160
  • [8] Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics
    Buono, Daniele
    Gunnels, John A.
    Que, Xinyu
    Checconi, Fabio
    Petrini, Fabrizio
    Tuan, Tai-Ching
    Long, Chris
    COMPUTER, 2015, 48 (08) : 26 - 34
  • [9] Large-scale distributed linear algebra with tensor processing units
    Lewis, Adam G. M.
    Beall, Jackson
    Ganahl, Martin
    Hauru, Markus
    Mallick, Shrestha Basu
    Vidal, Guifre
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2022, 119 (33)
  • [10] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker
    Lee, Victor E.
    Shi, Feng
    Tang, Jiliang
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022 : 4788 - 4789