pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores

被引:2
作者
Sobczyk, Aleksandros [1 ,2 ]
Gallopoulos, Efstratios [3 ]
机构
[1] IBM Res Europe, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] Univ Patras, HPCLAB, Comp Engn & Informat Dept, Patras, Greece
来源
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE | 2022年 / 48卷 / 04期
关键词
Parallel algorithms; sparse data structures; sketching; column subset selection; regression; preconditioning; statistical leverage scores; MATRIX; APPROXIMATION; OPTIMIZATION; STANDARD;
D O I
10.1145/3555370
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on "tall-and-skinny" matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package(1) whose core is written in C++ and parallelized with OpenMP and that is compatiblewith standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
引用
收藏
页数:27
相关论文
共 88 条
[1]  
Achlioptas D., 2001, P 20 ACM SIGMOD SIGA, P274
[2]  
Ailon N., 2006, STOC'06. Proceedings of the 38th Annual ACM Symposium on Theory of Computing, P557, DOI 10.1145/1132516.1132597
[3]   Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes [J].
Ailon, Nir ;
Liberty, Edo .
DISCRETE & COMPUTATIONAL GEOMETRY, 2009, 42 (04) :615-630
[4]  
Alaoui Ahmed, 2015, Advances in Neural Information Processing Systems, V28, P775
[5]  
Alman J, 2021, Disc Algorithms, P522
[6]  
[Anonymous], 2000, Journal of statistical software, DOI [DOI 10.18637/JSS.V005.I08, 10.18637/jss.v005.i08]
[7]  
Anzt H., 2020, Journal of Open Source Software, V5, P2260, DOI DOI 10.21105/JOSS.02260
[8]   Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose [J].
Arrigoni, Viviana ;
Maggioli, Filippo ;
Massini, Annalisa ;
Rodola, Emanuele .
50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2021,
[9]  
Avron H., 2010, Workshop on Large-scale Data Mining: Theory and Applications, V10, P10
[10]  
Avron H, 2015, PR MACH LEARN RES, V37, P1795