Block-enhanced precision matrix estimation for large-scale datasets

Cited: 5
Authors:
Eftekhari, Aryan [1 ]
Pasadakis, Dimosthenis [1 ]
Bollhoefer, Matthias [2 ]
Scheidegger, Simon [3 ]
Schenk, Olaf [1 ]
Affiliations:
[1] Univ Svizzera Italiana, Fac Informat, Inst Comp, Lugano, Switzerland
[2] TU Braunschweig, Inst Numer Anal, Braunschweig, Germany
[3] Univ Lausanne, Dept Econ, Lausanne, Switzerland
Funding:
Swiss National Science Foundation;
Keywords:
Covariance matrices; Graphical model; Optimization; Gaussian Markov random field; Machine learning application; SPARSE; SELECTION; PARALLEL; SOLVER; MODEL;
DOI:
10.1016/j.jocs.2021.101389
Chinese Library Classification (CLC):
TP39 [Computer applications];
Discipline codes:
081203; 0835;
Abstract:
The ℓ1-regularized Gaussian maximum likelihood method is a common approach for sparse precision matrix estimation, but it poses a computational challenge for high-dimensional datasets. We present a novel ℓ1-regularized maximum likelihood method for efficient large-scale sparse precision matrix estimation that exploits the block structures in the underlying computations. We identify the computational bottlenecks and contribute a block coordinate descent update as well as a block approximate matrix inversion routine, which is then parallelized using a shared-memory scheme. We demonstrate the effectiveness, accuracy, and performance of these algorithms. Our numerical examples and comparative results with various modern open-source packages reveal that these precision matrix estimation methods can accelerate the computation of covariance matrices by two to three orders of magnitude, while keeping memory requirements modest. Furthermore, we conduct large-scale case studies for applications from finance and medicine with several thousand random variables to demonstrate applicability to real-world datasets.
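The abstract refers to the ℓ1-regularized Gaussian maximum likelihood ("graphical lasso") objective, minimize over positive definite Θ of −log det(Θ) + tr(SΘ) + λ‖Θ‖1, where S is the empirical covariance matrix. The sketch below is not the authors' block-enhanced solver; it only illustrates this objective on a small synthetic problem using scikit-learn's GraphicalLasso as a reference implementation, with illustrative dimensions and regularization weight chosen for the example.

    # Minimal sketch of l1-regularized precision matrix estimation (graphical lasso).
    # Not the paper's block-enhanced method; problem sizes and alpha are illustrative.
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)

    # Ground-truth precision matrix: tridiagonal, i.e. a simple sparse/banded pattern.
    p = 20
    theta_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
    cov_true = np.linalg.inv(theta_true)

    # Draw samples and fit the l1-regularized estimator (alpha plays the role of lambda).
    n = 2000
    X = rng.multivariate_normal(np.zeros(p), cov_true, size=n)
    model = GraphicalLasso(alpha=0.05, max_iter=200).fit(X)
    theta_hat = model.precision_

    # Compare the recovered sparsity pattern against the ground truth.
    recovered = np.abs(theta_hat) > 1e-3
    print("true nonzeros:     ", int((theta_true != 0).sum()))
    print("estimated nonzeros:", int(recovered.sum()))

At this small scale the dense solver is sufficient; the paper targets the regime with thousands of random variables, where block-structured updates and approximate inversion become necessary.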
Pages: 13
Related papers
50 records in total
  • [21] Block Krylov subspace methods for large-scale matrix computations in control
    Sheikhani, A. H. Refahi
    Kordrostami, S.
JOURNAL OF TAIBAH UNIVERSITY FOR SCIENCE, 2015, 9 (01): 116 - 120
  • [22] Online graph regularized non-negative matrix factorization for large-scale datasets
    Liu, Fudong
    Yang, Xuejun
    Guan, Naiyang
    Yi, Xiaodong
    NEUROCOMPUTING, 2016, 204 : 162 - 171
  • [23] MedDialog: Large-scale Medical Dialogue Datasets
    Zeng, Guangtao
    Yang, Wenmian
    Ju, Zeqian
    Yang, Yue
    Wang, Sicheng
    Zhang, Ruisi
    Zhou, Meng
    Zeng, Jiaqi
    Dong, Xiangyu
    Zhang, Ruoyu
    Fang, Hongchao
    Zhu, Penghui
    Chen, Shu
    Xie, Pengtao
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 9241 - 9250
  • [24] Towards algorithmic analytics for large-scale datasets
    Bzdok, Danilo
    Nichols, Thomas E.
    Smith, Stephen M.
    NATURE MACHINE INTELLIGENCE, 2019, 1 (07) : 296 - 306
  • [25] RANSAC-SVM for Large-Scale Datasets
    Nishida, Kenji
    Kurita, Takio
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3767 - 3770
  • [26] Map Matching Algorithm for Large-scale Datasets
    Fiedler, David
    Cap, Michal
    Nykl, Jan
    Zilecky, Pavol
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 3, 2022, : 500 - 508
  • [27] Momentum Online LDA for Large-scale Datasets
    Ouyang, Jihong
    Lu, You
    Li, Ximing
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1075 - 1076
  • [28] Large-Scale Datasets in Special Education Research
    Griffin, Megan M.
    Steinbrecher, Trisha D.
    USING SECONDARY DATASETS TO UNDERSTAND PERSONS WITH DEVELOPMENTAL DISABILITIES AND THEIR FAMILIES, 2013, 45 : 155 - 183
  • [30] Iterative Classification for Sanitizing Large-Scale Datasets
    Li, Bo
    Vorobeychik, Yevgeniy
    Li, Muqun
    Malin, Bradley
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, : 841 - 846