Block-enhanced precision matrix estimation for large-scale datasets

Cited by: 5
Authors
Eftekhari, Aryan [1]
Pasadakis, Dimosthenis [1]
Bollhoefer, Matthias [2]
Scheidegger, Simon [3]
Schenk, Olaf [1]
Affiliations
[1] Univ Svizzera Italiana, Fac Informat, Inst Comp, Lugano, Switzerland
[2] TU Braunschweig, Inst Numer Anal, Braunschweig, Germany
[3] Univ Lausanne, Dept Econ, Lausanne, Switzerland
Funding
Swiss National Science Foundation
Keywords
Covariance matrices; Graphical model; Optimization; Gaussian Markov random field; Machine learning application; SPARSE; SELECTION; PARALLEL; SOLVER; MODEL;
DOI
10.1016/j.jocs.2021.101389
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The ℓ1-regularized Gaussian maximum likelihood method is a common approach for sparse precision matrix estimation, but one that poses a computational challenge for high-dimensional datasets. We present a novel ℓ1-regularized maximum likelihood method for performant large-scale sparse precision matrix estimation utilizing the block structures in the underlying computations. We identify the computational bottlenecks and contribute a block coordinate descent update as well as a block approximate matrix inversion routine, which is then parallelized using a shared-memory scheme. We demonstrate the effectiveness, accuracy, and performance of these algorithms. Our numerical examples and comparative results with various modern open-source packages reveal that these precision matrix estimation methods can accelerate the computation of covariance matrices by two to three orders of magnitude, while keeping memory requirements modest. Furthermore, we conduct large-scale case studies for applications from finance and medicine with several thousand random variables to demonstrate applicability for real-world datasets.
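For orientation, the ℓ1-regularized Gaussian maximum likelihood problem named in the abstract is commonly written in the form below. This is the standard textbook formulation, not one quoted from the paper; the symbols S (sample covariance matrix), Θ (precision matrix), p (number of variables), and λ (regularization weight) are notation assumed here, and some formulations penalize only the off-diagonal entries of Θ.

```latex
\hat{\Theta} \;=\; \arg\min_{\Theta \succ 0}
  \Bigl\{ -\log\det\Theta \;+\; \operatorname{tr}(S\,\Theta)
  \;+\; \lambda \,\lVert \Theta \rVert_{1} \Bigr\},
\qquad
\lVert \Theta \rVert_{1} \;=\; \sum_{i,j=1}^{p} \lvert \Theta_{ij} \rvert .
```

A larger λ drives more entries of the estimate to zero, which is what makes the estimated precision matrix sparse.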
Pages: 13