On the Role of Sparsity and DAG Constraints for Learning Linear DAGs

Times Cited: 0
Authors
Ng, Ignavier [1 ]
Ghassami, AmirEmad [2 ]
Zhang, Kun [3 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Johns Hopkins Univ, Baltimore, MD USA
[3] Carnegie Mellon Univ, Pittsburgh, PA USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020, Vol. 33
Keywords
BAYESIAN NETWORKS; SELECTION; ALGORITHMS; GRAPHS; MODEL
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning graphical structures based on Directed Acyclic Graphs (DAGs) is a challenging problem, partly owing to the large search space of possible graphs. A recent line of work formulates structure learning as a continuous constrained optimization task using the least squares objective and an algebraic characterization of DAGs. However, this formulation requires a hard DAG constraint and may lead to optimization difficulties. In this paper, we study the asymptotic roles of the sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases, and investigate their usefulness in the finite sample regime. Based on the theoretical results, we formulate a likelihood-based score function, and show that one only has to apply soft sparsity and DAG constraints to learn a DAG equivalent to the ground truth DAG. This leads to an unconstrained optimization problem that is much easier to solve. Using gradient-based optimization and GPU acceleration, our procedure can easily handle thousands of nodes while retaining high accuracy. Extensive experiments validate the effectiveness of the proposed method and show that the DAG-penalized likelihood objective is indeed preferable to the least squares objective with the hard DAG constraint.
Pages: 12
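
The abstract describes replacing the hard acyclicity constraint with soft sparsity and DAG penalties on a likelihood-based score, optimized without constraints by gradient methods. The sketch below illustrates an objective of this kind in JAX, assuming the equal-noise-variance Gaussian likelihood term (d/2) log ||X - XW||_F^2 - log|det(I - W)| and the acyclicity function h(W) = tr(exp(W ∘ W)) - d (∘ is the elementwise product) from the continuous-optimization line of work the abstract refers to. The penalty weights lambda1 and lambda2, the truncated-series evaluation of h, the toy data, and the plain gradient-descent loop are illustrative assumptions, not the paper's exact configuration.

```python
import jax
import jax.numpy as jnp
import numpy as np

def dag_penalized_likelihood(W, X, lambda1=2e-2, lambda2=5.0):
    """Unconstrained score: Gaussian likelihood + soft sparsity and DAG penalties.

    lambda1 and lambda2 are illustrative penalty weights, not tuned values.
    """
    d = X.shape[1]
    resid = X - X @ W
    # Equal-variance Gaussian likelihood term:
    # (d/2) log ||X - XW||_F^2 - log|det(I - W)|
    nll = 0.5 * d * jnp.log(jnp.sum(resid ** 2)) \
          - jnp.linalg.slogdet(jnp.eye(d) - W)[1]
    # Soft L1 sparsity penalty
    l1 = lambda1 * jnp.sum(jnp.abs(W))
    # Soft DAG penalty h(W) = tr(exp(W * W)) - d, evaluated here with a
    # truncated Taylor series so the whole objective stays differentiable
    M, E, term = W * W, jnp.eye(d), jnp.eye(d)
    for k in range(1, 21):
        term = term @ M / k
        E = E + term
    h = jnp.trace(E) - d
    return nll + l1 + lambda2 * h

# Toy linear-Gaussian data from the chain x1 -> x2 -> x3 (weights made up);
# for X = XW + N, the data satisfy X = N (I - W)^{-1}
n, d = 1000, 3
W_true = np.array([[0.0, 1.5, 0.0],
                   [0.0, 0.0, -2.0],
                   [0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
X = jnp.asarray(rng.normal(size=(n, d)) @ np.linalg.inv(np.eye(d) - W_true))

# Plain gradient descent on the unconstrained objective; the step size and
# iteration count are illustrative (the abstract specifies only that the
# optimization is gradient-based)
grad_fn = jax.jit(jax.grad(dag_penalized_likelihood))
W = jnp.zeros((d, d))
for _ in range(5000):
    W = W - 1e-3 * grad_fn(W, X)
print(np.round(np.asarray(W), 2))
```

Because both penalties are soft, no augmented Lagrangian scheme or hard constraint h(W) = 0 is needed; small entries of the learned W can be thresholded afterwards to read off the estimated DAG.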