On the Role of Sparsity and DAG Constraints for Learning Linear DAGs

Times Cited: 0
Authors
Ng, Ignavier [1 ]
Ghassami, AmirEmad [2 ]
Zhang, Kun [3 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Johns Hopkins Univ, Baltimore, MD USA
[3] Carnegie Mellon Univ, Pittsburgh, PA USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020, Vol. 33
Keywords
BAYESIAN NETWORKS; SELECTION; ALGORITHMS; GRAPHS; MODEL
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning graphical structures based on Directed Acyclic Graphs (DAGs) is a challenging problem, partly owing to the large search space of possible graphs. A recent line of work formulates structure learning as a continuous constrained optimization task using the least squares objective and an algebraic characterization of DAGs. However, this formulation requires a hard DAG constraint and may lead to optimization difficulties. In this paper, we study the asymptotic roles of the sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases, and investigate their usefulness in the finite sample regime. Based on the theoretical results, we formulate a likelihood-based score function, and show that one only has to apply soft sparsity and DAG constraints to learn a DAG equivalent to the ground truth DAG. This leads to an unconstrained optimization problem that is much easier to solve. Using gradient-based optimization and GPU acceleration, our procedure can easily handle thousands of nodes while retaining high accuracy. Extensive experiments validate the effectiveness of the proposed method and show that the DAG-penalized likelihood objective is indeed preferable to the least squares objective with the hard DAG constraint.
Pages: 12
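
The abstract describes replacing the hard acyclicity constraint with soft sparsity and DAG penalties on a likelihood-based score, optimized without constraints by gradient methods. The sketch below illustrates an objective of this kind in JAX, assuming the equal-noise-variance Gaussian likelihood term (d/2) log ||X - XW||_F^2 - log|det(I - W)| and the acyclicity function h(W) = tr(exp(W ∘ W)) - d (∘ is the elementwise product) from the continuous-optimization line of work the abstract refers to. The penalty weights lambda1 and lambda2, the truncated-series evaluation of h, the toy data, and the plain gradient-descent loop are illustrative assumptions, not the paper's exact configuration.

```python
import jax
import jax.numpy as jnp
import numpy as np

def dag_penalized_likelihood(W, X, lambda1=2e-2, lambda2=5.0):
    """Unconstrained score: Gaussian likelihood + soft sparsity and DAG penalties.

    lambda1 and lambda2 are illustrative penalty weights, not tuned values.
    """
    d = X.shape[1]
    resid = X - X @ W
    # Equal-variance Gaussian likelihood term:
    # (d/2) log ||X - XW||_F^2 - log|det(I - W)|
    nll = 0.5 * d * jnp.log(jnp.sum(resid ** 2)) \
          - jnp.linalg.slogdet(jnp.eye(d) - W)[1]
    # Soft L1 sparsity penalty
    l1 = lambda1 * jnp.sum(jnp.abs(W))
    # Soft DAG penalty h(W) = tr(exp(W * W)) - d, evaluated here with a
    # truncated Taylor series so the whole objective stays differentiable
    M, E, term = W * W, jnp.eye(d), jnp.eye(d)
    for k in range(1, 21):
        term = term @ M / k
        E = E + term
    h = jnp.trace(E) - d
    return nll + l1 + lambda2 * h

# Toy linear-Gaussian data from the chain x1 -> x2 -> x3 (weights made up);
# for X = XW + N, the data satisfy X = N (I - W)^{-1}
n, d = 1000, 3
W_true = np.array([[0.0, 1.5, 0.0],
                   [0.0, 0.0, -2.0],
                   [0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
X = jnp.asarray(rng.normal(size=(n, d)) @ np.linalg.inv(np.eye(d) - W_true))

# Plain gradient descent on the unconstrained objective; the step size and
# iteration count are illustrative (the abstract specifies only that the
# optimization is gradient-based)
grad_fn = jax.jit(jax.grad(dag_penalized_likelihood))
W = jnp.zeros((d, d))
for _ in range(5000):
    W = W - 1e-3 * grad_fn(W, X)
print(np.round(np.asarray(W), 2))
```

Because both penalties are soft, no augmented Lagrangian scheme or hard constraint h(W) = 0 is needed; small entries of the learned W can be thresholded afterwards to read off the estimated DAG.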