Probabilistic grammars for equation discovery

Times cited: 16
Authors
Brence, Jure [1 ,2 ]
Todorovski, Ljupco [1 ,3 ]
Dzeroski, Saso [1 ,2 ]
Affiliations
[1] Jozef Stefan Inst, Dept Knowledge Technol, Jamova Cesta 39, Ljubljana 1000, Slovenia
[2] Jozef Stefan Int Postgrad Sch, Jamova Cesta 39, Ljubljana 1000, Slovenia
[3] Univ Ljubljana, Fac Publ Adm, Gosarjeva Ul 5, Ljubljana 1000, Slovenia
Keywords
Equation discovery; Symbolic regression; Automated modeling; Grammar; Probabilistic context-free grammar; Monte-Carlo; GLOBAL OPTIMIZATION;
DOI
10.1016/j.knosys.2021.107077
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Equation discovery, also known as symbolic regression, is a type of automated modeling that discovers scientific laws, expressed in the form of equations, from observed data and expert knowledge. Deterministic grammars, such as context-free grammars, have been used to limit the search spaces in equation discovery by providing hard constraints that specify which equations to consider and which to exclude. In this paper, we propose the use of probabilistic context-free grammars in equation discovery. Such grammars encode soft constraints, specifying a prior probability distribution on the space of possible equations. We show that probabilistic grammars can be used to elegantly and flexibly formulate the parsimony principle, which favors simpler equations, through probabilities attached to the rules in the grammars. We demonstrate that the use of probabilistic, rather than deterministic, grammars in the context of a Monte-Carlo algorithm for grammar-based equation discovery leads to more efficient equation discovery. Finally, by specifying prior probability distributions over equation spaces, the foundations are laid for Bayesian approaches to equation discovery. (C) 2021 The Author(s). Published by Elsevier B.V.
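To make the abstract's central idea concrete, the following is a minimal illustrative sketch (not the paper's implementation) of how a probabilistic context-free grammar over arithmetic expressions can encode the parsimony principle: recursive production rules get low probabilities, so sampled equations tend to be short, and each sampled expression carries an explicit prior probability. The grammar, symbol names, and probabilities here are all hypothetical.

```python
import random

# Hypothetical toy PCFG over arithmetic expressions in one variable x.
# Each nonterminal maps to a list of (production, probability) pairs;
# the probabilities for each nonterminal sum to 1. Low probabilities on
# the recursive rules (E -> E + T, T -> T * F) act as a soft constraint
# favoring simpler equations.
GRAMMAR = {
    "E": [(["E", "+", "T"], 0.2), (["T"], 0.8)],
    "T": [(["T", "*", "F"], 0.2), (["F"], 0.8)],
    "F": [(["x"], 0.5), (["c"], 0.5)],  # c stands for a fitted constant
}

def sample(symbol="E", rng=random):
    """Sample an expression (token list) and its prior probability under the PCFG."""
    if symbol not in GRAMMAR:  # terminal symbol: emit it with probability 1
        return [symbol], 1.0
    rules = GRAMMAR[symbol]
    idx = rng.choices(range(len(rules)), weights=[w for _, w in rules])[0]
    production, rule_prob = rules[idx]
    tokens, prob = [], rule_prob
    for sym in production:  # expand the chosen production recursively
        sub_tokens, sub_prob = sample(sym, rng)
        tokens += sub_tokens
        prob *= sub_prob  # derivation probability = product of rule probabilities
    return tokens, prob

expr, prior = sample()
print(" ".join(expr), prior)
```

A Monte-Carlo search in the spirit of the paper would repeatedly draw candidate equations this way, fit their constants to data, and score them; the returned prior is what a Bayesian treatment would combine with a data likelihood.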
Pages: 12
References
30 records
[1] [Anonymous]. The Feynman Lectures on Physics.
[2] [Anonymous]. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. 2009.
[3] Bradley E, Easley M, Stolle R. Reasoning about nonlinear system identification. Artificial Intelligence, 2001, 133(1-2):139-188.
[4] Bridewell W, Langley P, Todorovski L, Dzeroski S. Inductive process modeling. Machine Learning, 2008, 71(1):1-32.
[5] Bridewell W, Langley P. Two kinds of knowledge in scientific discovery. Topics in Cognitive Science, 2010, 2(1):36-52.
[6] Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the United States of America, 2016, 113(15):3932-3937.
[7] Chi ZY. Computational Linguistics, 1999, 25:131.
[8] de la Higuera C. A bibliographical study of grammatical inference. Pattern Recognition, 2005, 38(9):1332-1348.
[9] Dzeroski S. LNAI, 2007, vol. 4660.
[10] Geman S. International Encyclopedia of the Social & Behavioral Sciences, 2002, p. 12075.