AutoBayes: a system for generating data analysis programs from statistical models

Cited by: 49
Authors
Fischer, B [1]
Schumann, J [1]
Affiliations
[1] NASA, RIACS, Moffett Field, CA 94035 USA
DOI
10.1017/S0956796802004562
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Data analysis is an important scientific task which is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AUTOBAYES, a program synthesis system for the generation of data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is a fully declarative problem description, similar in spirit to a set of differential equations. From such a model, AUTOBAYES generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is produced by a schema-guided deductive synthesis process. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem-proving technology. AUTOBAYES augments schema-guided synthesis by symbolic-algebraic computation and can thus derive closed-form solutions for many problems. It is well-suited for tasks like estimating best-fitting model parameters for the given data. Here, we describe AUTOBAYES's system architecture, in particular the schema-guided synthesis kernel. Its capabilities are illustrated by a number of advanced textbook examples and benchmarks.
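To make the phrase "closed-form solutions" for "best-fitting model parameters" concrete, the following is a minimal hand-written sketch, not AUTOBAYES output and not its specification language. It assumes the simplest possible statistical model, independent observations x_i ~ N(mu, sigma^2), for which the maximum-likelihood estimates can be written down directly. The function names `gaussian_mle` and `log_likelihood` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_mle(xs):
    """Closed-form maximum-likelihood estimates for x_i ~ N(mu, sigma^2).

    Illustrative assumption: a single Gaussian with unknown mean and
    standard deviation. Returns (mu_hat, sigma_hat).
    """
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    mu = sum(xs) / n
    # The ML estimate of the variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, math.sqrt(var)


def log_likelihood(xs, mu, sigma):
    """Log-likelihood of the data under N(mu, sigma^2); requires sigma > 0."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - squared_error / (2 * sigma ** 2))
```

Calling `gaussian_mle(data)` on a list of floats returns the parameter pair that maximizes `log_likelihood` for that data. A model this simple needs no iterative optimization; the abstract indicates that the system aims to find such solutions automatically where they exist.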
Pages: 483-508
Page count: 26