Learning (k,l)-context-sensitive probabilistic grammars with nonparametric Bayesian approach

Cited by: 0
Authors
Shibata, Chihiro [1 ]
Affiliations
[1] Hosei Univ, Dept Adv Sci, Fac Sci & Engn, Tokyo, Japan
Keywords
Hierarchical Pitman-Yor processes; Distributional learning; Gibbs sampling
DOI
10.1007/s10994-021-06034-2
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Inferring formal grammars with a nonparametric Bayesian approach is one of the most powerful ways to achieve high accuracy from unsupervised data. In this paper, mildly context-sensitive probabilities, called (k, l)-context-sensitive probabilities, are defined on context-free grammars (CFGs). Inferring CFGs in which the probabilities of rules are identified from contexts can be seen as a dual of distributional learning, in which contexts characterize substrings. The data sparsity of the context-sensitive probabilities is handled by the smoothing effect of hierarchical nonparametric Bayesian models such as Pitman-Yor processes (PYPs). We define the hierarchy of PYPs naturally by augmenting infinite PCFGs. Blocked Gibbs sampling is known to be effective for inferring PCFGs. We show that, by modifying the inside probabilities, blocked Gibbs sampling can be applied to (k, l)-context-sensitive probabilistic grammars. At the same time, we show that the time complexity of computing the (k, l)-context-sensitive probabilities of a CFG is O(|V|^(l+3) |w|^3) for each sentence w, where V is the set of nonterminals. Since it is computationally too expensive to run a sufficient number of iterations, especially when |V| is not small, alternative sampling algorithms are required. We therefore propose a new sampling method, called composite sampling, in which the sampling procedure is separated into sub-procedures for nonterminals and for derivation trees. Finally, we demonstrate that the inferred (k, 0)-context-sensitive probabilistic grammars achieve lower perplexities than other probabilistic language models such as PCFGs, n-grams, and HMMs.
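For illustration, the following is a minimal sketch (in Python) of the standard inside, CKY-style dynamic program for a plain PCFG in Chomsky normal form, the computation that blocked Gibbs sampling for PCFGs builds on. It does not include the paper's (k, l)-context-sensitive modification, and the containers binary_rules and lexical_rules are hypothetical names assumed here.

from collections import defaultdict

def inside_probabilities(words, nonterminals, binary_rules, lexical_rules):
    # alpha[(i, j)][A] = P(A =>* words[i:j]) for a PCFG in Chomsky normal form.
    # binary_rules:  {(A, B, C): P(A -> B C)}   (hypothetical container)
    # lexical_rules: {(A, word): P(A -> word)}  (hypothetical container)
    n = len(words)
    alpha = defaultdict(lambda: defaultdict(float))

    # Width-1 spans: lexical rules A -> w_i.
    for i, w in enumerate(words):
        for A in nonterminals:
            p = lexical_rules.get((A, w), 0.0)
            if p > 0.0:
                alpha[(i, i + 1)][A] = p

    # Wider spans: sum over split points m and binary rules A -> B C,
    # giving the familiar cubic dependence on |w| that the abstract's bound refines.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for (A, B, C), p in binary_rules.items():
                for m in range(i + 1, j):
                    alpha[(i, j)][A] += p * alpha[(i, m)][B] * alpha[(m, j)][C]
    return alpha

The sentence probability is then alpha[(0, n)][S] for the start symbol S; the paper's sampler replaces these plain inside quantities with context-augmented ones.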
Pages: 3267-3301
Number of pages: 35