Binarization of Synchronous Context-Free Grammars

被引:12
作者
Huang, Liang [1 ]
Zhang, Hao
Gildea, Daniel [2 ]
Knight, Kevin [1 ]
机构
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[2] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
关键词
D O I
10.1162/coli.2009.35.4.35406
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages. We develop a theory of binarization for synchronous context-free grammars and present a linear-time algorithm for binarizing synchronous rules when possible. In our large-scale experiments, we found that almost all rules are binarizable and the resulting binarized rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system. We also discuss the more general, and computationally more difficult, problem of finding good parsing strategies for non-binarizable rules, and present an approximate polynomial-time algorithm for this problem.
引用
收藏
页码:559 / 595
页数:37
相关论文
共 33 条
  • [1] Aho A. V., 1972, The Theory of Parsing, Translation, and Compiling
  • [2] [Anonymous], P 44 ACL SYDN AUSTR
  • [3] [Anonymous], 1970, Math. Syst. Theory, DOI DOI 10.1007/BF01695769
  • [4] [Anonymous], P 43 ANN M ASS COMP
  • [5] [Anonymous], 1990, Introduction to Algorithms
  • [6] [Anonymous], P ACL 2003
  • [7] [Anonymous], P HLT NAACL 2004
  • [8] COMPLEXITY OF FINDING EMBEDDINGS IN A K-TREE
    ARNBORG, S
    CORNEIL, DG
    PROSKUROWSKI, A
    [J]. SIAM JOURNAL ON ALGEBRAIC AND DISCRETE METHODS, 1987, 8 (02): : 277 - 284
  • [9] Treewidth for graphs with small chordality
    Bodlaender, HL
    Thilikos, DM
    [J]. DISCRETE APPLIED MATHEMATICS, 1997, 79 (1-3) : 45 - 61
  • [10] Catalan E., 1844, J. Reine Angew. Math., V27, P192