Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation

Cited by: 21
Authors
Wang, Rui [1 ,2 ]
Zhao, Hai [1 ,2 ]
Lu, Bao-Liang [1 ,2 ]
Utiyama, Masao [3 ]
Sumita, Eiichiro [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China
[3] Natl Inst Informat & Commun Technol, Multilingual Translat Lab, Kyoto 6190289, Japan
Funding
National Natural Science Foundation of China;
Keywords
Continuous-space language model; language model growing (LMG); neural network language model; statistical machine translation (SMT);
DOI
10.1109/TASLP.2015.2425220
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Larger n-gram language models (LMs) perform better in statistical machine translation (SMT). However, existing approaches to constructing larger LMs have two main drawbacks: 1) it is not convenient to obtain larger corpora in the same domain as the bilingual parallel corpora used in SMT; 2) most previous studies focus on monolingual information from the target corpora only, and redundant n-grams have not been fully utilized in SMT. Recently, continuous-space language models (CSLMs), especially neural network language models (NNLMs), have been shown to greatly improve the accuracy of the estimated probabilities for predicting target words. However, most of these CSLM and NNLM approaches still consider monolingual information only or require additional corpora. In this paper, we propose a novel neural network based bilingual LM growing method. Compared to the existing approaches, the proposed method enables us to use the bilingual parallel corpus for LM growing in SMT. The results show that our new method significantly outperforms the existing approaches in both SMT performance and computational efficiency.
Pages: 1209-1220
Page count: 12
Related papers
  • [1] Converting Continuous-Space Language Models into N-gram Language Models with Efficient Bilingual Pruning for Statistical Machine Translation
    Wang, Rui
    Utiyama, Masao
    Goto, Isao
    Sumita, Eiichiro
    Zhao, Hai
    Lu, Bao-Liang
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 15 (03)
  • [2] Generalizing Continuous-space Translation of Paralinguistic Information
    Kano, Takatomo
    Takamichi, Shinnosuke
    Sakti, Sakriani
    Neubig, Graham
    Toda, Tomoki
    Nakamura, Satoshi
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2613 - 2617
  • [3] Bilingual Segmenter for Statistical Machine Translation
    Huang, Chung-Chi
    Chen, Wei-Teh
    Chang, Jason S.
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION, 2008, : 97 - +
  • [4] Bilingual phrases for statistical machine translation
    Garcia-Varea, I.
    Nevado, F.
    Ortiz, D.
    Tomas, J.
    Casacuberta, F.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2005, (35): : 93 - 100
  • [5] Continuous-Space Language Processing: Beyond Word Embeddings
    Ostendorf, Mari
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2016, 2016, 9918 : 3 - 15
  • [6] Factored bilingual n-gram language models for statistical machine translation
    Crego, Josep M.
    Yvon, Francois
    MACHINE TRANSLATION, 2010, 24 (02) : 159 - 175
  • [7] A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
    Zhang, Jingyi
    Utiyama, Masao
    Sumita, Eiichiro
    Neubig, Graham
    Nakamura, Satoshi
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1372 - 1381
  • [8] Continuous-space model of computation is Turing universal
    Naughton, TJ
    CRITICAL TECHNOLOGIES FOR THE FUTURE OF COMPUTING, 2000, 4109 : 121 - 128
  • [9] Statistical Machine Translation as a Language Model for Handwriting Recognition
    Devlin, Jacob
    Kamali, Matin
    Subramanian, Krishna
    Prasad, Rohit
    Natarajan, Prem
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 291 - 296
  • [10] Bilingual Sense Similarity for Statistical Machine Translation
    Chen, Boxing
    Foster, George
    Kuhn, Roland
    ACL 2010: 48TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2010, : 834 - 843