Improving automatic Chinese-Japanese patent translation using bilingual term extraction

Cited by: 1
Authors
Yang, Wei [1]
Lepage, Yves [1]
Affiliations
[1] Waseda Univ, Grad Sch IPS, 2-7 Hibikino, Kitakyushu, Fukuoka 8080135, Japan
Keywords
term extraction; monolingual term; bilingual term; alignment; statistical machine translation
DOI
10.1002/tee.22505
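Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract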
The identification of terms in scientific and patent documents is a crucial issue for applications like information retrieval, text categorization, and also for machine translation. This paper describes a method to improve Chinese-Japanese statistical machine translation of patents by re-tokenizing the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. An automatic alignment method is used to identify corresponding terms. The most promising bilingual multi-word terms are extracted by setting some threshold on translation probabilities and further filtering by considering the components of the bilingual multi-word terms in characters as well as the ratio of their lengths in words. We also use kanji (Japanese)-hanzi (Chinese) character conversion to confirm and extract more promising bilingual multi-word terms. We obtain a high quality of correspondence with 93% in bilingual term extraction and a significant improvement of 1.5 BLEU score in a translation experiment. © 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
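The filtering step described in the abstract (a threshold on translation probabilities, a ratio of term lengths in words, and a kanji-hanzi conversion check) might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the `Candidate` type, the `keep` function, the threshold values `min_prob` and `max_ratio`, and the three-entry `KANJI_TO_HANZI` table are all hypothetical, and the way the three criteria are combined is an assumption.

```python
from dataclasses import dataclass

# Hypothetical toy kanji -> hanzi table; a real system would need a full
# conversion resource. Characters not listed are left unchanged.
KANJI_TO_HANZI = {"気": "气", "機": "机", "電": "电"}


@dataclass(frozen=True)
class Candidate:
    """A candidate bilingual multi-word term with alignment probabilities."""
    zh_words: tuple        # Chinese term, one word per element
    ja_words: tuple        # Japanese term, one word per element
    p_zh_given_ja: float   # translation probability P(zh | ja)
    p_ja_given_zh: float   # translation probability P(ja | zh)


def convert_kanji(text):
    """Map each Japanese kanji to its hanzi counterpart when one is listed."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in text)


def keep(c, min_prob=0.6, max_ratio=2.0):
    """Decide whether to keep a candidate pair (assumed combination of criteria)."""
    if not c.zh_words or not c.ja_words:
        return False
    # Assumption: an exact match after character conversion confirms the pair.
    if convert_kanji("".join(c.ja_words)) == "".join(c.zh_words):
        return True
    # Otherwise require both translation probabilities to reach the threshold.
    if min(c.p_zh_given_ja, c.p_ja_given_zh) < min_prob:
        return False
    # Finally, bound the ratio of the term lengths counted in words.
    longer = max(len(c.zh_words), len(c.ja_words))
    shorter = min(len(c.zh_words), len(c.ja_words))
    return longer / shorter <= max_ratio


confirmed = Candidate(("电气", "机器"), ("電気", "機器"), 0.2, 0.2)
low_prob = Candidate(("半导体", "装置"), ("半導体", "装置"), 0.3, 0.9)
print(keep(confirmed), keep(low_prob))
```

In this sketch the first pair is kept because the converted Japanese string matches the Chinese one even though its probabilities are low, while the second pair is rejected because the toy table does not cover 導 and one probability falls under the assumed threshold.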
Pages: 117-125
Page count: 9