Improving automatic Chinese-Japanese patent translation using bilingual term extraction

Cited by: 1
Authors
Yang, Wei [1]
Lepage, Yves [1]
Affiliation
[1] Waseda University, Graduate School of Information, Production and Systems (IPS), 2-7 Hibikino, Kitakyushu, Fukuoka 808-0135, Japan
Keywords
term extraction; monolingual term; bilingual term; alignment; statistical machine translation
DOI
10.1002/tee.22505
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
The identification of terms in scientific and patent documents is crucial for applications such as information retrieval, text categorization, and machine translation. This paper describes a method to improve Chinese-Japanese statistical machine translation of patents by re-tokenizing the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. An automatic alignment method is then used to identify corresponding terms across the two languages. The most promising bilingual multi-word terms are extracted by setting a threshold on translation probabilities and by further filtering on the characters that make up each term pair and on the ratio of their lengths in words. We also use kanji (Japanese) to hanzi (Chinese) character conversion to confirm and extract more promising bilingual multi-word terms. We obtain a high quality of correspondence (93%) in bilingual term extraction and a significant improvement of 1.5 BLEU points in a translation experiment. (c) 2017 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
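The abstract does not say which statistical measure is combined with the linguistic filter for monolingual multi-word term extraction. As an assumption for illustration only, the sketch below pairs a part-of-speech pattern filter (adjectives and nouns ending in a noun, over a hypothetical two-letter tag set) with the C-value measure, a widely used termhood score that discounts candidates which mostly occur nested inside longer candidates. It is not presented as the paper's exact method.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Tagged = Tuple[str, str]  # (token, POS tag)
Term = Tuple[str, ...]


# Hypothetical simplified tag set: "N" = noun, "A" = adjective, anything else = other.
def passes_linguistic_filter(tags: List[str]) -> bool:
    """Linguistic filter: (A|N)* N with at least two words."""
    return len(tags) >= 2 and tags[-1] == "N" and all(t in ("A", "N") for t in tags)


def count_candidates(sentences: List[List[Tagged]], max_len: int = 5) -> Counter:
    """Count every contiguous n-gram (2 <= n <= max_len) that passes the filter."""
    counts: Counter = Counter()
    for sent in sentences:
        for i in range(len(sent)):
            for n in range(2, min(max_len, len(sent) - i) + 1):
                span = sent[i:i + n]
                if passes_linguistic_filter([pos for _, pos in span]):
                    counts[tuple(tok for tok, _ in span)] += 1
    return counts


def is_nested(short: Term, long: Term) -> bool:
    """True if `short` occurs as a contiguous sub-sequence of the longer `long`."""
    n = len(short)
    return len(long) > n and any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(counts: Counter) -> Dict[Term, float]:
    """C-value: log2|a| * (f(a) - mean frequency of longer candidates containing a)."""
    scores: Dict[Term, float] = {}
    for term, freq in counts.items():
        containers = [f for other, f in counts.items() if is_nested(term, other)]
        adjusted = freq - sum(containers) / len(containers) if containers else freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores


corpus = [
    [("machine", "N"), ("translation", "N"), ("system", "N")],
    [("machine", "N"), ("translation", "N"), ("is", "V"), ("useful", "A")],
]
scores = c_value(count_candidates(corpus))
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

In this toy corpus, "translation system" scores 0 because its only occurrence sits inside the longer candidate "machine translation system", while "machine translation" keeps a positive score thanks to its independent occurrence.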
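The bilingual filtering step can be sketched as follows, assuming each aligned candidate pair already carries a translation probability from the aligner. The thresholds (`prob_threshold`, `min_ratio`, `max_ratio`, `overlap_threshold`), the Dice-style character overlap, and the tiny `KANJI_TO_HANZI` table are all hypothetical choices: the abstract gives neither the values nor the exact criteria, and a real system would need a complete kanji-hanzi conversion resource.

```python
from dataclasses import dataclass
from typing import Dict, List

# Tiny, hypothetical excerpt of a kanji -> simplified hanzi table, for illustration only.
KANJI_TO_HANZI: Dict[str, str] = {
    "機": "机", "訳": "译", "語": "语", "電": "电", "変": "变", "換": "换",
}


@dataclass
class TermPair:
    zh: List[str]  # Chinese term, segmented into words
    ja: List[str]  # Japanese term, segmented into words
    prob: float    # translation probability reported by the aligner (assumed available)


def to_hanzi(text: str) -> str:
    """Map each kanji to its simplified hanzi form when the table knows it."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in text)


def char_overlap(zh: str, ja: str) -> float:
    """Dice coefficient over character sets, after kanji -> hanzi conversion of `ja`."""
    a, b = set(zh), set(to_hanzi(ja))
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def keep_pair(pair: TermPair, prob_threshold: float = 0.6, min_ratio: float = 0.5,
              max_ratio: float = 2.0, overlap_threshold: float = 0.5) -> bool:
    """Hypothetical filter combining probability, length ratio and character evidence."""
    if not pair.zh or not pair.ja:
        return False
    ratio = len(pair.zh) / len(pair.ja)  # length ratio in words
    if not min_ratio <= ratio <= max_ratio:
        return False
    if pair.prob >= prob_threshold:
        return True
    # Below the probability threshold: keep the pair only if shared characters confirm it.
    return char_overlap("".join(pair.zh), "".join(pair.ja)) >= overlap_threshold


pairs = [
    TermPair(["机器", "翻译"], ["機械", "翻訳"], 0.3),             # rescued by characters
    TermPair(["机器", "翻译"], ["統計", "モデル"], 0.3),           # no character evidence
    TermPair(["机器", "翻译", "系统", "设计"], ["システム"], 0.9),  # length ratio 4.0
]
kept = [p for p in pairs if keep_pair(p)]
```

Here a pair that falls below the probability threshold survives only when the converted characters support it, which is one plausible reading of using kanji-hanzi conversion to "confirm and extract more" term pairs.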
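Re-tokenizing the training corpus can be pictured as merging each known multi-word term into a single token before SMT training, applied to the Chinese side with the Chinese terms and to the Japanese side with the Japanese terms. The greedy longest-match strategy and the empty-string `joiner` below are assumptions for illustration, not details given in the abstract.

```python
from typing import List, Sequence, Set, Tuple


def retokenize(tokens: Sequence[str], terms: Set[Tuple[str, ...]], joiner: str = "") -> List[str]:
    """Merge known multi-word terms into single tokens, preferring the longest match."""
    max_len = max((len(t) for t in terms), default=0)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible term starting at position i, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in terms:
                out.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word term starts here: keep the original token.
            out.append(tokens[i])
            i += 1
    return out


zh_terms = {("机器", "翻译"), ("机器", "翻译", "系统")}
ja_terms = {("機械", "翻訳"), ("機械", "翻訳", "システム")}
print(retokenize("本 发明 涉及 机器 翻译 系统".split(), zh_terms))
print(retokenize("本 発明 は 機械 翻訳 システム に 関する".split(), ja_terms))
```

Preferring the longest match keeps "机器 翻译 系统" as one unit instead of splitting it into "机器翻译" plus a stray "系统", so both sides of a sentence pair expose the same term-level units to the word aligner.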
Pages: 117-125
Page count: 9
Related papers
50 results in total
  • [41] Improving automatic call classification using machine translation
    Faruquie, Tanveer A.
    Rajput, Nitendra
    Raj, Vimal
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 129 - +
  • [42] The automatic extraction of translation patterns and matching algorithm in an English-Chinese machine translation system
    Li, J
    Wang, B
    PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005: 839 - 843
  • [43] The Chinese Unknown Term Translation Mining with Supervised Candidate Term Extraction Strategy
    Liang, Ying-Hong
    Li, Jin-xiang
    Ye, Liang
    Chen, Ke
    Guo, Cui-zhen
    CEIS 2011, 2011, 15
  • [44] Research on Feature-based Word Automatic Translation Technology in Japanese-Chinese Translation System
    Liu, Jing
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019: 683 - 687
  • [45] A Chinese-Japanese Parallel Corpus for Neural Machine Translation Based on Web-Crawled Data from NetEase Cloud Music
    Li, Haowei
    Zhang, Jinyi
    Tian, Ye
    Matsumoto, Tadahiro
    Proceedings of 2024 2nd International Conference on Signal Processing and Intelligent Computing, SPIC 2024, 2024: 980 - 984
  • [46] Automatic Chinese term extraction based on decomposition of prime string
    School of Software, Tsinghua University, Beijing 100084, China
    Unknown
    Unknown
    Jisuanji Gongcheng, 2006, 23: 188 - 190
  • [47] Research on Improving the Quality of Japanese Chinese Machine Translation Based on Deep Learning
    Lu, Jialing
    Yin, Fengxian
    Frontiers in Artificial Intelligence and Applications, 383: 267 - 277
  • [48] English-Chinese Translation for Patent Titles Using Statistical Models
    Cai, Dongfeng
    Lin, Xiaoqing
    Ji, Duo
    Zhang, Guiping
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2009, 12 (02): 419 - 427
  • [49] Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning
    Dong, Zhaofeng
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022