Linguistic-Relationships-Based Approach for Improving Word Alignment

Cited by: 6
Authors
Phuoc Tran [1]
Dien Dinh [2]
Tan Le [3]
Long H. B. Nguyen [2]
Affiliations
[1] Ton Duc Thang Univ, Fac Informat Technol, NLP KD Lab, Ho Chi Minh City, Vietnam
[2] VNU Univ Sci, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Univ Quebec, Fac Informat Technol, Montreal, PQ, Canada
Keywords
Word alignment; linguistic relationships; Chinese-Vietnamese machine translation; Sino-Vietnamese; content word
DOI
10.1145/3133323
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised word alignment models (such as GIZA++) are widely used in phrase-based statistical machine translation. The quality of such a model depends on the size and quality of the bilingual corpus. However, for low-resource language pairs such as Chinese and Vietnamese, unsupervised word alignment is sometimes of low quality because the data are sparse. In addition, these models do not exploit linguistic relationships to improve word alignment performance. Chinese and Vietnamese belong to the same language type and are closely related linguistically. In this article, we integrate the characteristics of these linguistic relationships into the word alignment model to improve the quality of Chinese-Vietnamese word alignment. The relationships used are Sino-Vietnamese and content words. The experimental results show that our method improves both the performance of word alignment and the quality of machine translation.
Pages: 16