Linguistic-Relationships-Based Approach for Improving Word Alignment

Cited by: 6
Authors
Phuoc Tran [1]
Dien Dinh [2]
Tan Le [3]
Long H. B. Nguyen [2]
Affiliations
[1] Ton Duc Thang Univ, Fac Informat Technol, NLP KD Lab, Ho Chi Minh City, Vietnam
[2] VNU Univ Sci, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Univ Quebec, Fac Informat Technol, Montreal, PQ, Canada
Keywords
Word alignment; linguistic relationships; Chinese-Vietnamese machine translation; Sino-Vietnamese; content word
DOI
10.1145/3133323
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised word alignment models (such as GIZA++) are widely used in phrase-based statistical machine translation. The quality of such a model depends on the size and quality of the bilingual corpus. However, for low-resource language pairs such as Chinese and Vietnamese, unsupervised word alignment is sometimes of low quality because the data are sparse. In addition, these models do not take advantage of linguistic relationships to improve word alignment performance. Chinese and Vietnamese belong to the same language type and have close linguistic relationships. In this article, we integrate the characteristics of these linguistic relationships into the word alignment model to enhance the quality of Chinese-Vietnamese word alignment. The relationships we use are Sino-Vietnamese words and content words. The experimental results show that our method improves the performance of word alignment as well as the quality of machine translation.
Pages: 16
Related papers
26 in total
  • [1] Chinese-Vietnamese Word Alignment Method Based on Bidirectional RNN and Linguistic Features
    Gao, Shengxiang
    Zhu, Haodong
    Wang, Zhuo
    Yu, Zhengtao
    Wang, Xiaohan
    COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, CHINESECSCW 2018, 2019, 917 : 454 - 465
  • [2] Improving Word Alignment Through Morphological Analysis
    Vuong Van Bui
    Thanh Trung Tran
    Nhat Bich Thi Nguyen
    Tai Dinh Pham
    Anh Ngoc Le
    Cuong Anh Le
    INTEGRATED UNCERTAINTY IN KNOWLEDGE MODELLING AND DECISION MAKING, IUKM 2015, 2015, 9376 : 315 - 325
  • [3] Incorporating Linguistic Information to Statistical Word-Level Alignment
    Cendejas, Eduardo
    Barcelo, Grettel
    Gelbukh, Alexander
    Sidorov, Grigori
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS, 2009, 5856 : 387 - 394
  • [4] A Word Segmentation Method of Ancient Chinese Based on Word Alignment
    Che, Chao
    Zhao, Hanyu
    Wu, Xiaoting
    Zhou, Dongsheng
    Zhang, Qiang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 761 - 772
  • [5] A Hybrid Approach for Word Alignment with Statistical Modeling and Chunker
    Srivastava, Jyoti
    Sanyal, Sudip
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 570 - 581
  • [6] A Simple Approach to Use Bilingual Information Sources for Word Alignment
    Espla-Gomis, Miguel
    Sanchez-Martinez, Felipe
    Forcada, Mikel L.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2012, (49) : 93 - 99
  • [7] Improving Statistical Machine Translation Using Bayesian Word Alignment and Gibbs Sampling
    Mermer, Coskun
    Saraclar, Murat
    Sarikaya, Ruhi
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (05) : 1090 - 1101
  • [8] Investigating English-Chinese Word Level Alignment by Using Semantic Similarities and Linguistic Knowledge
    Huang, Fuwei
    2015 5TH INTERNATIONAL CONFERENCE ON APPLIED SOCIAL SCIENCE (ICASS 2015), PT 2, 2015, 81 : 212 - 216
  • [9] POS-based Word Alignment for Small Corpus
    Srivastava, Jyoti
    Sanyal, Sudip
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015 : 37 - 40
  • [10] A word alignment model based on multiobjective evolutionary algorithms
    Chen, Yidong
    Shi, Xiaodong
    Zhou, Changle
    Hong, Qingyang
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (11-12) : 1724 - 1729