On the Use of Machine Translation-Based Approaches for Vietnamese Diacritic Restoration

Citations: 0
Authors
Thai-Hoang Pham [1]
Xuan-Khoai Pham [2]
Phuong Le-Hong [3]
Affiliations
[1] Alt Inc, Hanoi, Vietnam
[2] FPT Univ, Hanoi, Vietnam
[3] Vietnam Natl Univ, Hanoi, Vietnam
Source
2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP) | 2017
Keywords
machine translation; Vietnamese; diacritic restoration; phrase-based; neural-based; Moses; OpenNMT
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an empirical study of two machine translation-based approaches to the Vietnamese diacritic restoration problem: phrase-based and neural-based machine translation models. This is the first work to apply a neural-based machine translation method to this problem and to give a thorough comparison with the phrase-based machine translation method, which is the current state-of-the-art method for this problem. On a large dataset, the phrase-based approach achieves an accuracy of 97.32%, while the neural-based approach achieves 96.15%. Although the neural-based method has slightly lower accuracy, it is about twice as fast as the phrase-based method at inference. Moreover, the neural-based machine translation method has much room for future improvement, such as incorporating pre-trained word embeddings and collecting more training data.
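As an illustration of how diacritic restoration can be framed as a translation task, the following minimal Python sketch builds a (source, target) sentence pair by stripping diacritics from Vietnamese text and scores a prediction by token-level accuracy. This is not the authors' code: the function names are hypothetical, and both the data preparation and the accuracy definition are assumptions, since the abstract does not specify either.

```python
import unicodedata
from typing import Tuple


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'tiếng Việt' -> 'tieng Viet'."""
    # 'đ' and 'Đ' have no canonical decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_parallel_pair(sentence: str) -> Tuple[str, str]:
    """Return (source, target): text without diacritics and the original text.

    Assumption: training pairs for a translation model are derived this way.
    """
    return strip_diacritics(sentence), sentence


def token_accuracy(predicted: str, reference: str) -> float:
    """Fraction of whitespace-separated tokens restored exactly.

    Assumption: accuracy is measured per token; the paper may define it differently.
    """
    pred_tokens = unicodedata.normalize("NFC", predicted).split()
    ref_tokens = unicodedata.normalize("NFC", reference).split()
    if len(pred_tokens) != len(ref_tokens):
        raise ValueError("prediction and reference must have the same number of tokens")
    if not ref_tokens:
        return 1.0
    correct = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return correct / len(ref_tokens)


if __name__ == "__main__":
    source, target = make_parallel_pair("tôi đi học")
    print(source)  # input side of the pair (no diacritics)
    print(target)  # output side of the pair (original text)
    print(token_accuracy("toi di học", target))
```

Because removing diacritics never changes the number of tokens, source and target stay aligned token by token, which is why a simple per-token comparison is usable as a score in this sketch.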
Pages: 272-275
Number of pages: 4