Two New Large Corpora for Vietnamese Aspect-based Sentiment Analysis at Sentence Level

Cited by: 11
Authors
Dang Van Thin [1]
Ngan Luu-Thuy Nguyen [1]
Tri Minh Truong [1]
Lac Si Le [1]
Duy Tin Vo [2, 3]
Affiliations
[1] Vietnam Natl Univ, Univ Informat Technol, Ho Chi Minh City, Vietnam
[2] Lakehead Univ, Dept Comp Sci, Thunder Bay, ON P7B 5E1, Canada
[3] VinAI Res, Hanoi, Vietnam
Keywords
Aspect-based sentiment analysis; deep neural network; multi-task learning; Vietnamese corpora
DOI
10.1145/3446678
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Aspect-based sentiment analysis has been studied in both research and industrial communities over recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce two benchmark corpora, the largest at the sentence level, for two tasks in Vietnamese: Aspect Category Detection and Aspect Polarity Classification. Our corpora cover the restaurant and hotel domains and are annotated with high inter-annotator agreement. We expect their release to push forward the low-resource language processing community. In addition, we implement and compare supervised learning methods using single-task and multi-task approaches based on deep learning architectures. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the other neural network architectures and the single-task approach. Our corpora and source code are published at the site given in footnote 1.
Pages: 22