Text Generation for Imbalanced Text Classification

Cited by: 0
Authors
Akkaradamrongrat, Suphamongkol [1]
Kachamas, Pornpimon [2]
Sinthupinyo, Sukree [1]
Affiliations
[1] Chulalongkorn Univ, Dept Comp Engn, Bangkok, Thailand
[2] Chulalongkorn Univ, Grad Sch, Technopreneurship & Innovat Management, Bangkok, Thailand
Source
2019 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2019) | 2019
Keywords
imbalanced text classification; text generation; Markov chains; LSTM
DOI
10.1109/jcsse.2019.8864181
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Imbalanced data is common in real-world datasets. It biases classification models: they predict most samples as the majority class, which is often the negative class. In this research, text generation techniques were used to generate synthetic minority-class samples and balance the text dataset. Two text generation methods, one using Markov chains and one using Long Short-Term Memory (LSTM) networks, were applied and compared in terms of their ability to improve the performance of imbalanced text classification. Our experimental study is based on an LSTM network classifier, with traditional over-sampling as the baseline. The study used our dataset of Thai-language advertisement texts from Facebook. The increase in recall shows that these techniques improved the models' ability to predict positive samples, which are the minority samples. The Markov chain technique outperformed both traditional over-sampling and LSTM-based text generation in the majority of the models.
Pages: 181-186
Page count: 6
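
The Markov chain over-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_chain`, `generate`, `balance`), the chain order, the length cap, and the whitespace tokenization are all assumptions made for illustration. Thai text in particular would first need a word segmenter, which is not shown here.

```python
import random
from collections import defaultdict

# Hypothetical sentence-boundary markers, chosen for this sketch only.
START, END = "<s>", "</s>"


def build_chain(texts, order=1):
    """Record which word follows each `order`-word state in the minority texts."""
    chain = defaultdict(list)
    for text in texts:
        # Assumption: whitespace tokenization (Thai would need a segmenter).
        tokens = [START] * order + text.split() + [END]
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            chain[state].append(tokens[i + order])
    return chain


def generate(chain, order=1, max_len=30, rng=random):
    """Sample one synthetic text by walking the chain from the start state."""
    state = (START,) * order
    words = []
    while len(words) < max_len:
        # Followers are stored with repetition, so frequent transitions
        # are sampled proportionally more often.
        nxt = rng.choice(chain[state])
        if nxt == END:
            break
        words.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(words)


def balance(minority, majority_size, order=1, seed=0):
    """Generate synthetic minority texts until the class sizes match."""
    rng = random.Random(seed)
    chain = build_chain(minority, order)
    synthetic = []
    while len(minority) + len(synthetic) < majority_size:
        synthetic.append(generate(chain, order, rng=rng))
    return synthetic
```

For example, `balance(["buy cheap phone now", "buy cheap watch today"], majority_size=10)` returns eight synthetic texts built only from word transitions seen in the two inputs. The function expects at least one minority text; with an empty list the start state has no followers and sampling fails. A higher `order` yields more fluent but less varied output, since fewer transitions are available from each state.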