Cross-lingual Short-Text Document Classification for Facebook Comments

Cited by: 32
Authors
Faqeeh, Mosab [1]
Abdulla, Nawaf [1]
Al-Ayyoub, Mahmoud [1]
Jararweh, Yaser [1]
Quwaider, Muhannad [1]
Affiliations
[1] Jordan Univ Sci & Technol, Irbid, Jordan
Source
2014 INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD (FICLOUD) | 2014
Keywords
document classification; cross-lingual text analysis; social network comments; support vector machine; decision tree; naive Bayes; k-nearest neighbor; IDENTIFICATION;
DOI
10.1109/FiCloud.2014.99
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Document Classification (DC) is one of the fundamental problems in text mining. Plenty of works exist on DC with interesting approaches and excellent results; however, most of them focus on long-text documents written in a single language, with English being the most studied language. This work is concerned with the natural step beyond such works, which is cross-lingual DC for short-text documents. Specifically, we consider two languages, Arabic and English, and compare the performance of some of the most popular document classifiers on two datasets of short Facebook comments. Apart from limited attempts, the addressed problem has not been studied well enough. The results are encouraging and new insights are obtained.
Pages: 573-578
Number of pages: 6