ArWordVec: efficient word embedding models for Arabic tweets

Cited by: 0
Authors
Mohammed M. Fouad
Ahmed Mahany
Naif Aljohani
Rabeeh Ayaz Abbasi
Saeed-Ul Hassan
Affiliations
[1] Fujitsu Technology Solutions, Faculty of Computers and Information Sciences
[2] Ain Shams University, Department of Computer Science
[3] Faculty of Computing and Information Technology
[4] King Abdulaziz University
[5] Quaid-i-Azam University
[6] Information Technology University
Source
Soft Computing | 2020 / Vol. 24
Keywords
ArWordVec; Natural language processing; Word embeddings; Deep convolution neural networks; Arabic tweets
DOI
Not available
Abstract
Understanding, processing, and using human natural language is one of the major advances in artificial intelligence today. It has been achieved by combining natural language processing (NLP) techniques with deep learning approaches and architectures. In recent years, distributed word representations have effectively replaced the traditional bag-of-words approach in many NLP tasks. In this paper, we present the detailed steps for building ArWordVec, a set of efficient word embedding models generated from a very large repository of Arabic tweets. We also introduce a new method for measuring Arabic word similarity and use it to evaluate the generated ArWordVec models. The experimental results show that the ArWordVec models outperform recently available models on Arabic Twitter data in the word similarity task. In addition, two large Arabic tweet datasets are used to examine the performance of the proposed models on multi-class sentiment analysis. The results show that the proposed models are efficient, helping to achieve a classification accuracy exceeding 73.86% with an average F1 score of 74.15.
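For readers unfamiliar with how embedding models are usually compared on a word similarity task, the following is a minimal sketch of the conventional cosine similarity baseline. It is not the new similarity method the paper introduces, which this record does not describe. The function names and the three-dimensional toy vectors are hypothetical and invented purely for illustration; real embedding models use vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: values invented for this sketch,
# not taken from ArWordVec or any other trained model).
toy_embeddings = {
    "كتاب": [0.9, 0.1, 0.3],   # "book"
    "مجلة": [0.8, 0.2, 0.4],   # "magazine"
    "سيارة": [0.1, 0.9, 0.2],  # "car"
}


def most_similar(word, embeddings):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("كتاب", toy_embeddings):
        print(other, round(score, 3))
```

With these toy vectors, "magazine" ranks above "car" as the nearest neighbour of "book", which is the kind of ordering a word similarity evaluation checks against human judgments.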
Pages: 8061-8068
Page count: 7