BiSAL - A bilingual sentiment analysis lexicon to analyze Dark Web forums for cyber security

Cited by: 38
Authors
Al-Rowaily, Khalid [1]
Abulaish, Muhammad [2]
Haldar, Nur Al-Hasan [3]
Al-Rubaian, Majed [1]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
[2] Jamia Millia Islamia, Dept Comp Sci, New Delhi 110025, India
[3] King Saud Univ, Ctr Excellence Informat Assurance, Riyadh, Saudi Arabia
Keywords
Sentiment analysis lexicon; Sentiment lexicon for English; Sentiment lexicon for Arabic; Cyber security; Dark Web forum
DOI
10.1016/j.diin.2015.07.006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present the development of a Bilingual Sentiment Analysis Lexicon (BiSAL) for the cyber security domain. BiSAL consists of a Sentiment Lexicon for ENglish (SentiLEN) and a Sentiment Lexicon for ARabic (SentiLAR), which can be used to develop opinion mining and sentiment analysis systems for bilingual textual data from Dark Web forums. For SentiLEN, a list of 279 sentiment-bearing English words related to cyber threats, radicalism, and conflicts is identified, and a process is devised to unify their sentiment scores obtained from four different sentiment data sets. For SentiLAR, sentiment-bearing Arabic words are identified from a collection of 2000 message posts from the Alokab Web forum, which contains radical content. SentiLAR provides a list of 1019 sentiment-bearing Arabic words related to cyber threats, radicalism, and conflicts, along with their morphological variants and sentiment polarity. For polarity determination, three Arabic language experts perform a semi-automated analysis, and their ratings are combined using aggregate functions. A Web interface has been developed to access both lexicons (SentiLEN and SentiLAR) of the BiSAL data set online, and a beta version is available at http://www.abulaish.com/bisal. (C) 2015 Elsevier Ltd. All rights reserved.
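The abstract does not say which aggregate functions combine the expert ratings, nor how the scores from the four data sets are unified. As a minimal sketch only, the following assumes a majority vote over polarity labels and a plain mean over scores already normalized to [-1, 1]; the function names `majority_polarity` and `unified_score` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean


def majority_polarity(ratings):
    """Return the most frequent polarity label among the expert ratings.

    Assumption: a tie for first place falls back to "neutral". The paper's
    actual aggregate functions are not specified in the abstract.
    """
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "neutral"
    return counts[0][0]


def unified_score(scores):
    """Average several sentiment scores for one word.

    Assumption: every score is already normalized to the range [-1, 1].
    """
    return mean(scores)


# Three hypothetical expert labels for one Arabic word.
label = majority_polarity(["negative", "negative", "positive"])

# Four hypothetical normalized scores for one English word.
score = unified_score([-0.5, -1.0, -0.75, -0.25])
```

Majority voting suits categorical polarity labels, while a mean suits numeric scores; either could be replaced by a median or a weighted scheme if rater reliability differs.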
Pages: 53-62
Page count: 10