Understanding the Language of ISIS: An Empirical Approach to Detect Radical Content on Twitter Using Machine Learning

Cited by: 18
Authors
Ul Rehman, Zia [1, 2]
Abbas, Sagheer [1]
Khan, Muhammad Adnan [3]
Mustafa, Ghulam [2]
Fayyaz, Hira [4]
Hanif, Muhammad [1, 2]
Saeed, Muhammad Anwar [5]
Affiliations
[1] Natl Coll Business Adm & Econ, Sch Comp Sci, Lahore 54000, Pakistan
[2] Bahria Univ, Dept Comp Sci, Lahore 54000, Pakistan
[3] Lahore Garrison Univ, Dept Comp Sci, Lahore 54000, Pakistan
[4] Univ Management & Technol, Sch Syst & Technol, Lahore 54000, Pakistan
[5] Virtual Univ Pakistan, Dept CS & IT, Lahore 54000, Pakistan
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2021 / Vol. 66 / Issue 02
Keywords
Radicalization; extremism; machine learning; natural language processing; twitter; text mining; RISK;
DOI
10.32604/cmc.2020.012770
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The internet, and online social networking platforms in particular, has changed the way extremist groups influence and radicalize individuals. Recent research reveals that the process begins by exposing vast audiences to extremist content and then moving potential victims to closed platforms for intensive radicalization. Consequently, social networks have evolved into a persuasive tool for extremism, serving as a platform for recruitment and psychological warfare. Recognizing potentially radical text or material is therefore vital to restricting the circulation of the extremist narrative. The aim of this research is to identify radical text in social media. Our contributions are as follows: (i) a new dataset for radicalization detection; (ii) an in-depth analysis of the new and previous datasets to identify variation in the extremist group's narrative; (iii) an approach that trains classifiers on religious features along with radical features to detect radicalization; (iv) an examination of the use of violent and bad words in the radical, neutral, and random groups, using dictionaries of violent, terrorism-related, and bad words. Our results indicate that incorporating religious text in model training improves the accuracy, precision, recall, and F1-score of the classifiers. Second, a variation in the extremist narrative was observed, implying that using the new dataset can have a substantial effect on classifier performance. In addition, violent and bad words differentiate radical users from random users, but for the neutral (anti-ISIS) group this needs further investigation.
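The dictionary-based comparison in contribution (iv) can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the group names, the sample texts, and the functions `tokenize`, `lexicon_rate`, and `compare_groups` are all hypothetical placeholders, and the abstract does not state how the word usage was actually measured. The sketch assumes a simple measure, the share of tokens in each group that appear in a word list.

```python
import re

# Hypothetical toy lexicon; the paper's actual dictionaries are not reproduced here.
VIOLENT_TERMS = {"attack", "weapon", "threat"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_rate(texts, lexicon):
    """Return the fraction of tokens across `texts` that appear in `lexicon`."""
    total = 0
    hits = 0
    for text in texts:
        tokens = tokenize(text)
        total += len(tokens)
        hits += sum(1 for tok in tokens if tok in lexicon)
    # Guard against empty input so the rate is always defined.
    return hits / total if total else 0.0


def compare_groups(groups, lexicon):
    """Map each group name to its lexicon hit rate."""
    return {name: lexicon_rate(texts, lexicon) for name, texts in groups.items()}


# Made-up sample texts, used only to exercise the functions.
groups = {
    "group_a": ["they plan to attack with a weapon", "a new threat"],
    "group_b": ["nice weather today", "watching the match tonight"],
}
rates = compare_groups(groups, VIOLENT_TERMS)
```

A per-token rate is used here instead of a raw count so that groups with different amounts of text stay comparable; whether the paper normalizes in this way is an assumption.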
Pages: 1075-1090
Page count: 16