An optimized deep learning approach for suicide detection through Arabic tweets

Cited by: 15
Authors
Baghdadi, Nadiah A. [1]
Malki, Amer [2]
Balaha, Hossam Magdy [3]
AbdulAzeem, Yousry [4]
Badawy, Mahmoud [3]
Elhosseini, Mostafa [2, 3]
Affiliations
[1] Princess Nourah Bint Abdulrahman Univ, Coll Nursing, Nursing Management & Educ Dept, Riyadh, Saudi Arabia
[2] Taibah Univ, Coll Comp Sci & Engn, Yanbu, Saudi Arabia
[3] Mansoura Univ, Fac Engn, Comp & Control Syst Engn Dept, Mansoura, Egypt
[4] Misr Higher Inst Engn & Technol, Comp Engn Dept, Mansoura, Egypt
Keywords
Deep Learning (DL); Machine Learning (ML); Suicide; Twitter; Depression Detection; Model
DOI
10.7717/peerj-cs.1070
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many people worldwide suffer from mental illnesses such as major depressive disorder (MDD), which affect their thoughts, behavior, and quality of life. Suicide, a risk when treatment is not received, is regarded as the second leading cause of death among teenagers. Twitter is a platform where users express their emotions and thoughts on many subjects. Many studies, including this one, suggest using social media data to track depression and other mental illnesses. Even though Arabic is widely spoken and has a complex syntax, depression detection methods have not been applied to the language. An Arabic tweets dataset must first be scraped and annotated. This study then proposes a complete framework for classifying input tweets into two classes (Normal or Suicide). The article also proposes an Arabic tweet preprocessing algorithm that contrasts lemmatization, stemming, and various lexical analysis methods. Experiments are conducted on Twitter data scraped from the Internet and annotated by five different annotators. Performance metrics are reported on the proposed dataset using recent Bidirectional Encoder Representations from Transformers (BERT) and Universal Sentence Encoder (USE) models. The measured performance metrics are balanced accuracy, specificity, F1-score, intersection over union (IoU), receiver operating characteristic (ROC), Youden Index, negative predictive value (NPV), and a weighted sum metric (WSM). For USE models, the best WSM is 80.2%; for Arabic BERT models, the best WSM is 95.26%.
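Most of the metrics listed in the abstract follow directly from a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: it treats Suicide as the positive class (an assumption), uses the standard textbook definitions, and the function name `binary_metrics` and the example counts are hypothetical. ROC is omitted because it needs predicted scores rather than counts, and WSM is omitted because the abstract does not state its weights.

```python
def _safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumption: the positive class is "Suicide" and the negative class is "Normal".
    """
    sensitivity = _safe_div(tp, tp + fn)   # recall on the positive class
    specificity = _safe_div(tn, tn + fp)   # recall on the negative class
    return {
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1_score": _safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": _safe_div(tp, tp + fp + fn),          # Jaccard index of the positive class
        "youden_index": sensitivity + specificity - 1,
        "npv": _safe_div(tn, tn + fn),               # negative predictive value
    }


# Hypothetical counts, chosen only to exercise the function.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy and the Youden Index are both built from sensitivity and specificity, so they stay informative when the Normal and Suicide classes are imbalanced, which plain accuracy does not.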
Pages: 26