An embedded feature selection approach for depression classification using short text sequences

Cited by: 1
Authors
Priya, S. Kavi [1]
Karthika, K. Pon [1]
Affiliations
[1] Mepco Schlenk Engn Coll, Dept Comp Sci & Engn, Sivakasi 626005, Tamil Nadu, India
Keywords
Depression detection; Composite feature selection; Dimensionality reduction; Whale optimization algorithm; Text classification; OPTIMIZATION; CATEGORIZATION; INFORMATION; SEARCH;
DOI
10.1016/j.asoc.2023.110828
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Depression has become a serious mental health issue worldwide, particularly with the rise of the global pandemic. Identifying depression in an individual from short texts shared on social media is a challenging task. The present work aims to select the optimal feature subset for classifying short texts for depression detection. Feature selection makes it possible to eliminate redundant and noisy features in high-dimensional datasets with small sample sizes. This can mitigate the "curse of dimensionality" and enhance the effectiveness of classification algorithms. However, current feature selection methods often focus on optimizing classification or clustering performance while neglecting the stability of the selected features. This can lead to unstable results and make it difficult to identify meaningful and interpretable features. This paper introduces a novel embedded feature selection approach named Statistical Relevance Class Frequency based on Whale Optimization Algorithm (SRCF-WOA) for selecting feature subsets from short texts in social media. The proposed methodology extracts both unigram features and composite features to capture semantic and structural information. The χ².rcf (Chi-squared relevance class frequency) filter approach is applied to rank the extracted features by importance. WOA is adapted to retrieve the optimal subset of features in a low-dimensional space, using its high exploration and high exploitation capability. In the evaluation, four benchmark short text datasets and two classifiers are used. The comparison shows that the proposed embedded feature selection method outperforms other algorithms in terms of accuracy and Fβ scores (β = 0.5, 1, and 2). A sensitivity analysis is carried out to check the robustness and stability of the proposed method.
The findings indicate that SRCF-WOA surpasses other methods on the majority of datasets, achieving the highest classification accuracy while using the fewest features. The statistical significance of these findings is further supported by the Analysis of Variance (ANOVA) F-test. Moreover, the proposed method strikes the best balance between classification accuracy and feature stability. © 2023 Elsevier B.V. All rights reserved.
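The abstract reports Fβ scores at β = 0.5, 1, and 2. As a minimal sketch of what that metric measures, the standard Fβ definition can be computed from confusion-matrix counts as below. The function name `f_beta` and the counts are hypothetical, chosen only for illustration; they are not taken from the paper's experiments.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """Standard F-beta score from true positives, false positives, false negatives.

    beta < 1 weights precision more heavily; beta > 1 weights recall more heavily.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a "depressed" class (illustrative only).
tp, fp, fn = 40, 10, 20
for beta in (0.5, 1.0, 2.0):
    print(f"F{beta:g} = {f_beta(tp, fp, fn, beta):.4f}")
```

With these counts precision (0.80) exceeds recall (about 0.67), so the score falls as β grows and recall is weighted more heavily. Reporting all three values shows how a classifier trades off missed cases against false alarms.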
Pages: 17