Medical dataset classification for Kurdish short text over social media

Cited by: 6
Authors
Saeed, Ari M. [1]
Hussein, Shnya R. [1]
Ali, Chro M. [1]
Rashid, Tarik A. [2]
Affiliations
[1] Univ Halabja, Comp Sci Dept, KRG, Kurdistan, Kurdistan, Iraq
[2] Univ Kurdistan Hawler, Comp Sci & Engn Dept, KRG, Erbil, Kurdistan, Iraq
Source
DATA IN BRIEF | 2022 / Vol. 42
Keywords
Machine learning; Medical text classification; Kurdish short text; Text pre-processing;
DOI
10.1016/j.dib.2022.108089
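Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes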
07 ; 0710 ; 09 ;
Abstract
Facebook is used as the source for collecting the comments in this dataset. The Medical Kurdish Dataset (MKD) consists of 6756 comments. The samples are user comments gathered from posts on pages in different categories (Medical, News, Economy, Education, and Sport). Six preprocessing steps are applied to the raw dataset to clean the comments and remove noise by replacing characters. For text classification, each comment (short text) is labeled as either the positive class (medical comment) or the negative class (non-medical comment). The negative class makes up 55% of the dataset and the positive class 45%. (C) 2022 The Author(s). Published by Elsevier Inc.
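The abstract gives the class balance only as rounded percentages. As a rough illustration, the sketch below converts those percentages into approximate per-class comment counts. The function name and label strings are hypothetical, and the resulting counts are estimates, since the exact per-class figures are not stated in the abstract.

```python
# Approximate per-class comment counts from the rounded ratios in the abstract.
# Assumption: the 55% / 45% split is rounded, so these counts are estimates only.
TOTAL_COMMENTS = 6756  # dataset size stated in the abstract
CLASS_RATIOS = {
    "non-medical (negative)": 0.55,
    "medical (positive)": 0.45,
}


def approximate_counts(total, ratios):
    """Return rounded per-class counts for the given total and class shares."""
    return {label: round(total * share) for label, share in ratios.items()}


counts = approximate_counts(TOTAL_COMMENTS, CLASS_RATIOS)
for label, n in counts.items():
    print(f"{label}: about {n} comments")
```

Under these assumptions the split works out to roughly 3716 non-medical and 3040 medical comments, which indicates a mildly imbalanced binary classification task.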
Pages: 10