A novel fuzzy k-means latent semantic analysis (FKLSA) approach for topic modeling over medical and health text corpora

Cited by: 11
Authors
Rashid, Junaid [1]
Shah, Syed Muhammad Adnan [1]
Irtaza, Aun [1]
Affiliations
[1] Univ Engn & Technol, Dept Comp Sci, Taxila, Pakistan
Keywords
Topic modeling; bag-of-words; term weighting; fuzzy k-means; principal component analysis; SYSTEM;
DOI
10.3233/JIFS-182776
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical and health text documents pose a challenge for data handling and for retrieving relevant and meaningful documents. Automatic retrieval of significant knowledge, with a better understanding of medical and health documents, is a challenging task. One popular approach for thematically understanding medical and health text documents and finding the topics in them is topic modeling. In this research, we propose a novel topic modeling approach, fuzzy k-means latent semantic analysis (FKLSA), which uses fuzzy clustering. Our method generates local and global term frequencies through the bag-of-words (BOW) model. Principal component analysis is used to remove the negative impact of high dimensionality on global term weighting. Previous work shows that redundancy in medical and health documents has a negative impact on the quality of text mining. Therefore, the main achievement of FKLSA is that it handles the redundancy issue in medical and health text documents and discovers semantically more precise topics. FKLSA is used to find themes in medical and health text corpora. These topics are further used for text classification and clustering tasks in text mining. Experimental results show that FKLSA performs better than LDA and RedLDA on redundant corpora. FKLSA's time performance also remains stable as the number of topics increases, and is thus better than that of LDA and LSA on a large Twitter health dataset. Quantitative evaluations on real-world datasets of health and medical documents show that FKLSA gives higher performance than state-of-the-art topic models such as latent Dirichlet allocation and latent semantic analysis.
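The pipeline named in the abstract (bag-of-words term weighting followed by fuzzy clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names `bow_tfidf` and `fuzzy_kmeans`, the smoothed inverse-document-frequency formula, and the fuzzifier `m = 2` are all assumptions made for illustration, and the principal component analysis step is omitted to keep the example dependency-free.

```python
import math
import random
from collections import Counter


def bow_tfidf(docs):
    """Bag-of-words counts weighted by local (tf) x global (idf) terms.

    The smoothed idf used here is one common choice, assumed for this sketch.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[w] * idf[w] for w in vocab])
    return vocab, rows


def fuzzy_kmeans(X, k, m=2.0, iters=50, seed=0):
    """Standard fuzzy k-means (fuzzy c-means) with fuzzifier m > 1.

    Returns U (n x k soft memberships, each row sums to 1) and the centers.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Start from random memberships, normalized per document.
    U = []
    for _ in range(n):
        r = [rng.random() + 1e-9 for _ in range(k)]
        s = sum(r)
        U.append([v / s for v in r])
    centers = []
    for _ in range(iters):
        # Center update: mean of the documents weighted by u^m.
        centers = []
        for j in range(k):
            w = [U[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append(
                [sum(w[i] * X[i][t] for i in range(n)) / tot for t in range(d)]
            )
        # Membership update: u_ij proportional to dist_ij^(-2/(m-1)).
        for i in range(n):
            dist = [
                math.sqrt(sum((X[i][t] - c[t]) ** 2 for t in range(d))) + 1e-12
                for c in centers
            ]
            inv = [dd ** (-2.0 / (m - 1.0)) for dd in dist]
            s = sum(inv)
            U[i] = [v / s for v in inv]
    return U, centers


# Hypothetical toy corpus, not taken from the paper's datasets.
docs = [
    "diabetes insulin glucose blood",
    "insulin glucose diet diabetes",
    "heart attack blood pressure",
    "blood pressure heart stroke",
]
vocab, X = bow_tfidf(docs)
U, centers = fuzzy_kmeans(X, k=2)
for doc, memberships in zip(docs, U):
    print(doc, [round(u, 3) for u in memberships])
```

Each row of `U` gives one document's degree of membership in every cluster, so a document can belong partly to several topics rather than being assigned to exactly one; that soft assignment is what distinguishes fuzzy clustering from hard k-means.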
Pages: 6573 - 6588
Number of pages: 16
Related papers
7 in total
  • [1] Topic Modeling Technique for Text Mining Over Biomedical Text Corpora Through Hybrid Inverse Documents Frequency and Fuzzy K-Means Clustering
    Rashid, Junaid
    Shah, Syed Muhammad Adnan
    Irtaza, Aun
    Mahmood, Toqeer
    Nisar, Muhammad Wasif
    Shafiq, Muhammad
    Gardezi, Akber
    IEEE ACCESS, 2019, 7 : 146070 - 146080
  • [2] An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering
    Rashid, Junaid
    Shah, Syed Muhammad Adnan
    Irtaza, Aun
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2020, 39 (01) : 213 - 222
  • [3] Fuzzy topic modeling approach for text mining over short text
    Rashid, Junaid
    Shah, Syed Muhammad Adnan
    Irtaza, Aun
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (06)
  • [4] Fuzzy Approach Topic Discovery in Health and Medical Corpora
    Karami, Amir
    Gangopadhyay, Aryya
    Zhou, Bin
    Kharrazi, Hadi
    INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2018, 20 (04) : 1334 - 1345
  • [5] A Novel Text Clustering Method Based on TGSOM and Fuzzy K-Means
    Hu, Jinzhu
    Xiong, Chunxiu
    Shu, Jiangbo
    Zhou, Xing
    Zhu, Jun
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL I, 2009, : 26 - 30
  • [6] Fuzzy K-Means and Principal Component Analysis for Classifying Soil Properties for Efficient Farm Management and Maintaining Soil Health
    Shukla, Manoj K.
    Sharma, Parmodh
    SUSTAINABILITY, 2023, 15 (17)
  • [7] A hybrid machine learning approach of fuzzy-rough-k-nearest neighbor, latent semantic analysis, and ranker search for efficient disease diagnosis
    Jha, Sunil Kumar
    Marina, Ninoslav
    Wang, Jinwei
    Ahmad, Zulfiqar
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (03) : 2549 - 2563