Clustering of biomedical documents using ontology-based TF-IGM enriched semantic smoothing model for telemedicine applications

Cited by: 0
Authors
R. Sandhiya
M. Sundarambal
Affiliations
[1] Coimbatore Institute of Technology, Department of Information Technology
[2] Coimbatore Institute of Technology, Department of Electrical and Electronics Engineering
Source
Cluster Computing | 2019 / Vol. 22
Keywords
Document clustering; Telemedicine; n-gram; MeSH ontology; Semantic smoothing; Term frequency; k-means; Hierarchical clustering
DOI
Not available
Abstract
Clustering of biomedical documents has become a vital research topic because of its importance in clinical and telemedicine applications. Clustering medical documents is considered a major challenge because of their unstructured nature. This paper focuses on developing an efficient document clustering approach for medical documents to be used in telemedicine applications. Most existing models use n-gram techniques for phrase identification and term-, concept-, or semantic-based models for clustering. However, n-gram techniques do not perform well when the original document has been modified, and only hybrid models provide relatively improved clustering. The proposed document clustering approach, named the enriched semantic smoothing model, is built on the MeSH ontology. Because the semantic smoothing model is not effective in handling the density of general words, an improved model with a term frequency and inverse gravity moment (TF-IGM) factor and improved background elimination is used. Unlike term frequency and inverse document frequency (TF-IDF), TF-IGM precisely measures the class-distinguishing power of a term by making use of the fine-grained term distribution across different classes of text in documents. The modified n-gram technique, which detects and averts cases of substitution and deletion in the documents, improves phrase identification. The proposed model improves the clustering efficiency of the k-means and hierarchical clustering algorithms. The experiments are carried out on MeSH ontology-based PubMed documents, with similarity measures and cluster validity indexes used for comparison. The results show that the proposed approach to medical document clustering is highly accurate and thus benefits clinical practice and telemedicine.
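The abstract does not state the TF-IGM formula. As a rough illustration only, the sketch below assumes the definition commonly given in the general TF-IGM term-weighting literature: the inverse gravity moment of a term is its highest per-class frequency divided by the rank-weighted sum of its per-class frequencies sorted in descending order, and the term weight is tf * (1 + lambda * igm). The function names, the default lambda of 7.0, and the sample frequencies are assumptions for illustration and are not taken from this paper.

```python
def igm(class_freqs):
    """Inverse gravity moment of a term (assumed standard definition).

    class_freqs: the term's frequency in each class (or cluster) of text.
    Returns a value in (0, 1]; 1.0 means the term occurs in one class only.
    """
    ranked = sorted(class_freqs, reverse=True)
    if not ranked or ranked[0] == 0:
        return 0.0  # term never occurs, so it has no distinguishing power
    # Rank-weighted sum: the r-th largest frequency is multiplied by r.
    moment = sum(rank * freq for rank, freq in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm(tf, class_freqs, lam=7.0):
    """TF-IGM weight: local term frequency scaled by the global IGM factor.

    lam is an adjustable coefficient (assumed default, not from the paper).
    """
    return tf * (1.0 + lam * igm(class_freqs))


# Hypothetical per-class frequencies for two terms.
concentrated = [10, 0, 0]   # appears in a single class
spread_out = [5, 5, 5]      # appears evenly in every class

print(igm(concentrated), igm(spread_out))
print(tf_igm(2, concentrated), tf_igm(2, spread_out))
```

Under this assumed definition, a term concentrated in one class receives a larger weight than a term spread evenly across classes, which is the class-distinguishing behavior the abstract attributes to TF-IGM.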
Pages: 3213-3230
Number of pages: 17