Design and Development of Density-Based Effective Document Clustering Method Using Ontology

Cited by: 0

Authors:

Giridhar Urkude

Manju Pandey

Affiliation:

[1] National Institute of Technology Raipur, Department of Computer Applications

Source:

Multimedia Tools and Applications | 2022 / Vol. 81

Keywords:

Ontology; Conventional k-means; Density-based clustering; Precision

DOI:

Not available

Abstract:

Text document clustering separates a collection of documents into clusters so that documents within a cluster are highly similar to one another and distinct from the documents in other clusters. The high dimensionality and sparsity of the document-term matrix reduce the efficiency of the clustering process. This study proposes a new way of clustering documents using a domain ontology together with the WordNet ontology, with the main objective of improving cluster quality. It also investigates feature-selection methods for reducing the number of dimensions of the document-term matrix. Sports documents are clustered using conventional K-Means and density-based clustering, each combined with a dimension-reduction (DR) feature-selection process. A novel approach, ontology-based document clustering, is proposed for grouping the text documents, and it is developed in three critical steps. The first step begins with data pre-processing, after which the DR method reduces the feature set using Information Gain (Info-Gain) selection. The documents are then clustered with two methods, K-Means and density-based clustering with the DR feature-selection process. These two methods are used to validate the findings of ontology-based clustering, and the study compares them using evaluation metrics. The second step develops a sports domain ontology that describes the concepts of, and relationships between, terms drawn from sports-related documents. The ontology is validated using a semantic web reasoning process. An algorithm for retrieving synonyms of the sports domain ontology terms is proposed and implemented: terms retrieved from the documents and concepts from the sports ontology are mapped to synonym-set words retrieved from the WordNet ontology. The proposed technique is based on the synonyms of these mapped concepts.
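The Info-Gain reduction step can be illustrated with a minimal sketch. It assumes each pre-processed document is a set of terms with a known class label and scores a term by how much its presence or absence reduces class entropy, IG(t) = H(C) - [P(t)·H(C|t) + P(¬t)·H(C|¬t)]. The abstract does not state whether the authors use term presence or weighted counts, nor how many features are kept, so the helper names (`information_gain`, `select_top_k`), the toy corpus, and the value of `k` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], using term presence."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) \
        + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank the vocabulary by information gain (ties broken alphabetically) and keep k terms."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of pre-processed terms.
docs = [
    {"goal", "striker", "match"},
    {"goal", "keeper", "match"},
    {"wicket", "bowler", "match"},
    {"wicket", "batsman", "match"},
]
labels = ["football", "football", "cricket", "cricket"]

print(select_top_k(docs, labels, 2))  # ['goal', 'wicket']
```

A term such as "match" that occurs in every document carries no class information (IG = 0) and is discarded, which is how this kind of selection shrinks the sparse document-term matrix.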
The proposed ontology-based approach uses the reduced feature set to cluster the text documents, and its results are compared with those of the two traditional approaches on two datasets. The proposed approach is found to cluster the documents effectively, with high precision, recall, and accuracy. The study also compares different RDF serialization formats for the sports ontology.
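Clustering the reduced feature vectors and scoring the result can be sketched as follows. The abstract does not name the density-based algorithm, distance measure, or parameters, so this sketch assumes textbook DBSCAN with cosine distance; `eps`, `min_pts`, the toy vectors, and the helper names are hypothetical. It also assumes one common convention for reporting precision, recall, and accuracy on clusters: each cluster is mapped to its majority class, noise points count as errors, and per-class scores are macro-averaged. The paper's exact evaluation protocol may differ.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one cluster id per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def region_query(i):
        # All points (including i itself) within eps of point i.
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
        cluster += 1
    return labels


def evaluate(cluster_ids, true_labels):
    """Map clusters to majority classes; return (accuracy, macro precision, macro recall)."""
    majority = {}
    for c in set(cluster_ids) - {-1}:
        members = [y for cid, y in zip(cluster_ids, true_labels) if cid == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    predicted = [majority.get(c) for c in cluster_ids]  # noise -> None
    accuracy = sum(p == y for p, y in zip(predicted, true_labels)) / len(true_labels)
    precisions, recalls = [], []
    for cls in sorted(set(true_labels)):
        tp = sum(p == cls and y == cls for p, y in zip(predicted, true_labels))
        predicted_pos = sum(p == cls for p in predicted)
        actual_pos = sum(y == cls for y in true_labels)
        precisions.append(tp / predicted_pos if predicted_pos else 0.0)
        recalls.append(tp / actual_pos if actual_pos else 0.0)
    return accuracy, sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Hypothetical reduced feature vectors (e.g. counts of "goal" and "wicket").
vectors = [[3, 0], [2, 0], [4, 1], [0, 3], [0, 2], [1, 4]]
true_labels = ["football"] * 3 + ["cricket"] * 3

clusters = dbscan(vectors, eps=0.1, min_pts=2)
print(clusters)                         # [0, 0, 0, 1, 1, 1]
print(evaluate(clusters, true_labels))  # (1.0, 1.0, 1.0)
```

Unlike K-Means, this procedure does not need the number of clusters in advance and can leave outlying documents unassigned as noise; the trade-off is sensitivity to the choice of `eps` and `min_pts`.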

Pages: 32995–33015

Page count: 20
