Novel Similarity Measure for Document Clustering Based on Topic Phrases

Cited by: 0
Authors
ELdesoky, A. E. [1]
Saleh, M. [2]
Sakr, N. A. [1]
Affiliations
[1] Mansoura Univ, Dept Comp & Syst, Mansoura, Egypt
[2] King Abdulaziz Univ, Dept Comp Syst, Jeddah, Saudi Arabia
Source
ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE | 2007
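Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract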
Document clustering is a subfield of data clustering that organizes a large set of documents into groups of similar, related documents. In the traditional Vector Space Model (VSM), researchers have treated each unique word occurring in the document set as a candidate feature. Recently, a new trend has emerged that treats the phrase as a more informative feature, which contributes to improving document clustering accuracy and effectiveness. This paper proposes a new approach for computing the VSM similarity measure that uses the document's topic phrases, instead of individual words, as the terms of the VSM. The new approach is applied to the Buckshot method, which combines the Hierarchical Agglomerative Clustering (HAC) algorithm with the K-means partitioning algorithm. Such a mechanism may improve clustering effectiveness, as reflected in higher values of the evaluation metrics.
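The pipeline the abstract describes (phrase-based VSM vectors, a similarity measure over them, and Buckshot clustering) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: topic phrases are assumed to be already extracted (the abstract does not say how), phrases are weighted with a smoothed TF-IDF and compared with cosine similarity (both assumptions), and Buckshot is taken in its usual form, i.e. group-average HAC on a random sample of about sqrt(k·n) documents whose cluster centroids seed K-means over all n documents. All function names and the toy corpus are hypothetical.

```python
import math
import random
from collections import Counter

# Sparse vectors are dicts mapping a term (here: a topic phrase) to its weight.


def tfidf_vectors(docs):
    """TF-IDF vectors whose terms are topic phrases instead of single words.

    Each document is assumed to be a list of already-extracted topic phrases.
    """
    n = len(docs)
    df = Counter(p for doc in docs for p in set(doc))  # document frequency per phrase
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (assumption): a phrase found in every document keeps a nonzero weight.
        vectors.append({p: c * math.log(1 + n / df[p]) for p, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vecs):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vecs:
        for t, w in v.items():
            total[t] += w / len(vecs)
    return dict(total)


def hac(vecs, k):
    """Group-average agglomerative clustering down to k clusters (lists of indices)."""
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vecs[i], vecs[j]) for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a].extend(clusters[b])  # merge the most similar pair
        del clusters[b]                  # b > a, so index a stays valid
    return clusters


def buckshot(vecs, k, iterations=10, seed=0):
    """HAC on a random sample of ~sqrt(k*n) vectors seeds K-means over all n.

    Assumes 1 <= k <= len(vecs).
    """
    rng = random.Random(seed)
    n = len(vecs)
    sample = rng.sample(range(n), min(n, max(k, int(math.sqrt(k * n)))))
    groups = hac([vecs[i] for i in sample], k)
    centroids = [centroid([vecs[sample[i]] for i in g]) for g in groups]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each document joins its most cosine-similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        # Update step: recompute the centroid of every non-empty cluster.
        for c in range(k):
            members = [vecs[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = centroid(members)
    return labels


# Hypothetical toy corpus: each document is a list of topic phrases.
docs = [
    ["document clustering", "vector space model", "similarity measure"],
    ["document clustering", "similarity measure", "topic phrases"],
    ["vector space model", "document clustering", "topic phrases"],
    ["neural network", "image recognition", "deep learning"],
    ["deep learning", "image recognition", "convolutional network"],
    ["neural network", "deep learning", "training data"],
]
print(buckshot(tfidf_vectors(docs), k=2))
```

Running HAC only on the small sample is what keeps Buckshot cheaper than clustering every document hierarchically, since group-average HAC is at least quadratic in the number of items it clusters; K-means then refines the seeded clusters over the full collection.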
Pages: 92 / +
Page count: 2
Related papers (50 in total)
  • [1] Document Similarity Measure Based on Topic Model
    He, Ming
    Wang, Zhen-zhen
    Du, Yong-ping
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1280 - 1284
  • [2] Hierarchical Document Clustering based on Cosine Similarity measure
    Popat, Shraddha K.
    Deshmukh, Pramod B.
    Metre, Vishakha A.
    2017 1ST INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND INFORMATION MANAGEMENT (ICISIM), 2017 : 153 - 159
  • [3] Korean document summarization using topic phrases extraction and locality-based similarity
    Ryu, J
    Han, KR
    Rim, KW
    FOUNDATIONS OF INTELLIGENT SYSTEMS, 2003, 2871 : 320 - 325
  • [4] Topic Model Based Text Similarity Measure for Chinese Judgment Document
    Wang, Yue
    Ge, Jidong
    Zhou, Yemao
    Feng, Yi
    Li, Chuanyi
    Li, Zhongjin
    Zhou, Xiaoyu
    Luo, Bin
    DATA SCIENCE, PT II, 2017, 728 : 42 - 54
  • [5] Affinity-based similarity measure for web document clustering
    Shyu, ML
    Chen, SC
    Chen, M
    Rubin, SH
    PROCEEDINGS OF THE 2004 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI-2004), 2004 : 247 - 252
  • [6] Document Clustering in Correlation Similarity Measure Space
    Zhang, Taiping
    Tang, Yuan Yan
    Fang, Bin
    Xiang, Yong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (06) : 1002 - 1013
  • [7] A Novel Graph Based Clustering Approach to Document Topic Modeling
    Chanda, Prateek
    Das, Asit Kumar
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018
  • [8] Designing a Semantic Similarity Measure for Biomedical Document Clustering
    Logeswari, S.
    Kandhasamy, Premalatha
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2015, 5 (06) : 1163 - 1170
  • [9] An Intelligent Similarity Measure for Effective Text Document Clustering
    Aishwarya, M. L.
    Selvi, K.
    2016 INTERNATIONAL CONFERENCE ON COMPUTING TECHNOLOGIES AND INTELLIGENT DATA ENGINEERING (ICCTIDE'16), 2016
  • [10] Multi-viewpoint Based Similarity Measure and Optimality Criteria for Document Clustering
    Duc Thang Nguyen
    Chen, Lihui
    Chan, Chee Keong
    INFORMATION RETRIEVAL TECHNOLOGY, 2010, 6458 : 49 - 60