Frequent Term Based Text Document Clustering Using Similarity Measures: A Novel Approach

Cited by: 0
Authors
Gupta, Vijay Kumar [1 ]
Dutta, Maitreyee [2 ]
Kumar, Manoj [3 ]
Affiliations
[1] Govt Girls Polytech, Dept IT, Charkhari, Mahoba, India
[2] NITTTR, Dept CS&E, Chandigarh, India
[3] BBDNITM, Dept IT, Lucknow, Uttar Pradesh, India
Keywords
Clustering; Data Mining; Cosine Similarity; Similarity Index; Fuzzy Logic; Support Vector Machine; ALGORITHM;
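DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract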
Clustering is a long-established way to ensure that documents are retrieved quickly and according to the requirement. It keeps similar documents together so that they can be retrieved easily. The measure of the relation between two documents is called a similarity index, and several kinds of similarity index are already in use. The proposed algorithm combines two kinds of similarity index to produce a new one. The similarity index plays a vital role in the clustering and classification procedure. The proposed algorithm also uses fuzzy logic for the clustering rules, and the result is then classified by a Support Vector Machine to validate the accuracy of the proposed solution.
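The abstract does not say which two similarity indices are combined or how. As an illustration only, the sketch below assumes cosine similarity (listed in the keywords) and Jaccard similarity, blended by a hypothetical weight `alpha`; the function names, the choice of Jaccard, and the linear weighting are all assumptions, not the authors' method.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard similarity over the sets of distinct terms (assumed second index)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def combined_similarity(doc_a, doc_b, alpha=0.5):
    """Hypothetical combined index: weighted mean of cosine and Jaccard.

    alpha is an illustrative weight in [0, 1]; the paper's actual
    combination rule is not given in the abstract.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    return alpha * cosine_similarity(a, b) + (1 - alpha) * jaccard_similarity(a, b)
```

A pairwise matrix of such scores could then be passed to a clustering step; the fuzzy rules and the Support Vector Machine stage mentioned in the abstract are not sketched here because the record gives no detail on them.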
Pages: 164-169
Page count: 6
Related papers
50 in total
  • [21] Fusion Matrix–Based Text Similarity Measures for Clustering of Retrieval Results
    Yueyang Zhao
    Lei Cui
    Scientometrics, 2023, 128 : 1163 - 1186
  • [22] An Intelligent Similarity Measure for Effective Text Document Clustering
    Aishwarya, M. L.
    Selvi, K.
    2016 INTERNATIONAL CONFERENCE ON COMPUTING TECHNOLOGIES AND INTELLIGENT DATA ENGINEERING (ICCTIDE'16), 2016,
  • [23] Novel Similarity Measure for Document Clustering Based on Topic Phrases
    ELdesoky, A. E.
    Saleh, M.
    Sakr, N. A.
    ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE, 2007, : 92 - +
  • [24] Text clustering using frequent itemsets
    Zhang, Wen
    Yoshida, Taketoshi
    Tang, Xijin
    Wang, Qing
    KNOWLEDGE-BASED SYSTEMS, 2010, 23 (05) : 379 - 388
  • [25] A novel ant-based clustering approach for document clustering
    He, Yulan
    Hui, Sin Cheung
    Sim, Yongxiang
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 537 - 544
  • [26] Document Clustering Using K-Means with Term Weighting as Similarity-Based Constraints
    Buatoom, Uraiwan
    Kongprawechnon, Waree
    Theeramunkong, Thanaruk
    SYMMETRY-BASEL, 2020, 12 (06):
  • [27] Text-based Document Similarity Matching Using sdtext
    Shields, Clay
    PROCEEDINGS OF THE 49TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS 2016), 2016, : 5607 - 5616
  • [28] An Active Learning Approach to Frequent Itemset-Based Text Clustering
    Marcacini, Ricardo M.
    Correa, Geraldo N.
    Rezende, Solange O.
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 3529 - 3532
  • [29] Document Clustering Based on Fuzzy Similarity
    Zhou, Jingli
    Nie, Xuejun
    Qin, Leihua
    Zhu, Jianfeng
    APPLIED MECHANICS AND MECHANICAL ENGINEERING, PTS 1-3, 2010, 29-32 : 2620 - 2626
  • [30] Fusion Matrix-Based Text Similarity Measures for Clustering of Retrieval Results
    Zhao, Yueyang
    Cui, Lei
    SCIENTOMETRICS, 2023, 128 (02) : 1163 - 1186