Text clustering with local semantic kernels

Citations: 10
Authors
AlSumait, Loulwah [1]
Domeniconi, Carlotta [1]
Affiliations
[1] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
DOI
10.1007/978-1-84800-046-9_5
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance, and to group documents with respect to the features (or words) that matter the most. This chapter presents a subspace clustering technique based on a locally adaptive clustering (LAC) algorithm. To improve the subspace clustering of documents and the identification of keywords achieved by LAC, kernel methods and semantic distances are deployed. The basic idea is to define a local kernel for each cluster by which semantic distances between pairs of words are computed to derive the clustering and local term weightings. The proposed approach, called semantic LAC, is evaluated using benchmark datasets. Our experiments show that semantic LAC is capable of improving the clustering quality.
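The idea of per-cluster term weights combined with a semantic distance can be illustrated with a small sketch. This is not the chapter's implementation: the exponential weighting rule, the bandwidth `h`, the term-similarity matrix `P`, and the kernel form `diag(w) @ P @ diag(w)` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def local_weights(X, labels, centroids, h=1.0):
    """Per-cluster term weights (assumed rule): terms with a small
    within-cluster spread receive a large weight; each row sums to 1."""
    k, d = centroids.shape
    W = np.full((k, d), 1.0 / d)  # uniform weights as the fallback
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            continue  # empty cluster keeps uniform weights
        spread = ((members - centroids[j]) ** 2).mean(axis=0)
        e = np.exp(-(spread - spread.min()) / h)  # shifted for stability
        W[j] = e / e.sum()
    return W

def semantic_sq_distance(x, c, w, P):
    """Squared distance under an assumed local kernel diag(w) @ P @ diag(w),
    where P is a symmetric positive semidefinite term-similarity matrix."""
    diff = (x - c) * w
    return float(diff @ P @ diff)

def assign(X, centroids, W, P):
    """Assign each document to the cluster with the smallest local distance."""
    dists = np.array([[semantic_sq_distance(x, c, w, P)
                       for c, w in zip(centroids, W)] for x in X])
    return dists.argmin(axis=1)

# Toy term-frequency data: 4 documents, 3 terms (made-up values).
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.2, 4.0],
              [0.0, 1.0, 2.0],
              [0.2, 1.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
# Hypothetical similarity: terms 0 and 1 are related, term 2 is unrelated.
P = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

W = local_weights(X, labels, centroids)
new_labels = assign(X, centroids, W, P)
```

In a full clustering loop, the weight update, the assignment step, and a centroid update would alternate until the assignments stop changing. Setting `P` to the identity matrix removes the semantic term and leaves a purely weight-based local distance.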
Pages: 87-105
Number of pages: 19