Clustering web documents using hierarchical representation with multi-granularity

Cited by: 11
Authors
Huang, Faliang [1]
Zhang, Shichao [2,5]
He, Minghua [3]
Wu, Xindong [4]
Affiliations
[1] Fujian Normal Univ, Fac Software, Fuzhou 350007, Peoples R China
[2] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[3] Aston Univ, Birmingham B4 7ET, Aston Triangle, England
[4] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[5] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2014 / Vol. 17 / No. 1
Funding
Australian Research Council
Keywords
web document clustering; hierarchical representation; multi-granularity; INFORMATION GRANULATION
DOI
10.1007/s11280-012-0197-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which considers knowledge at only two levels of granularity (document and term) and ignores the intermediate paragraph level that bridges them. This two-level granularity can produce unsatisfactory clustering results with "false correlation". To address this problem, a Hierarchical Representation Model with Multi-granularity (HRMM) is proposed based on granular computing and article structure theory; it consists of a five-layer data representation and a two-phase clustering process. To deal with the zero-valued similarity problem that results from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. Using granular computing, HRMM captures the structural knowledge hidden in documents more efficiently and effectively, and thus generates higher-quality web document clusters. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with ontology all significantly outperform VSM and WFP, a representative non-VSM-based algorithm, in terms of F-Score.
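The zero-valued similarity problem mentioned in the abstract can be illustrated with a small sketch. In a sparse term-paragraph representation, two paragraphs that share no terms get a cosine similarity of exactly zero, even when their terms are related. The code below assumes the generic tolerance rough set model (TRSM) used in text mining, not necessarily HRMM's own formulation: each term gets a tolerance class of terms that co-occur with it in at least `theta` paragraphs, each paragraph is replaced by its upper approximation (all terms whose tolerance class overlaps it), and cosine similarity is recomputed. Binary term weights, the threshold value, the toy corpus, and the function names (`cooccurrence`, `tolerance_class`, `upper_approximation`) are illustrative assumptions; this record does not state the paper's actual weighting or threshold.

```python
# Sketch of the generic tolerance-rough-set idea; NOT the HRMM paper's code.
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-weight vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norms_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    if norms_sq == 0:
        return 0.0
    return dot / math.sqrt(norms_sq)


def cooccurrence(units: list[set[str]]) -> Counter:
    """Count, for each unordered term pair, how many units (paragraphs) contain both."""
    counts: Counter = Counter()
    for unit in units:
        for pair in combinations(sorted(unit), 2):
            counts[pair] += 1
    return counts


def tolerance_class(term: str, vocab: set[str], co: Counter, theta: int) -> set[str]:
    """Hypothetical I_theta(term): the term plus every term co-occurring with it in >= theta units."""
    cls = {term}
    for other in vocab:
        if other != term and co[tuple(sorted((term, other)))] >= theta:
            cls.add(other)
    return cls


def upper_approximation(unit: set[str], vocab: set[str], co: Counter, theta: int) -> set[str]:
    """Terms whose tolerance class overlaps the unit's own terms."""
    return {t for t in vocab if tolerance_class(t, vocab, co, theta) & unit}


if __name__ == "__main__":
    # Toy paragraph collection used only to estimate term co-occurrence.
    corpus = [
        {"cluster", "grouping"},
        {"cluster", "grouping", "document"},
        {"partition", "grouping"},
        {"partition", "grouping", "text"},
    ]
    vocab = set().union(*corpus)
    co = cooccurrence(corpus)

    p_a, p_b = {"cluster"}, {"partition"}  # no shared terms
    print(cosine(Counter(p_a), Counter(p_b)))  # 0.0: zero-valued similarity

    ua = upper_approximation(p_a, vocab, co, theta=2)
    ub = upper_approximation(p_b, vocab, co, theta=2)
    print(sorted(ua), sorted(ub))  # ['cluster', 'grouping'] ['grouping', 'partition']
    print(cosine(Counter(ua), Counter(ub)))  # 0.5
```

Here the tolerance classes link `cluster` and `partition` through the shared neighbour `grouping`, so the upper approximations overlap and the similarity rises from 0.0 to 0.5. Lowering `theta` enlarges the tolerance classes and makes more paragraph pairs comparable, at the cost of admitting weaker term associations.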
Pages: 105-126
Page count: 22
Related papers
50 in total
  • [21] MGNR: A Multi-Granularity Neighbor Relationship and Its Application in KNN Classification and Clustering Methods
    Xie, Jiang
    Xiang, Xuexin
    Xia, Shuyin
    Jiang, Lian
    Wang, Guoyin
    Gao, Xinbo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7956 - 7972
  • [22] The construction of multi-granularity concept lattices
    Hu, Qian
    Qin, Ke-Yun
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 2783 - 2790
  • [23] A New Hierarchical Multi-granularity Cross-domain Addressing Approach in Datalink Networks
    Li, Chunfeng
    Wang, Zhenlei
    Wu, Xiongjun
    2024 14TH ASIAN CONTROL CONFERENCE, ASCC 2024, 2024, : 430 - 435
  • [24] Bridging the gap: multi-granularity representation learning for text-based vehicle retrieval
    Bo, Xue
    Liu, Junjie
    Yang, Di
    Ma, Wentao
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (01)
  • [25] Multi-granularity representation learning for sketch-based dynamic face image retrieval
    Wang, Liang
    Dai, Dawei
    Fu, Shiyu
    APPLIED INTELLIGENCE, 2025, 55 (01)
  • [26] MultiGranDTI: an explainable multi-granularity representation framework for drug-target interaction prediction
    Gong, Xu
    Liu, Qun
    He, Jing
    Guo, Yike
    Wang, Guoyin
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [28] Two novel multi-granularity optical cross-connect architectures for hierarchical optical networks
    Qi, Yongmin
    Tian, Xiangqing
    Jin, Yaohui
    Hu, Weisheng
    OPTICAL TRANSMISSION, SWITCHING, AND SUBSYSTEMS IV, PTS 1 AND 2, 2006, 6353
  • [29] Multi-granularity evolution analysis of software using complex network theory
    Pan, Weifeng
    Li, Bing
    Ma, Yutao
    Liu, Jing
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2011, 24 (06) : 1068 - 1082
  • [30] Few-shot learning based on hierarchical classification via multi-granularity relation networks
    Su, Yuling
    Zhao, Hong
    Lin, Yaojin
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2022, 142 : 417 - 429