Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX

Times cited: 6
Authors
Magdaleno, Damny [1 ]
Fuentes, Vett E. [1 ]
Garcia, Maria M. [1 ]
Affiliation
[1] Universidad Central "Marta Abreu" de Las Villas (UCLV), Computer Science Department, Villa Clara, Cuba
Source
COMPUTACION Y SISTEMAS | 2015 / Vol. 19 / No. 01
Keywords
Clustering; XML; structure and content; similarity
DOI
10.13053/CyS-19-1-1922
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Every day, more digital data in semi-structured formats become available on the World Wide Web, corporate intranets, and other media. Knowledge management through information search and processing is essential in the field of academic writing. This task is becoming increasingly complex and challenging, mainly because document collections are usually heterogeneous, large, diverse, and dynamic. To meet these challenges, it is essential to make better use of the time required to process scientific information. In this paper, we propose a new method for automatically clustering XML documents based on their content and structure. The method relies on a new similarity function, OverallSimSUX, which helps capture the degree of similarity among documents. An experimental evaluation on data sets showed better results than those reported in previous work.
Pages: 151 - 161
Number of pages: 11
Related papers
  • [1] Clustering of XML Documents Based on Structure and Aggregated Content
    Rezk, Nermeen Gamal
    Sarhan, Amany
    Algergawy, Alsayed
    PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 93 - 102
  • [2] Using structural similarity for clustering XML documents
    Aitelhadj, Ali
    Boughanem, Mohand
    Mezghiche, Mohamed
    Souam, Fatiha
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
  • [3] XCLSC: Structure and Content-based Clustering of XML Documents
    Bessine, Karima
    Nehar, Attia
    Cherroun, Hadda
    Moussaoui, Abdelouahab
    2015 12TH IEEE INTERNATIONAL CONFERENCE ON PROGRAMMING AND SYSTEMS (ISPS), 2015, : 221 - 227
  • [4] XML Data Integration Based on Content and Structure Similarity Using Keys
    Viyanon, Waraporn
    Madria, Sanjay K.
    Bhowmick, Sourav S.
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2008, PART I, 2008, 5331 : 484 - +
  • [5] Clustering XML Documents by Combining Content and Structure
    Guo Yongming
    Chen Dehua
    Le Jiajin
    ISISE 2008: INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING, VOL 1, 2008, : 583 - 587
  • [6] Clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3025 : 112 - 121
  • [7] A methodology for clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    INFORMATION SYSTEMS, 2006, 31 (03) : 187 - 228
  • [8] Strategy for XML integration using similarity in structure and content
    Kim, YH
    Kim, BG
    Lee, J
    Lim, HC
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2004, E87A (06) : 1479 - 1486
  • [9] A Clustering Method Based on XML Schema Similarity
    Sun, Xia
    Wang, Hai-jun
    2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 1, 2011, : 340 - 343