Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX

Times cited: 6
Authors
Magdaleno, Damny [1 ]
Fuentes, Vett E. [1 ]
Garcia, Maria M. [1 ]
Affiliation
[1] Univ Cent Marta Abreu de Las Villas UCLV, Comp Sci Dept, Villa Clara, Cuba
Source
COMPUTACION Y SISTEMAS | 2015 / Vol. 19 / No. 01
Keywords
Clustering; XML; structure and content; similarity
DOI
10.13053/CyS-19-1-1922
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Every day, more digital data in semi-structured formats become available on the World Wide Web, corporate intranets, and other media. Knowledge management through information search and processing is essential in academic writing. This task is becoming increasingly complex and challenging, mainly because document collections are usually heterogeneous, large, diverse, and dynamic. To address these challenges, it is essential to make more efficient use of the time needed to process scientific information. In this paper, we propose a new method for automatically clustering XML documents based on both their content and their structure, together with a new similarity function, OverallSimSUX, that captures the degree of similarity among documents. Experimental evaluation on data sets showed better results than previous work.
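The abstract does not give the definition of OverallSimSUX. To illustrate the general idea of scoring XML documents on both structure and content, here is a minimal Python sketch under stated assumptions: structure is approximated by the Jaccard overlap of root-to-element tag paths, content by the cosine similarity of word-count vectors, and the two are mixed with a hypothetical weight `alpha`. This is not the authors' OverallSimSUX function; the helper names and the toy documents are made up for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root: ET.Element) -> set:
    """Structure proxy (assumption): every root-to-element tag path, e.g. 'article/title'."""
    paths = set()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def term_counts(root: ET.Element) -> Counter:
    """Content proxy (assumption): lower-cased word counts over all text in the document."""
    text = " ".join(root.itertext())
    return Counter(re.findall(r"\w+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Share of tag paths the two documents have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_similarity(xml_a: str, xml_b: str, alpha: float = 0.5) -> float:
    """Hypothetical mix: alpha * structure + (1 - alpha) * content.
    NOT OverallSimSUX; alpha is an assumed tuning parameter."""
    ra, rb = ET.fromstring(xml_a), ET.fromstring(xml_b)
    s_struct = jaccard(tag_paths(ra), tag_paths(rb))
    s_content = cosine(term_counts(ra), term_counts(rb))
    return alpha * s_struct + (1 - alpha) * s_content


# Toy documents, made up for illustration.
doc_a = "<article><title>XML clustering</title><body>structure and content</body></article>"
doc_b = "<article><title>XML clustering</title><abstract>content similarity</abstract></article>"
print(round(combined_similarity(doc_a, doc_b), 3))  # prints 0.585
```

A pairwise score like this can feed any standard clustering algorithm (for example, hierarchical clustering on 1 - similarity); the weight `alpha` controls how much structure counts relative to content.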
Pages: 151-161
Number of pages: 11
Related papers (50 in total)
[31]   XML-SIM: Structure and Content Semantic Similarity Detection Using Keys [J].
Viyanon, Waraporn ;
Madria, Sanjay K. .
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2009, PT 2, 2009, 5871 :1183-1200
[32]   Towards Secure Content Based Dissemination of XML Documents [J].
Rahaman, Mohammad Ashiqur ;
Plate, Henrik ;
Roudier, Yves ;
Schaad, Andreas .
FIFTH INTERNATIONAL CONFERENCE ON INFORMATION ASSURANCE AND SECURITY, VOL 2, PROCEEDINGS, 2009, :721-+
[33]   Combining structure and content similarities for XML document clustering [J].
Faculty of Information Technology, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia .
Conf. Res. Pract. Inf. Technol. Ser., 2008, :219-226
[34]   Utilizing the Structure and Content Information for XML Document Clustering [J].
Tran, Tien ;
Kutty, Sangeetha ;
Nayak, Richi .
ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 :460-468
[35]   A METHODOLOGY FOR USING EDGES TO MEASURE STRUCTURAL AND SEMANTIC SIMILARITY OF XML DOCUMENTS [J].
Qiu, Hong-Jun ;
Yu, Wen-Jing .
PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, :1653-+
[36]   Similarity Algorithm Based on Weighted Hierarchical Structure of XML Document [J].
Sun, Xia ;
Cheng, Hong-Bin ;
Wang, Hai-Jun .
2009 WASE INTERNATIONAL CONFERENCE ON INFORMATION ENGINEERING, ICIE 2009, VOL II, 2009, :143-+
[37]   A Prufer Sequence Based Approach to Measure Structural Similarity of XML Documents [J].
Periakaruppan, Ramanathan ;
Nadarajan, Rethinaswamy .
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 WORKSHOPS, 2013, 8186 :639-648
[38]   Relaxing Queries Based on XML Structure and Content Preferences [J].
Yan, Wei ;
Ma, Z. M. ;
Zhang, Fu ;
Meng, Xiangfu .
WEB INFORMATION SYSTEMS ENGINEERING - WISE 2010 WORKSHOPS, 2011, 6724 :44-+
[39]   New Similarity Function for Scientific Articles Clustering based on the Bibliographic References [J].
Amador Penichet, Lisvandy ;
Magdaleno Guevara, Damny ;
Garcia Lorenzo, Maria Magdalena .
COMPUTACION Y SISTEMAS, 2018, 22 (01) :93-102
[40]   Costco: Robust Content and Structure Constrained Clustering of Networked Documents [J].
Yan, Su ;
Lee, Dongwon ;
Wang, Alex Hai .
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT II, 2011, 6609 :289-+