Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX

Cited by: 6
Authors
Magdaleno, Damny [1]
Fuentes, Vett E. [1]
Garcia, Maria M. [1]
Affiliations
[1] Univ Cent Marta Abreu de Las Villas UCLV, Comp Sci Dept, Villa Clara, Cuba
Source
COMPUTACION Y SISTEMAS | 2015, Vol. 19, No. 1
Keywords
Clustering; XML; structure and content; similarity
DOI
10.13053/CyS-19-1-1922
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Every day, more digital data in semi-structured format become available on the World Wide Web, corporate intranets, and other media. Managing knowledge through information search and processing is essential in academic writing. This task is increasingly complex and challenging, mainly because document collections are usually large, heterogeneous, and dynamic. Meeting these challenges requires better management of the time needed to process scientific information. In this paper, we propose a new method for automatically clustering XML documents based on their content and structure, together with a new similarity function, OverallSimSUX, which captures the degree of similarity among documents. In experiments on data sets, our proposal achieved better results than previous work.
Pages: 151-161
Page count: 11
Related papers
50 in total
[21]   All common embedded subtrees for clustering XML documents by structure [J].
Lin, Zhiwei ;
Wang, Hui ;
McClean, Sally ;
Wang, Haiying .
PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, :13-18
[22]   Measuring Similarities between XML Documents based on Content and Structure [J].
Xia, Xiaoling ;
Guo, Yongming ;
Le, Jiajin .
2009 ASIA-PACIFIC CONFERENCE ON INFORMATION PROCESSING (APCIP 2009), VOL 1, PROCEEDINGS, 2009, :459-462
[23]   An improved method for classifying XML documents based on structure and content [J].
Zhang Na ;
Zhang Dongzhan ;
Yu Ye ;
Duan Jiangjiao .
THIRD INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2010), 2010, :426-430
[24]   Answering content and structure-based queries on XML documents using relevance propagation [J].
Sauvagnat, Karen ;
Boughanem, Mohand ;
Chrisment, Claude .
INFORMATION SYSTEMS, 2006, 31 (07) :621-635
[25]   An efficient similarity-based approach for comparing XML documents [J].
Oliveira, Alessandreia ;
Tessarolli, Gabriel ;
Ghiotto, Gleiph ;
Pinto, Bruno ;
Campello, Fernando ;
Marques, Matheus ;
Oliveira, Carlos ;
Rodrigues, Igor ;
Kalinowski, Marcos ;
Souza, Ueverton ;
Murta, Leonardo ;
Braganholo, Vanessa .
INFORMATION SYSTEMS, 2018, 78 :40-57
[26]   A Novel Method for Measuring Structure and Semantic Similarity of XML Documents Based on Extended Adjacency Matrix [J].
Zhang, Xue-Liang ;
Yang, Ting ;
Fan, Bao-Quan ;
Wang, Xu ;
Wei, Jin-Mao .
INTERNATIONAL CONFERENCE ON APPLIED PHYSICS AND INDUSTRIAL ENGINEERING 2012, PT B, 2012, 24 :1452-1461
[27]   XML Documents Clustering based on Representative Path [J].
Kim, Woosaeng .
PROCEEDINGS OF THE 13TH WSEAS INTERNATIONAL CONFERENCE ON COMPUTERS, 2009, :108-+
[28]   Similarity Evaluation of XML Documents Based on Weighted Element Tree Model [J].
Wang, Chenying ;
Yuan, Xiaojie ;
Ning, Hua ;
Lian, Xin .
ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 :680-687
[29]   A Research on Plagiarism Detecting Method Based on XML Similarity and Clustering [J].
Jia, Shengying ;
Liu, Dongsheng ;
Zhang, Liping ;
Liu, Chenglong .
INTERNET OF THINGS-BK, 2012, 312 :619-626
[30]   XEdge: Clustering Homogeneous and Heterogeneous XML Documents Using Edge Summaries [J].
Antonellis, Panagiotis ;
Makris, Christos ;
Tsirakis, Nikos .
APPLIED COMPUTING 2008, VOLS 1-3, 2008, :1081-1088