Exploring dictionary-based semantic relatedness in labeled tree data

Cited by: 14
Authors
Tagarelli, Andrea [1]
Affiliations
[1] Univ Calabria, Dept Elect Comp & Syst Sci DEIS, I-87030 Commenda Di Rende, Italy
Keywords
Semantic relatedness measures; Tree-shaped semistructured data and XML; Structural sense ranking; PageRank; WordNet; WORD SENSE DISAMBIGUATION; XML SCHEMAS; SIMILARITY
DOI
10.1016/j.ins.2012.07.038
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The growing volume and heterogeneity of application scenarios based on semistructured data has created a demand for next-generation methods that can effectively couple syntactic with semantic information in data management and mining tasks. This paper focuses on developing methods for determining semantic relatedness in tree-shaped semistructured data and on assessing the impact of these methods on structural sense ranking in such data. By exploiting key features of a lexical knowledge base such as WordNet, namely ontological relations and concept definitions, we propose a twofold approach that takes into account the particular form of labeled tree data as a conceptual hierarchical representation of real-world objects. We infer indirect relationships between tag concepts and exploit an interleaved search through different concept hierarchies in order to extend semantic relatedness measures, originally conceived for plain-text data, to labeled tree data instances. We also develop a structural sense ranking framework that employs a context graph built on the tag concepts and the structural relations among tags in the tree data. Experimental evidence on a large real-world collection of Wikipedia articles shows that the proposed methods can effectively detect and maximize semantic relatedness in tree-structured data, and can be profitably used to perform structural sense ranking. © 2012 Elsevier Inc. All rights reserved.
Pages: 244-268
Number of pages: 25