An Overview on XML Semantic Disambiguation from Unstructured Text to Semi-Structured Data: Background, Applications, and Ongoing Challenges

Cited by: 39
Authors
Tekli, Joe [1]
Affiliations
[1] Lebanese Amer Univ, Elect & Comp Engn Dept ECE, Byblos 36, Lebanon
Keywords
Content analysis and indexing; information search and retrieval; document and text processing: document and text editing; Document management; document preparation: document preparation; Markup languages; knowledge representation formalisms and methods: semantic networks; CORPUS-BASED METHODS; STRUCTURAL SIMILARITY; WEB; ALGORITHM; DOCUMENTS; RETRIEVAL; FRAMEWORK; SEARCH; CONSTRUCTION; RECOGNITION;
DOI
10.1109/TKDE.2016.2525768
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the last two decades, XML has gained momentum as the standard for web information management and complex data representation. Collaboratively built semi-structured information resources, such as Wikipedia, have also become prevalent on the Web and can be inherently encoded in XML. Yet most methods for processing XML and semi-structured information handle mainly the syntactic properties of the data, while ignoring the semantics involved. To devise more intelligent applications, one needs to augment syntactic features with machine-readable semantic meaning. This can be achieved through the computational identification of the meaning of data in context, also known as automated semantic analysis and disambiguation, which is currently one of the main challenges at the core of the Semantic Web. This survey paper provides a concise and comprehensive review of the methods related to XML-based semi-structured semantic analysis and disambiguation. It is organized into four logical parts. First, we briefly cover traditional word sense disambiguation methods for processing flat textual data. Second, we describe and categorize disambiguation techniques developed and extended to handle semi-structured and XML data. Third, we describe current and potential application scenarios that can benefit from XML semantic analysis, including: data clustering and semantic-aware indexing, data integration and selective dissemination, semantic-aware and temporal querying, web and mobile services matching and composition, blog and social semantic network analysis, and ontology learning. Fourth, we describe and discuss ongoing challenges and future directions, including: the quantification of semantic ambiguity, expanding XML disambiguation context, combining structure and content, using collaborative/social information sources, integrating explicit and implicit semantic analysis, emphasizing user involvement, and reducing computational complexity.
Pages: 1383-1407
Page count: 25