A reverse engineering approach for automatic annotation of Web pages

Cited by: 0
Authors
Roberto De Virgilio
Flavius Frasincar
Walter Hop
Stephan Lachner
Affiliations
[1] Università Roma Tre, Dipartimento di Informatica e Automazione
[2] Erasmus University Rotterdam, Erasmus School of Economics
Source
Multimedia Tools and Applications | 2013 / Vol. 64
Keywords
RDFa; Rich Snippets; DRE; Web site segmentation
DOI
Not available
Abstract
The Semantic Web is gaining increasing interest as a way to fulfill the need for sharing, retrieving, and reusing information. Since Web pages are designed to be read by people, not machines, searching and reusing information on the Web is a difficult task without human participation. To this end, adding semantics (i.e., meaning) to a Web page would help machines understand Web content and better support the Web search process. One of the latest developments in this field is Google’s Rich Snippets, a service for Web site owners to add semantics to their Web pages. In this paper we provide a structured approach to automatically annotate a Web page with Rich Snippets RDFa tags. Exploiting a data reverse engineering method, combined with several heuristics and a named entity recognition technique, our method is capable of recognizing and annotating a subset of the Rich Snippets vocabulary, i.e., all the attributes of its Review concept, and the names of the Person and Organization concepts. We implemented tools and services and evaluated the accuracy of the approach on real e-commerce Web sites.
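To make the target of the annotation concrete, the following is a minimal sketch of what wrapping already-extracted review fields in Rich Snippets RDFa markup might look like. It is not the authors' implementation. The function name `annotate_review`, the namespace URI, and the property names are assumptions modelled on Google's data-vocabulary.org Review vocabulary and should be checked against the vocabulary documentation. The extraction step itself (reverse engineering, heuristics, named entity recognition) is not shown.

```python
from html import escape
from typing import Mapping

# Assumed namespace and property names, modelled on the data-vocabulary.org
# Review vocabulary; they are illustrative and should be verified.
NAMESPACE = "http://rdf.data-vocabulary.org/#"
REVIEW_PROPERTIES = (
    "itemreviewed",
    "reviewer",
    "dtreviewed",
    "rating",
    "summary",
    "description",
)


def annotate_review(fields: Mapping[str, str]) -> str:
    """Wrap already-extracted review fields in RDFa markup (hypothetical helper)."""
    # Emit one <span> per known property, in a fixed order; unknown keys are ignored.
    spans = [
        f'  <span property="v:{name}">{escape(fields[name])}</span>'
        for name in REVIEW_PROPERTIES
        if name in fields
    ]
    opening = f'<div xmlns:v="{NAMESPACE}" typeof="v:Review">'
    return "\n".join([opening, *spans, "</div>"])


if __name__ == "__main__":
    print(annotate_review({"itemreviewed": "Example camera", "rating": "4.5"}))
```

Field values are HTML-escaped before insertion so that extracted text cannot break the surrounding markup.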
Pages: 119-140
Number of pages: 21