A SEMANTIC SCRAPING MODEL FOR WEB RESOURCES: Applying Linked Data to Web Page Screen Scraping

Authors
Ignacio Fernandez-Villamor, Jose [1]
Blasco-Garcia, Jacobo [1]
Iglesias, Carlos A. [1]
Garijo, Mercedes [1]
Affiliations
[1] Univ Politecn Madrid, Dept Ingn Sistemas Telemat, Madrid, Spain
Keywords
Information extraction; Linked data; Screen scraping
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the increasing presence of Semantic Web facilities, only a limited share of the resources available on the Internet provide semantic access. Recent initiatives such as the emerging Linked Data Web provide semantic access to available data by porting existing resources to the Semantic Web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions rely on ad hoc approaches, complemented with graphical interfaces that speed up scraper development. This article proposes a generic framework for web scraping based on semantic technologies. The framework is structured in three levels: scraping services, semantic scraping model, and syntactic scraping. The first level provides an interface through which generic applications or intelligent agents gather information from the web at a high level. The second level defines a semantic RDF model of the scraping process, providing a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies.
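The three levels named in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the mapping format, the `FragmentExtractor` class, the `scrape` function, the `example.org` URIs, and the sample HTML are all hypothetical, and RDF triples are approximated as plain Python tuples rather than built with an RDF library. The sketch only shows how a declarative description of page fragments (semantic level) might drive a generic extractor (syntactic level) behind a simple entry point (service level).

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Semantic level (hypothetical format): a declarative mapping from page
# fragments, selected by (tag, class), to RDF-style types and properties.
MAPPING = {
    "type": "http://example.org/ns#Post",   # made-up class URI
    "item": ("div", "post"),                 # fragment that delimits one resource
    "fields": {
        ("h2", "title"): "http://purl.org/dc/terms/title",
        ("span", "author"): "http://purl.org/dc/terms/creator",
    },
}


class FragmentExtractor(HTMLParser):
    """Syntactic level: collect the text of elements the mapping selects."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.items = []          # one dict (property URI -> text) per item fragment
        self._field = None       # property URI of the field being read, if any
        self._field_tag = None   # tag that closes the current field

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        item_tag, item_class = self.mapping["item"]
        if tag == item_tag and item_class in classes:
            self.items.append({})
            return
        for (f_tag, f_class), prop in self.mapping["fields"].items():
            if self.items and tag == f_tag and f_class in classes:
                self._field, self._field_tag = prop, tag

    def handle_data(self, data):
        if self._field is not None:
            current = self.items[-1]
            current[self._field] = current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == self._field_tag:
            self._field = self._field_tag = None


def scrape(html, mapping, base="http://example.org/page#item"):
    """Service level: return (subject, predicate, object) tuples for a page."""
    parser = FragmentExtractor(mapping)
    parser.feed(html)
    parser.close()
    triples = []
    for index, item in enumerate(parser.items, start=1):
        subject = f"{base}{index}"           # made-up subject URI scheme
        triples.append((subject, RDF_TYPE, mapping["type"]))
        for prop, text in item.items():
            triples.append((subject, prop, text.strip()))
    return triples


# Made-up sample page used only to exercise the sketch.
SAMPLE_HTML = """
<div class="post"><h2 class="title">Hello</h2><span class="author">Ana</span></div>
<div class="post"><h2 class="title">World</h2><span class="author">Luis</span></div>
"""

if __name__ == "__main__":
    for triple in scrape(SAMPLE_HTML, MAPPING):
        print(triple)
```

Because the page structure lives in `MAPPING` rather than in the extractor, supporting a different site under these assumptions means writing a new mapping, not new parsing code, which is the kind of declarative separation the abstract describes.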
Pages: 451-456
Page count: 6