A SEMANTIC SCRAPING MODEL FOR WEB RESOURCES: Applying Linked Data to Web Page Screen Scraping

Cited by: 0
Authors
Ignacio Fernandez-Villamor, Jose [1]
Blasco-Garcia, Jacobo [1]
Iglesias, Carlos A. [1]
Garijo, Mercedes [1]
Affiliations
[1] Univ Politecn Madrid, Dept Ingn Sistemas Telemat, Madrid, Spain
Keywords
Information extraction; Linked data; Screen scraping
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the increasing presence of Semantic Web facilities, only a limited number of the resources available on the Internet provide semantic access. Recent initiatives such as the emerging Linked Data Web provide semantic access to available data by porting existing resources to the Semantic Web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are ad hoc, complemented with graphical interfaces that speed up scraper development. This article proposes a generic framework for web scraping based on semantic technologies. The framework is structured in three levels: scraping services, semantic scraping model, and syntactic scraping. The first level provides an interface through which generic applications or intelligent agents gather information from the web at a high level. The second level defines a semantic RDF model of the scraping process, providing a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies.
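
The separation between a declarative scraping model and its syntactic implementation can be illustrated with a minimal sketch. This is not the authors' framework or vocabulary: the `MAPPING` dictionary stands in for an RDF model, and the property names, the (tag, class) selector form, the `FragmentExtractor` class, the `scrape` function, and the example URL are all hypothetical names chosen for illustration. Only the Python standard library is assumed.

```python
from html.parser import HTMLParser

# Hypothetical declarative model: each entry stands in for a few RDF
# statements that link a semantic property to a syntactic selector.
# The property names and the (tag, class) selector form are assumptions.
MAPPING = {
    "dc:title": ("h1", "headline"),
    "dc:creator": ("span", "author"),
}


class FragmentExtractor(HTMLParser):
    """Syntactic level: collects the text of elements matching a selector."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.current = None  # property whose element is currently open
        self.values = {}     # property -> accumulated text

    def handle_starttag(self, tag, attrs):
        if self.current is not None:
            return  # simplification: ignore elements nested in a match
        classes = (dict(attrs).get("class") or "").split()
        for prop, (sel_tag, sel_class) in self.mapping.items():
            if tag == sel_tag and sel_class in classes:
                self.current = prop
                break

    def handle_endtag(self, tag):
        # Simplification: the first closing tag with the selector's tag name
        # ends the match, so nested same-name tags are not supported.
        if self.current is not None and tag == self.mapping[self.current][0]:
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.values[self.current] = self.values.get(self.current, "") + data


def scrape(html, mapping, subject):
    """Apply the declarative mapping to a page and return triples."""
    parser = FragmentExtractor(mapping)
    parser.feed(html)
    parser.close()
    return [(subject, prop, text.strip()) for prop, text in parser.values.items()]


PAGE = (
    '<html><body><h1 class="headline">Semantic scraping</h1>'
    '<span class="author">J. Doe</span></body></html>'
)
triples = scrape(PAGE, MAPPING, "http://example.org/post/1")
for triple in triples:
    print(triple)
```

In this sketch the mapping can change without touching the extractor, which is the point of a declarative approach; a fuller version would express the mapping in RDF and support richer selectors such as XPath or CSS.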
Pages: 451-456
Number of pages: 6