A SEMANTIC SCRAPING MODEL FOR WEB RESOURCES Applying Linked Data to Web Page Screen Scraping

Cited by: 0
Authors
Ignacio Fernandez-Villamor, Jose [1]
Blasco-Garcia, Jacobo [1]
Iglesias, Carlos A. [1]
Garijo, Mercedes [1]
Affiliations
[1] Univ Politecn Madrid, Dept Ingn Sistemas Telemat, Madrid, Spain
Keywords
Information extraction; Linked data; Screen scraping
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the increasing presence of Semantic Web facilities, only a limited number of the resources available on the Internet provide semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the Semantic Web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping tools rely on ad hoc solutions complemented with graphical interfaces to speed up scraper development. This article proposes a generic framework for web scraping based on semantic technologies. The framework is structured in three levels: scraping services, semantic scraping model, and syntactic scraping. The first level provides an interface through which generic applications or intelligent agents can gather information from the web at a high level. The second level defines a semantic RDF model of the scraping process, in order to provide a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies.
Pages: 451-456
Page count: 6
Related papers
50 in total
  • [31] WEB SCRAPING AS A METHOD OF DATA EXTRACTION IN SOCIOLOGICAL STUDIES: ON SCIENTIFIC APPLICABILITY
    Vilkova, Olga V.
    VESTNIK TOMSKOGO GOSUDARSTVENNOGO UNIVERSITETA-FILOSOFIYA-SOTSIOLOGIYA-POLITOLOGIYA-TOMSK STATE UNIVERSITY JOURNAL OF PHILOSOPHY SOCIOLOGY AND POLITICAL SCIENCE, 2020, 54 : 163 - 175
  • [32] Phishing Website Detection Framework Through Web Scraping and Data Mining
    Park, Andrew J.
    Quadari, Ruhi Naaz
    Tsang, Herbert H.
    2017 8TH IEEE ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS AND MOBILE COMMUNICATION CONFERENCE (IEMCON), 2017, : 680 - 684
  • [33] ProCircle: A promotion platform using crowdsourcing and web data scraping technique
    Junjoewong, Lalita
    Sangnapachai, Supatsara
    Sunetnanta, Thanwadee
    2018 SEVENTH ICT INTERNATIONAL STUDENT PROJECT CONFERENCE (ICT-ISPC), 2018, : 171 - 175
  • [34] Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction
    E. Suganya
    S. Vijayarani
    Wireless Personal Communications, 2021, 118 : 1481 - 1505
  • [35] Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction
    Suganya, E.
    Vijayarani, S.
    WIRELESS PERSONAL COMMUNICATIONS, 2021, 118 (02) : 1481 - 1505
  • [36] Towards End-User Web Scraping for Customization
    Katongo, Kapaya
    Litt, Geoffrey
    Jackson, Daniel
    COMPANION PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON THE ART, SCIENCE, AND ENGINEERING OF PROGRAMMING (PROGRAMMING 2021 COMPANION), 2021, : 49 - 59
  • [37] Design and analyses of web scraping on burstable virtual machines
    Drummond, Lucia Maria A.
    Andrade, Luciano
    Muniz, Pedro de Brito
    Pereira, Matheus Marotti
    Silva, Thiago do Prado
    Teylo, Luan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (09):
  • [38] Flat rent price prediction in Berlin with web scraping
    Camilo Meyberg
    Ulrich Rendtel
    Holger Leerhoff
    AStA Wirtschafts- und Sozialstatistisches Archiv, 2024, 18 (2) : 245 - 278
  • [39] An industrial perspective on web scraping characteristics and open issues
    Chiapponi, Elisa
    Dacier, Marc
    Thonnard, Olivier
    Fangar, Mohamed
    Mattsson, Mattias
    Rigal, Vincent
    52ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS SUPPLEMENTAL VOLUME (DSN-S 2022), 2022, : 5 - 8
  • [40] Cleaner Pretraining Corpus Curation with Neural Web Scraping
    Xu, Zhipeng
    Liu, Zhenghao
    Yan, Yukun
    Liu, Zhiyuan
    Yu, Ge
    Xiong, Chenyan
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 802 - 812