A SEMANTIC SCRAPING MODEL FOR WEB RESOURCES: Applying Linked Data to Web Page Screen Scraping

Cited by: 0

Authors:

Ignacio Fernandez-Villamor, Jose [1]

Blasco-Garcia, Jacobo [1]

Iglesias, Carlos A. [1]

Garijo, Mercedes [1]

Affiliations:

[1] Univ Politecn Madrid, Dept Ingn Sistemas Telemat, Madrid, Spain

Source:

ICAART 2011: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2 | 2011

Keywords:

Information extraction; Linked data; Screen scraping

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Despite the increasing presence of Semantic Web facilities, only a limited share of the resources available on the Internet provide semantic access. Recent initiatives such as the emerging Linked Data Web provide semantic access to available data by porting existing resources to the Semantic Web using technologies such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are ad hoc, complemented with graphical interfaces for speeding up scraper development. This article proposes a generic framework for web scraping based on semantic technologies. The framework is structured in three levels: scraping services, semantic scraping model, and syntactic scraping. The first level provides an interface through which generic applications or intelligent agents gather information from the web at a high level. The second level defines a semantic RDF model of the scraping process in order to provide a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies.
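
As a rough illustration of the three levels named in the abstract, the following Python sketch separates a declarative mapping (standing in for the semantic model), a parser that applies it (syntactic level), and a single entry point (service level). It is not the authors' RDF model or implementation: the names `MAPPING`, `ClassTextExtractor`, and `scrape`, the choice of class-based selectors, and the example subject URI are all assumptions made for this sketch. Only the Python standard library is used, and triples are represented as plain tuples rather than through an RDF library.

```python
from html.parser import HTMLParser

# Hypothetical declarative mapping (semantic level): HTML class name -> RDF
# property URI. The class names are illustrative; the URIs are Dublin Core terms.
MAPPING = {
    "headline": "http://purl.org/dc/terms/title",
    "byline": "http://purl.org/dc/terms/creator",
}

# Elements with no closing tag; skipped so the open-element stack stays balanced.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Syntactic level: collect text inside elements whose class is mapped."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.stack = []   # one entry per open element: mapped property or None
        self.found = []   # (property, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((self.mapping[c] for c in classes if c in self.mapping), None)
        self.stack.append(prop)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Use the innermost enclosing element that has a mapped property.
        prop = next((p for p in reversed(self.stack) if p), None)
        if text and prop:
            self.found.append((prop, text))


def scrape(html, subject, mapping=MAPPING):
    """Service level: return (subject, predicate, object) triples for a page."""
    parser = ClassTextExtractor(mapping)
    parser.feed(html)
    parser.close()
    return [(subject, prop, text) for prop, text in parser.found]


if __name__ == "__main__":
    page = '<div><h1 class="headline">Hello</h1><span class="byline">Ann</span></div>'
    for triple in scrape(page, "http://example.org/page"):
        print(triple)
```

In this sketch, supporting a different site means changing only the mapping, which is the sense in which a declarative approach is contrasted with ad hoc scrapers; the sketch assumes reasonably well-formed HTML.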

Pages: 451-456

Number of pages: 6
