A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Cited by: 5
Authors
Cauteruccio, Francesco [1]
Lo Giudice, Paolo [2]
Musarella, Lorenzo [2]
Terracina, Giorgio [1]
Ursino, Domenico [3]
Virgili, Luca [3]
Affiliations
[1] Univ Calabria, Dipartimento Matemat & Informat, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Mediterranea Reggio Calabria, Dipartimento Ingn Informaz Infrastrutture & Energ, Via Univ,25 Gia Salita Melissari, I-89124 Reggio Di Calabria, CF, Italy
[3] Univ Politecn Marche, Dipartimento Ingn Informaz, Via Brecce Bianche 12, I-60131 Ancona, Italy
Keywords
Unstructured sources; interschema property derivation; structuring unstructured data; big data; METADATA QUALITY; DIGITAL REPOSITORIES; SIMILARITY; CLASSIFICATION; CONSTRUCTION; INTEGRATION; SYSTEM; MODEL; DIKE;
DOI
10.1142/S0219622020500182
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making across sources characterized by disparate formats. In the past, a wide variety of approaches to derive interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.
Pages: 849-889
Page count: 41