A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Cited by: 5
Authors
Cauteruccio, Francesco [1]
Lo Giudice, Paolo [2]
Musarella, Lorenzo [2]
Terracina, Giorgio [1]
Ursino, Domenico [3]
Virgili, Luca [3]
Affiliations
[1] Univ Calabria, Dipartimento Matemat & Informat, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Mediterranea Reggio Calabria, Dipartimento Ingn Informaz Infrastrutture & Energ, Via Univ,25 Gia Salita Melissari, I-89124 Reggio Di Calabria, CF, Italy
[3] Univ Politecn Marche, Dipartimento Ingn Informaz, Via Brecce Bianche 12, I-60131 Ancona, Italy
Keywords
Unstructured sources; interschema property derivation; structuring unstructured data; big data; METADATA QUALITY; DIGITAL REPOSITORIES; SIMILARITY; CLASSIFICATION; CONSTRUCTION; INTEGRATION; SYSTEM; MODEL; DIKE;
DOI
10.1142/S0219622020500182
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in supporting decision-making over sources characterized by disparate formats. In the past, a wide variety of approaches for deriving interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.
Pages: 849-889
Page count: 41