A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Cited by: 5

Authors:

Cauteruccio, Francesco [1]

Lo Giudice, Paolo [2]

Musarella, Lorenzo [2]

Terracina, Giorgio [1]

Ursino, Domenico [3]

Virgili, Luca [3]

Affiliations:

[1] Univ Calabria, Dipartimento Matemat & Informat, I-87036 Arcavacata Di Rende, CS, Italy

[2] Univ Mediterranea Reggio Calabria, Dipartimento Ingn Informaz Infrastrutture & Energ, Via Univ,25 Gia Salita Melissari, I-89124 Reggio Di Calabria, CF, Italy

[3] Univ Politecn Marche, Dipartimento Ingn Informaz, Via Brecce Bianche 12, I-60131 Ancona, Italy

Source:

INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING | 2020 / Vol. 19 / No. 3

Keywords:

Unstructured sources; interschema property derivation; structuring unstructured data; big data; METADATA QUALITY; DIGITAL REPOSITORIES; SIMILARITY; CLASSIFICATION; CONSTRUCTION; INTEGRATION; SYSTEM; MODEL; DIKE

DOI:

10.1142/S0219622020500182

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making over sources characterized by disparate formats. In the past, a wide variety of approaches to derive interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address interschema property derivation in this new scenario. In this paper, we aim to contribute in this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.
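The abstract names interschema properties but does not say how they are derived. As a purely illustrative sketch, and explicitly not the authors' approach, assume every source, whatever its format, has already been reduced to a hypothetical uniform representation that maps each concept to the set of terms observed around it. Candidate synonymies (different names, similar contexts) and homonymies (same name, dissimilar contexts) can then be flagged by comparing contexts. Here Jaccard similarity is used; `Source`, `jaccard`, `candidate_properties`, the toy data and the 0.5 threshold are all assumptions introduced for illustration.

```python
from itertools import product
from typing import Dict, Iterator, Set, Tuple

# HYPOTHETICAL uniform representation (not the paper's): each source,
# whatever its original format, maps a concept name to the set of terms
# observed around it (table attributes, child XML elements, or words that
# co-occur with the concept in free text).
Source = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two term sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_properties(
    s1: Source, s2: Source, threshold: float = 0.5
) -> Iterator[Tuple[str, str, str, float]]:
    """Yield (kind, concept_in_s1, concept_in_s2, score) tuples.

    - synonymy: different names whose contexts are similar
    - homonymy: identical names whose contexts are dissimilar
    The 0.5 threshold is an arbitrary illustrative choice.
    """
    for (c1, ctx1), (c2, ctx2) in product(s1.items(), s2.items()):
        score = jaccard(ctx1, ctx2)
        if c1 != c2 and score >= threshold:
            yield ("synonymy", c1, c2, score)
        elif c1 == c2 and score < threshold:
            yield ("homonymy", c1, c2, score)


# Toy data: a relational table and an XML feed, already reduced to contexts.
relational = {"customer": {"name", "address", "phone", "order"}}
xml_feed = {
    "client": {"name", "address", "phone", "invoice"},
    "customer": {"rating", "review"},
}

for prop in candidate_properties(relational, xml_feed):
    print(prop)
# ('synonymy', 'customer', 'client', 0.6)
# ('homonymy', 'customer', 'customer', 0.0)
```

Hyponymies and subschema similarities would need further criteria (for instance, context containment rather than overlap), which this sketch does not attempt.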


Pages: 849–889

Number of pages: 41

Number of references: 68
