An Approach to Extracting Topic-guided Views from the Sources of a Data Lake

Cited by: 12
Authors
Diamantini, Claudia [1]
Lo Giudice, Paolo [2]
Potena, Domenico [1]
Storti, Emanuele [1]
Ursino, Domenico [1]
Affiliations
[1] Polytech Univ Marche, DII, Ancona, Italy
[2] Univ Mediterranea Reggio Calabria, DIIES, Reggio Di Calabria, Italy
Keywords
Data lakes; Unstructured data sources; Metadata management; Thematic views; Semantic similarities; DBpedia; LINKED DATA; INFORMATION; INTEGRATION; QUERIES; CONSTRUCTION; SYSTEM; DIKE
DOI
10.1007/s10796-020-10010-x
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, data lakes have emerged as an effective and efficient support for extracting information and knowledge from huge numbers of highly heterogeneous and rapidly changing data sources. Managing a data lake requires new techniques, very different from those adopted for data warehouses in the past. In this scenario, one of the most challenging issues is the extraction of topic-guided (i.e., thematic) views from the (very heterogeneous and often unstructured) sources of a data lake. In this paper, we propose a new network-based model to uniformly represent the structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to "structuring", at least partially, unstructured data. Finally, we define a technique to extract topic-guided views from the sources of a data lake, based on similarity and other semantic relationships among source metadata.
Pages: 243-262
Number of pages: 20