Personalised Exploration Graphs on Semantic Data Lakes

Cited by: 12
Authors
Bagozi, Ada [1]
Bianchini, Devis [1]
De Antonellis, Valeria [1]
Garda, Massimiliano [1]
Melchiori, Michele [1]
Affiliation
[1] Univ Brescia, Dept Informat Engn, Via Branze 38, I-25123 Brescia, Italy
Source
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2019 CONFERENCES | 2019 / Vol. 11877
Keywords
Semantic data lake; Data exploration; Smart City
DOI
10.1007/978-3-030-33246-4_2
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, organisations operating in the context of Smart Cities have been spending time and resources on turning large amounts of data, collected from heterogeneous sources, into actionable insights, using indicators as powerful tools for meaningful data aggregation and exploration. Data lakes, which follow a schema-on-read approach, allow for storing both structured and unstructured data and have been proposed as flexible repositories for enabling data exploration and analysis over heterogeneous data sources, regardless of their structure. However, indicators are usually computed based on the centralisation of the data storage, according to a less flexible schema-on-write approach. Furthermore, domain experts, who know the data stored within the data lake, are usually distinct from data analysts, who define indicators, and users, who exploit indicators to explore data in a personalised way. In this paper, we propose a semantics-based approach for enabling personalised data lake exploration through the conceptualisation of proper indicators. In particular, the approach is structured as follows: (i) at the bottom, heterogeneous data sources within a data lake are enriched with Semantic Models, defined by domain experts using domain ontologies, to provide a semantic data lake representation; (ii) in the middle, a Multi-Dimensional Ontology is used by analysts to define indicators and analysis dimensions, in terms of concepts within Semantic Models and formulas to aggregate them; (iii) at the top, Personalised Exploration Graphs are generated for different categories of users, whose profiles are defined in terms of a set of constraints that limit the indicator instances on which the users may rely to explore data. Benefits and limitations of the approach are discussed through an application in the Smart City domain.
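
The three layers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every class, field and value below (Observation, Indicator, Profile, "avg_pm10", the district names) is a hypothetical stand-in, the ontologies are reduced to plain strings, and the Personalised Exploration Graph is simplified to a filtered dictionary of indicator instances.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, FrozenSet, List


# Layer (i): a source record annotated with a domain concept.
# The concept string stands in for a Semantic Model annotation (assumption).
@dataclass(frozen=True)
class Observation:
    concept: str
    district: str
    value: float


# Layer (ii): an indicator is a concept plus an aggregation formula.
# The single "district" dimension is an illustrative assumption.
@dataclass(frozen=True)
class Indicator:
    name: str
    concept: str
    formula: Callable[[List[float]], float]


def compute_instances(indicator: Indicator,
                      observations: List[Observation]) -> Dict[str, float]:
    """Aggregate the indicator's concept per district."""
    groups: Dict[str, List[float]] = {}
    for obs in observations:
        if obs.concept == indicator.concept:
            groups.setdefault(obs.district, []).append(obs.value)
    return {district: indicator.formula(values)
            for district, values in groups.items()}


# Layer (iii): a user profile as a set of constraints on indicator instances.
@dataclass(frozen=True)
class Profile:
    allowed_indicators: FrozenSet[str]
    allowed_districts: FrozenSet[str]


def personalise(profile: Profile,
                indicators: List[Indicator],
                observations: List[Observation]) -> Dict[str, Dict[str, float]]:
    """Keep only the indicator instances the profile is allowed to explore."""
    view: Dict[str, Dict[str, float]] = {}
    for indicator in indicators:
        if indicator.name not in profile.allowed_indicators:
            continue
        instances = compute_instances(indicator, observations)
        view[indicator.name] = {d: v for d, v in instances.items()
                                if d in profile.allowed_districts}
    return view


# Hypothetical Smart City data: PM10 and noise readings in two districts.
observations = [
    Observation("AirQuality.PM10", "North", 20.0),
    Observation("AirQuality.PM10", "North", 40.0),
    Observation("AirQuality.PM10", "South", 10.0),
    Observation("Noise.Level", "North", 55.0),
]
indicators = [
    Indicator("avg_pm10", "AirQuality.PM10", mean),
    Indicator("max_noise", "Noise.Level", max),
]
citizen = Profile(frozenset({"avg_pm10"}), frozenset({"North"}))
citizen_view = personalise(citizen, indicators, observations)
```

Here the hypothetical citizen profile is restricted to one indicator and one district, so citizen_view contains only the average PM10 for "North"; a broader profile over the same data would expose more indicator instances.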
Pages: 22-39
Page count: 18