Continuous Data Quality Management for Machine Learning based Data-as-a-Service Architectures

Cited by: 5
Authors
Azimi, Shelernaz [1]
Pahl, Claus [1]
Affiliations
[1] Free Univ Bozen Bolzano, Bolzano, Italy
Source
CLOSER: PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE | 2021
Keywords
Data-as-a-Service; DaaS; Machine Learning; Data Quality; Edge Cloud; Internet-of-Things; Traffic Management; Case Study; CLASSIFICATION;
DOI
10.5220/0010509503280335
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data-as-a-Service (DaaS) solutions make raw source data accessible in the form of processable information. Machine learning (ML) makes it possible to produce meaningful information and knowledge from raw data. Thus, quality is a major concern that applies to raw data as well as to the information provided by ML-generated models. At the core of the solution is a conceptual framework that links input data quality and the quality of the machine-learned data service, specifically inferring raw data problems as root causes from observed symptoms of data service deficiency. This makes it possible to deduce the hidden origins of quality problems that are observable by users of DaaS offerings. We analyse the quality framework through an extensive case study from an edge cloud and Internet-of-Things-based traffic application. We determine quality assessment mechanisms for symptom and cause analysis in different quality dimensions.
Pages: 328-335
Page count: 8
Related papers
28 in total
[1]   Software Engineering for Machine Learning: A Case Study [J].
Amershi, Saleema ;
Begel, Andrew ;
Bird, Christian ;
DeLine, Robert ;
Gall, Harald ;
Kamar, Ece ;
Nagappan, Nachiappan ;
Nushi, Besmira ;
Zimmermann, Thomas .
2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE (ICSE-SEIP 2019), 2019, :291-300
[2]  
Azimi S., 2020, 22 INT C ENT INF SYS
[3]  
Caruana R., 2006, P 23 INT C MACH LEAR
[4]   Blockchain framework for IoT data quality via edge computing [J].
Casado-Vara, Roberto ;
de la Prieta, Fernando ;
Prieto, Javier ;
Corchado, Juan M. .
BLOCKSYS'18: PROCEEDINGS OF THE 1ST BLOCKCHAIN-ENABLED NETWORKED SENSOR SYSTEMS, 2018, :19-24
[5]  
De Hoog J., 2019, CEUR WORKSHOP PROC, P2491
[6]  
Deja K, 2019, P SCI, P350
[7]   A DaQL to Monitor Data Quality in Machine Learning Applications [J].
Ehrlinger, Lisa ;
Haunschmid, Verena ;
Palazzini, Davide ;
Lettner, Christian .
DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I, 2019, 11706 :227-237
[8]   An agility-oriented and fuzziness-embedded semantic model for collaborative cloud service search, retrieval and recommendation [J].
Fang, Daren ;
Liu, Xiaodong ;
Romdhani, Imed ;
Jamshidi, Pooyan ;
Pahl, Claus .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 56 :11-26
[9]   A Classification and Comparison Framework for Cloud Service Brokerage Architectures [J].
Fowley, Frank ;
Pahl, Claus ;
Jamshidi, Pooyan ;
Fang, Daren ;
Liu, Xiaodong .
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2018, 6 (02) :358-371
[10]   Ontology Change Management and Identification of Change Patterns [J].
Javed, Muhammad ;
Abgaz, Yalemisew M. ;
Pahl, Claus .
JOURNAL ON DATA SEMANTICS, 2013, 2 (2-3) :119-143