Optimising data quality of a data warehouse using data purgation process

Cited by: 1
Author
Gupta, Neha [1]
Affiliation
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rapid growth of data collection and storage services has affected the quality of the data. The data purgation process helps maintain and improve data quality when the data is subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be defined as dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data; implements the DBSCAN method with the Gower metric to handle dummy values; applies Ward's algorithm with the Minkowski distance to improve the results for contradicting data; and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved the data quality and helped provide consistent data to be loaded into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
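The abstract does not give the details of the K-means step, so the following is only a minimal sketch of one common way to fill missing values with K-means and Euclidean distance, not the paper's algorithm. The function names (`kmeans_impute`, `kmeans`, `assign`), the use of `None` as the missing-value marker, the column-mean starting fill and the farthest-point initialisation are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def kmeans_impute(rows, k=2, rounds=5):
    """Fill None entries with the matching coordinate of the row's centroid.

    Assumption: missing values are marked with None and every column has
    at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    # Start from a column-mean fill, then refine it cluster by cluster.
    filled = [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    for _ in range(rounds):
        centroids, labels = kmeans(filled, k)
        for i, r in enumerate(rows):
            for j, v in enumerate(r):
                if v is None:
                    filled[i][j] = centroids[labels[i]][j]
    return filled
```

Calling `kmeans_impute(rows, k=2)` on a table with two well-separated groups fills a missing cell from the group the row belongs to rather than from the overall column mean. Observed cells are never modified.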
Pages: 102-131
Number of pages: 31
Related papers
50 in total
  • [1] Data Warehouse and Data Quality - An Overview
    Brajkovic, Helena
    Jaksic, Danijela
    Poscic, Patrizia
    CENTRAL EUROPEAN CONFERENCE ON INFORMATION AND INTELLIGENT SYSTEMS (CECIIS 2020), 2020, : 17 - 24
  • [2] Data Warehouse Quality Assessment Using Contexts
    Serra, Flavia
    Marotta, Adriana
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2016, PT II, 2016, 10042 : 436 - 448
  • [3] Research on Data Quality of Data Warehouse
    Liu Shuanghong
    Han Zhongjun
    EBM 2010: INTERNATIONAL CONFERENCE ON ENGINEERING AND BUSINESS MANAGEMENT, VOLS 1-8, 2010, : 5255 - 5258
  • [4] Data Quality in Data Warehouse Systems
    Serra, Flavia
    Marotta, Adriana
    PROCEEDINGS OF THE 2016 XLII LATIN AMERICAN COMPUTING CONFERENCE (CLEI), 2016,
  • [5] An Exploratory Investigation of Factors Influencing Data Quality in Data Warehouse
    Zellal, Nouha
    Zaouia, Abdellah
    PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015,
  • [6] Cacophonic contributions to data quality in the data warehouse
    Rasmussen, Karsten Boye
    WMSCI 2005: 9TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL 7, 2005, : 311 - 316
  • [7] Research of data quality assurance about ETL of telecom data warehouse
    Wei, S., 1839, Asian Network for Scientific Information (12): : 1839 - 1844
  • [8] Stakeholder perceptions of data quality in a data warehouse environment
    Giannoccaro, A
    Shanks, G
    Darke, P
    AUSTRALIAN COMPUTER JOURNAL, 1999, 31 (04): : 110 - 117
  • [9] Knowledge Based Data Cleaning for Data Warehouse Quality
    Bradji, Louardi
    Boufaida, Mahmoud
    DIGITAL INFORMATION PROCESSING AND COMMUNICATIONS, PT 2, 2011, 189 : 373 - +
  • [10] Using Ontologies as Context for Data Warehouse Quality Assessment
    Sanz, Camila
    Marotta, Adriana
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2023, 2023, 14148 : 3 - 17