Optimising data quality of a data warehouse using data purgation process

Cited by: 1
Author
Gupta, Neha [1 ]
Affiliation
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid growth of data collection and storage services has affected data quality. The data purgation process helps maintain and improve data quality when data is subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be classified as dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data, implements the DBSCAN method with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve the results for contradicting data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data for loading into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
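The abstract names K-means with Euclidean distance as the approach for missing values but gives no implementation detail. The following is only a minimal illustrative sketch of that general idea, not the paper's algorithm: the function name `kmeans_impute`, the farthest-first initialisation, the restriction of the distance to observed cells, and the sample rows are all assumptions made for this example.

```python
import math

def kmeans_impute(rows, k=2, n_iter=20):
    """Fill None cells with the matching coordinate of the nearest centroid."""
    n_cols = len(rows[0])

    def observed_mean(values, fallback):
        vals = [v for v in values if v is not None]
        return sum(vals) / len(vals) if vals else fallback

    def distance(row, centroid):
        # Euclidean distance restricted to the cells that are present (assumption).
        return math.sqrt(sum((v - c) ** 2
                             for v, c in zip(row, centroid) if v is not None))

    col_means = [observed_mean([r[c] for r in rows], 0.0) for c in range(n_cols)]

    # Deterministic farthest-first initialisation from the complete rows
    # (assumes at least one row has no missing cells).
    complete = [r for r in rows if None not in r]
    centroids = [list(complete[0])]
    while len(centroids) < k:
        far = max(complete, key=lambda r: min(distance(r, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every row.
        labels = [min(range(k), key=lambda j: distance(r, centroids[j]))
                  for r in rows]
        # Update step: per-column mean of the observed cells in each cluster.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            centroids[j] = [observed_mean([m[c] for m in members], col_means[c])
                            for c in range(n_cols)]

    filled = [[centroids[lab][c] if v is None else v for c, v in enumerate(r)]
              for r, lab in zip(rows, labels)]
    return filled, labels

# Hypothetical sample: two well-separated groups, one missing cell in each.
rows = [[1.0, 1.0], [1.2, 0.9], [0.9, None],
        [10.0, 10.0], [10.1, 9.8], [None, 10.2]]
filled, labels = kmeans_impute(rows, k=2)
```

Measuring distance only over observed cells keeps placeholder values from pulling an incomplete row towards the wrong cluster. Observed cells are returned unchanged, and only the `None` cells are overwritten.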
Pages: 102-131
Number of pages: 31