Optimising data quality of a data warehouse using data purgation process

Cited by: 1
Author
Gupta, Neha [1]
Affiliation
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid growth of data collection and storage services has affected the quality of data. The data purgation process helps maintain and improve data quality when data is subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be classified as dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data, implements the DBSCAN method with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve results on contradicting data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data for loading into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
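The abstract names K-means with the Euclidean distance as the approach to missing values but does not give the procedure. The following is only a minimal sketch of one common way such an imputation can work, not the paper's algorithm: missing entries start at the column mean, rows are clustered, and each missing entry is then replaced by the matching coordinate of its cluster centroid. The function names (`kmeans_impute`, `init_centroids`), the farthest-point initialisation and the toy data are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def column_means(rows):
    """Mean of each column, computed over observed (non-None) values only."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def init_centroids(data, k):
    """Deterministic farthest-point initialisation (an illustrative choice)."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        farthest = max(data, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans_impute(rows, k=2, n_iter=10):
    """Fill None entries using centroids from K-means with Euclidean distance."""
    means = column_means(rows)
    missing = [[v is None for v in r] for r in rows]
    # Start from a mean-filled copy so every row has a full set of coordinates.
    data = [[means[j] if v is None else float(v) for j, v in enumerate(r)]
            for r in rows]
    centroids = init_centroids(data, k)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in data]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Imputation step: overwrite only the originally missing entries.
        for i, p in enumerate(data):
            for j in range(len(p)):
                if missing[i][j]:
                    p[j] = centroids[labels[i]][j]
    return data

# Toy data (hypothetical): two well-separated groups, one gap in each.
rows = [[1.0, 2.0], [1.2, None], [0.8, 2.2],
        [9.0, 10.0], [9.2, 9.8], [None, 10.2]]
filled = kmeans_impute(rows, k=2, n_iter=10)
```

Observed values are never modified; only the entries that were `None` in the input are overwritten, so the sketch can be rerun with a different `k` without losing source data.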
Pages: 102-131
Number of pages: 31
Related papers
50 in total
  • [21] Impact of Artificial Intelligence on the Generation Process of the Data Warehouse Model
    Sen, Ibtissam Arras
    Laaroussi, Khadija
    Rabhi, Ouzayr
    Erramdani, Mohammed
    Hassas, Mohammed
    ADVANCES IN SMART MEDICAL, IOT & ARTIFICIAL INTELLIGENCE, VOL 1, ICSMAI 2024, 2024, 11 : 59 - 67
  • [22] On the Research of Data Warehouse in Big Data
    Qin, Hai-fei
    Qian, Zhi-ming
    Zhao, Yong-chao
    2015 INTERNATIONAL CONFERENCE ON NETWORK AND INFORMATION SYSTEMS FOR COMPUTERS (ICNISC), 2015, : 354 - 357
  • [23] Evaluation of data warehouse quality from conceptual model perspective
    20150400456327
    Sharma, Rakhee, 2015, Springer Verlag (320): : 521 - 534
  • [24] Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach
    Qi Liu
    Gengzhong Feng
    Giri Kumar Tayi
    Jun Tian
    Information Systems Frontiers, 2021, 23 : 375 - 389
  • [25] Using data warehouse for the decisional process of a sustainable firm
    Stefanescu, Laura
    Ungureanu, Laura
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2006, 1 : 449 - 452
  • [26] A Framework for Improving Data Quality in Data Warehouse: A Case Study
    Ali, Taghrid Z.
    Abdelaziz, Tawfig M.
    Maatuk, Abdelsalam M.
    Elakeili, Salwa M.
    2020 21ST INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2020,
  • [27] Taxonomy of data quality problems in multidimensional Data Warehouse models
    de Almeida, Wesley Gongora
    de Sousa, Rafael Timoteo, Jr.
    de Deus, Flavio Elias
    Amvame Nze, Georges Daniel
    Lopes de Mendonca, Fabio Lucio
    PROCEEDINGS OF THE 2013 8TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI 2013), 2013,
  • [28] Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial
    Roelofs, Erik
    Persoon, Lucas
    Nijsten, Sebastiaan
    Wiessler, Wolfgang
    Dekker, Andre
    Lambin, Philippe
    RADIOTHERAPY AND ONCOLOGY, 2013, 108 (01) : 174 - 179
  • [29] Data warehouse technology in process industry
    Wang, YS
    Shao, HH
    PROCEEDINGS OF THE 3RD WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-5, 2000, : 2037 - 2041
  • [30] The impact of a data warehouse on the survey process
    Yost, M
    ASC 2003: THE IMPACT OF TECHNOLOGY ON THE SURVEY PROCESS, 2003, : 405 - 412