Optimising data quality of a data warehouse using data purgation process

Cited by: 1
Author
Gupta, Neha [1]
Affiliation
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; BIG DATA; PREDICTION; MANAGEMENT; IMPUTATION; FRAMEWORK
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid growth of data collection and storage services has affected the quality of the data. The data purgation process helps maintain and improve data quality when data is subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be classified as dummy values, cryptic values or missing values. The present work improves the EM algorithm with a dot product to handle cryptic data; implements the DBSCAN method with the Gower metric to handle dummy values; applies Ward's algorithm with the Minkowski distance to improve the results for contradicting data; and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved the data quality and helped provide consistent data to be loaded into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of data in a timely manner.
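The abstract names K-means with the Euclidean distance as the approach for missing values but does not give the procedure. The following is only a minimal sketch of one common way such an imputation can work, not the paper's method: cluster the complete rows, then fill each gap from the nearest centroid, measuring distance over the observed columns only. The function names (`euclidean`, `kmeans`, `impute_missing`), the use of `None` for a missing value, and the choice of the first `k` complete rows as starting centroids are all assumptions made for illustration.

```python
import math


def euclidean(a, b, dims):
    """Euclidean distance between a and b, restricted to the listed dimensions."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on complete rows.

    Assumption: centroids start at the first k points, which keeps the
    sketch deterministic but is not a recommended initialisation.
    """
    dims = range(len(points[0]))
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: euclidean(p, centroids[c], dims))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep the old
        # centroid if a cluster ended up empty.
        updated = [
            [sum(p[d] for p in members) / len(members) for d in dims]
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def impute_missing(rows, k=2):
    """Fill None entries with the matching coordinate of the nearest centroid."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is needed to fit centroids")
    centroids = kmeans(complete, min(k, len(complete)))
    filled = []
    for r in rows:
        observed = [d for d, v in enumerate(r) if v is not None]
        # Distance uses observed columns only, so gaps do not distort it.
        nearest = min(centroids, key=lambda c: euclidean(r, c, observed))
        filled.append([nearest[d] if v is None else v for d, v in enumerate(r)])
    return filled


rows = [
    [1.0, 2.0],
    [9.0, 10.0],
    [1.2, 1.8],
    [8.8, 10.2],
    [1.1, None],   # second value missing
    [None, 9.9],   # first value missing
]
for row in impute_missing(rows, k=2):
    print(row)
```

In this toy dataset the two incomplete rows sit close to one cluster each, so each gap is filled from that cluster's centroid. A real ETL pipeline would also need feature scaling and a more careful centroid initialisation, both of which are left out here.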
Pages: 102-131
Page count: 31