Optimising data quality of a data warehouse using data purgation process

Cited by: 1
Author
Gupta, Neha [1]
Affiliation
[1] Manav Rachna Int Inst Res & Studies, Fac Comp Applicat, Faridabad 121002, India
Keywords
data warehouse; DW; data quality; DQ; extract, transform and load; ETL; data purgation; DP; big data; prediction; management; imputation; framework
DOI
10.1504/IJDMMM.2023.129961
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rapid growth of data collection and storage services has affected the quality of the data being collected. The data purgation process helps maintain and improve data quality while the data passes through the extract, transform and load (ETL) process. Metadata may contain unnecessary information, such as dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data, implements DBSCAN with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve results on contradictory data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data for loading into a data warehouse. The proposed algorithms helped maintain the accuracy, integrity, consistency and non-redundancy of the data in a timely manner.
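The abstract names the distance measures but not how they are wired into each purgation step. The following is a minimal Python sketch under stated assumptions, not the author's implementation: it shows the four measures the abstract mentions (dot product, Gower, Minkowski, Euclidean) and one plausible way to use K-means with Euclidean distance for missing values, namely fitting K-means on complete records and filling each missing field from the nearest centroid, comparing only the fields that are present. The EM, DBSCAN and Ward's clustering steps are not shown. All function names (`dot_product`, `minkowski`, `euclidean`, `gower`, `kmeans`, `impute_missing`) are hypothetical, and seeding K-means with the first k records is an assumption made for determinism.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product: a similarity score (larger means more alike)."""
    return sum(x * y for x, y in zip(a, b))


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 3.0) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return minkowski(a, b, p=2.0)


def gower(a: Sequence[Value], b: Sequence[Value], ranges: Sequence[Optional[float]]) -> float:
    """Gower distance for mixed records: numeric fields contribute |x - y| / range,
    categorical (string) fields contribute 0 on a match and 1 on a mismatch."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        elif r:  # numeric fields with an unknown or zero range contribute 0
            total += abs(x - y) / r
    return total / len(a)


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[List[float]]:
    """Lloyd's K-means with Euclidean distance; the first k points seed the centroids
    (hypothetical seeding choice, for a deterministic sketch)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids


def impute_missing(row: List[Optional[float]], centroids: List[List[float]]) -> List[float]:
    """Fill None fields from the nearest centroid, measured on the observed fields only."""
    observed = [i for i, v in enumerate(row) if v is not None]
    nearest = min(
        centroids,
        key=lambda c: euclidean([row[i] for i in observed], [c[i] for i in observed]),
    )
    return [nearest[i] if v is None else v for i, v in enumerate(row)]


if __name__ == "__main__":
    complete = [[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [8.8, 9.2]]
    centroids = kmeans(complete, k=2)
    print(impute_missing([None, 8.7], centroids))
```

Centroid-based filling is only one option; a nearest-neighbour or per-cluster median fill would fit the same structure by changing `impute_missing`.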
Pages: 102-131
Number of pages: 31