A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES

Cited by: 0
Author
Peng, Taoxin [1]
Affiliation
[1] Napier Univ, Sch Comp, Edinburgh EH10 5DT, Midlothian, Scotland
Source
ICEIS 2008: PROCEEDINGS OF THE TENTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL DISI: DATABASES AND INFORMATION SYSTEMS INTEGRATION | 2008
Keywords
Data Cleaning; Data Quality; Data Integration; Data Warehousing
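DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract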
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A range of methods and tools has been developed for this purpose. However, at least two questions still need to be answered: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that provides an approach to managing data cleaning in data warehouses by focusing on the use of data quality dimensions and by decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach presented is efficient and scalable for data cleaning in data warehouses.
Pages: 473 - 478
Page count: 6
Related papers
50 items in total
  • [31] Data cleaning framework for highway asphalt pavement inspection data based on artificial neural networks
    Han, Chengjia
    Zhang, Weiguang
    Ma, Tao
    INTERNATIONAL JOURNAL OF PAVEMENT ENGINEERING, 2022, 23 (14) : 5198 - 5210
  • [32] Optimization of multidimensional aggregates in data warehouses
    Pears, Russel
    Houliston, Bryan
    JOURNAL OF DATABASE MANAGEMENT, 2007, 18 (01) : 69 - 93
  • [33] Enhancing Recall Using Data Cleaning for Biomedical Big Data
    Deshpande, Priya
    Rasin, Alexander
    Tchoua, Roselyne
    Furst, Jacob
    Raicu, Daniela A.
    Antani, Sameer
    2020 IEEE 33RD INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS(CBMS 2020), 2020, : 265 - 270
  • [34] Improving Data Cleaning by Learning From Unstructured Textual Data
    Nasfi, Rihem
    de Tre, Guy
    Bronselaer, Antoon
    IEEE ACCESS, 2025, 13 : 36470 - 36491
  • [35] A Framework for Data Quality in Data Warehousing
    Nemani, Rao R.
    Konda, Ramesh
    INFORMATION SYSTEMS: MODELING, DEVELOPMENT, AND INTEGRATION: THIRD INTERNATIONAL UNITED INFORMATION SYSTEMS CONFERENCE, UNISCON 2009, 2009, 20 : 292 - +
  • [36] Customer and household matching: resolving entity identity in data warehouses
    Berndt, DJ
    Satterfield, RK
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY II, 2000, 4057 : 173 - 180
  • [37] A Comparative Study of Data Cleaning Tools
    Oni, Samson
    Chen, Zhiyuan
    Hoban, Susan
    Jademi, Onimi
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2019, 15 (04) : 48 - 65
  • [38] IoT data cleaning techniques: A survey
    Ding X.
    Wang H.
    Li G.
    Li H.
    Li Y.
    Liu Y.
    Intelligent and Converged Networks, 2022, 3 (04) : 325 - 339
  • [39] An AI Planning System for Data Cleaning
    Boselli, Roberto
    Cesarini, Mirko
    Mercorio, Fabio
    Mezzanzanica, Mario
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT III, 2017, 10536 : 349 - 353
  • [40] Towards Reusing Data Cleaning Knowledge
    Almeida, Ricardo
    Maio, Paulo
    Oliveira, Paulo
    Joao, Barroso
    NEW CONTRIBUTIONS IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1, PT 1, 2015, 353 : 143 - 150