Contextual Data Cleaning with Ontology Functional Dependencies

Cited by: 0
Authors
Zheng, Zheng [1]
Zheng, Longtao [2]
Alipourlangouri, Morteza [1]
Chiang, Fei [1]
Golab, Lukasz [3]
Szlichta, Jaroslaw [4]
Baskaran, Sridevi [1]
Affiliations
[1] McMaster University, 1280 Main Street, Hamilton, ON, L8S 4K1, Canada
[2] University of Science and Technology of China, No. 96, JinZhai Road, Baohe District, Hefei, Anhui, 230026, China
[3] University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada
[4] Ontario Tech University, 2000 Simcoe St N, Oshawa, ON, L1G 0C, Canada
Keywords
Cleaning; Semantics
DOI
Not available
Chinese Library Classification (CLC) number
TQ [Chemical Industry]
Discipline classification code
0817
Abstract
Functional Dependencies define attribute relationships based on syntactic equality, and when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms defined by an ontology. We study the theoretical foundations of OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Toward enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional Functional Dependencies. © 2022 Association for Computing Machinery.
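The contrast the abstract draws between syntactic equality and synonym-based semantics can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy ontology `SENSES`, the sample rows, and the helper names `group_by`, `fd_violations`, and `ofd_violations` are all hypothetical. It assumes that a synonym OFD X → Y holds when, for every group of tuples agreeing on X, the Y values share at least one common sense in the ontology.

```python
from collections import defaultdict

# Hypothetical toy ontology: each value maps to the set of senses it can name.
SENSES = {
    "USA": {"united_states"},
    "United States": {"united_states"},
    "America": {"united_states"},
    "Canada": {"canada"},
}

def group_by(rows, lhs):
    """Partition rows by their values on the left-hand-side attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return groups

def fd_violations(rows, lhs, rhs):
    """Groups whose rhs values are not all syntactically equal (classic FD)."""
    return [key for key, group in group_by(rows, lhs).items()
            if len({r[rhs] for r in group}) > 1]

def ofd_violations(rows, lhs, rhs, senses):
    """Groups whose rhs values share no common sense (assumed synonym OFD)."""
    bad = []
    for key, group in group_by(rows, lhs).items():
        # A value missing from the ontology is treated as its own sense.
        sense_sets = [set(senses.get(r[rhs], {r[rhs]})) for r in group]
        if not set.intersection(*sense_sets):
            bad.append(key)
    return bad

rows = [
    {"country_code": "US", "country": "USA"},
    {"country_code": "US", "country": "United States"},
    {"country_code": "CA", "country": "Canada"},
    {"country_code": "CA", "country": "America"},
]

fd_bad = fd_violations(rows, ["country_code"], "country")
ofd_bad = ofd_violations(rows, ["country_code"], "country", SENSES)
print(fd_bad)   # both groups are flagged under syntactic equality
print(ofd_bad)  # only the group with no shared sense is flagged
```

In this toy data, the traditional check flags the `US` group because "USA" and "United States" differ as strings, while the synonym-aware check accepts it and flags only the `CA` group, which mirrors the false-positive reduction the abstract describes.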
Related papers
50 in total
  • [1] Contextual Data Cleaning with Ontology Functional Dependencies
    Zheng, Zheng
    Zheng, Longtao
    Alipourlangouri, Morteza
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    Baskaran, Sridevi
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2022, 14 (03):
  • [2] Contextual Data Cleaning with Ontology FDs
    Zheng, Zheng
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 2911 - 2913
  • [3] Pattern Functional Dependencies for Data Cleaning
    Qahtan, Abdulhakim
    Tang, Nan
    Ouzzani, Mourad
    Cao, Yang
    Stonebraker, Michael
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (05): : 684 - 697
  • [4] Conditional functional dependencies for data cleaning
    Bohannon, Philip
    Fan, Wenfei
    Geerts, Floris
    Jia, Xibei
    Kementsietsidis, Anastasios
    2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2007, : 721 - 730
  • [5] Contextual Data Cleaning
    Langouri, Morteza Alipour
    Zheng, Zheng
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2018, : 21 - 24
  • [6] Efficient Discovery of Ontology Functional Dependencies
    Baskaran, Sridevi
    Keller, Alexander
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 1847 - 1856
  • [7] Extending dependencies with conditions for data cleaning
    Fan, Wenfei
    2008 IEEE 8TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2008, : 185 - 190
  • [8] Data repair of density-based data cleaning approach using conditional functional dependencies
    Al-Janabi, Samir
    Janicki, Ryszard
    DATA TECHNOLOGIES AND APPLICATIONS, 2022, 56 (03) : 429 - 446
  • [9] Data Cleaning Utilizing Ontology Tool
    Wong, Jing Ting
    Hong, Jer Lang
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (07): : 43 - 52
  • [10] A NOVEL ONTOLOGY TOOL FOR DATA CLEANING
    Wong, Jing Ting
    Hong, Jer Lang
    UNCERTAINTY MODELLING IN KNOWLEDGE ENGINEERING AND DECISION MAKING, 2016, 10 : 384 - 390