Contextual Data Cleaning with Ontology Functional Dependencies

Cited by: 2
Authors
Zheng, Zheng [1]
Zheng, Longtao [2]
Alipourlangouri, Morteza [1]
Chiang, Fei [1]
Golab, Lukasz [3]
Szlichta, Jaroslaw [4]
Baskaran, Sridevi [1]
Affiliations
[1] McMaster University, 1280 Main St W, Hamilton, ON L8S 4K1, Canada
[2] University of Science and Technology of China, 96 JinZhai Rd, Hefei 230026, Anhui, China
[3] University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada
[4] Ontario Tech University, 2000 Simcoe St N, Oshawa, ON L1G 0C, Canada
Keywords
Data cleaning; ontology functional dependencies; efficient discovery; model
DOI
10.1145/3524303
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Functional Dependencies define attribute relationships based on syntactic equality, and when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms defined by an ontology. We study the theoretical foundations of OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Toward enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional Functional Dependencies.
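To make the contrast between syntactic FDs and synonym-based OFDs concrete, here is a minimal Python sketch. It is not the authors' implementation and does not reproduce their axioms, discovery pruning, or repair algorithm. It assumes a hypothetical ontology encoded as a dictionary from senses (interpretations) to synonym sets, and a simplified synonym OFD X ->syn A that holds when, within every class of tuples agreeing on X, the A values are either identical or all fall in one synonym set of a single sense. For OFDs that hold with some exceptions, it uses a g3-style error (the fraction of rows that must be deleted), which is an assumed measure. The names ONTOLOGY, classes, violations, ofd_error, and the CC/CTRY toy relation are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical ontology: each sense (interpretation) maps to synonym sets.
# This encoding is an illustrative assumption, not the paper's data model.
ONTOLOGY = {
    "country": [{"United States", "USA", "America"}, {"Canada", "CAN"}],
    "drug": [{"ibuprofen", "Advil", "Motrin"}],
}


def classes(rows, lhs):
    """Group rows into equivalence classes that agree on the LHS attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return list(groups.values())


def holds_fd(rows, lhs, rhs):
    """Traditional FD X -> A: exactly one syntactic RHS value per class."""
    return all(len({r[rhs] for r in c}) == 1 for c in classes(rows, lhs))


def largest_consistent(values):
    """Largest number of values in a class that are identical, or that all
    fall in one synonym set of a single sense."""
    counts = Counter(values)
    best = max(counts.values())  # identical values always agree
    for syn_sets in ONTOLOGY.values():
        for syn_set in syn_sets:
            best = max(best, sum(n for v, n in counts.items() if v in syn_set))
    return best


def violations(rows, lhs, rhs):
    """Minimum number of rows to delete so the simplified synonym OFD holds."""
    return sum(len(c) - largest_consistent([r[rhs] for r in c])
               for c in classes(rows, lhs))


def holds_synonym_ofd(rows, lhs, rhs):
    """Exact (simplified) synonym OFD X ->syn A: no rows need to be deleted."""
    return violations(rows, lhs, rhs) == 0


def ofd_error(rows, lhs, rhs):
    """g3-style error (assumed measure) for OFDs that hold with exceptions."""
    return violations(rows, lhs, rhs) / len(rows) if rows else 0.0


rows = [
    {"CC": "US", "CTRY": "United States"},
    {"CC": "US", "CTRY": "USA"},
    {"CC": "US", "CTRY": "America"},
    {"CC": "CA", "CTRY": "Canada"},
]
print(holds_fd(rows, ["CC"], "CTRY"))           # False: three spellings for US
print(holds_synonym_ofd(rows, ["CC"], "CTRY"))  # True: all are country synonyms
dirty = rows + [{"CC": "CA", "CTRY": "USA"}]
print(ofd_error(dirty, ["CC"], "CTRY"))         # 0.2: one of five rows violates
```

On the clean rows the FD CC -> CTRY reports a violation for the three spellings of the United States, which is exactly the kind of false positive the abstract describes, while the synonym check accepts them. In this simplification each class may choose its own sense, so an ambiguous value can be read differently in different classes without forcing one global interpretation.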
Pages: 26