Determining the Real Data Completeness of a Relational Dataset

Cited by: 6
Authors
Liu, Yong-Nan [1]
Li, Jian-Zhong [1]
Zou, Zhao-Nian [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Engn, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
data quality; data completeness; functional dependency; data completeness model; optimal algorithm;
DOI
10.1007/s11390-016-1659-x
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Low data quality is a serious problem in the era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis, and mining, and lead to huge losses. Incomplete data is common in low-quality data, so it is necessary to determine the data completeness of a dataset to guide follow-up operations on it. Little existing work focuses on the completeness of a dataset, and that work treats all missing values as unknown. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and thereby identify the attribute cells that are really missing. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. We propose two optimal algorithms that determine the data completeness of a dataset in different cases. We empirically show the effectiveness and scalability of our algorithms on both real-world and synthetic data.
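The idea in the abstract, that a functional dependency lets some missing cells be recovered from other tuples so that only the remaining cells count as really missing, can be illustrated with a small sketch. This is not the paper's completeness model or either of its optimal algorithms; it is a naive fixpoint loop written only for illustration. The names `fill_by_fds` and `completeness`, the sample relation, the use of `None` for a missing cell, and the simple "fraction of non-missing cells" measure are all assumptions introduced here.

```python
from typing import Optional

Row = dict[str, Optional[str]]
FD = tuple[tuple[str, ...], str]  # (left-hand attributes, right-hand attribute)


def fill_by_fds(rows: list[Row], fds: list[FD]) -> list[Row]:
    """Return a copy of rows with missing cells filled where an FD determines them."""
    filled = [dict(r) for r in rows]
    changed = True
    while changed:  # repeat, since a newly filled cell may enable another FD
        changed = False
        for lhs, rhs in fds:
            # Map each fully known left-hand side to a right-hand value seen with it.
            known = {}
            for r in filled:
                key = tuple(r[a] for a in lhs)
                if None not in key and r[rhs] is not None:
                    known.setdefault(key, r[rhs])
            # Fill missing right-hand cells whose left-hand side is already known.
            for r in filled:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and key in known:
                    r[rhs] = known[key]
                    changed = True
    return filled


def completeness(rows: list[Row]) -> float:
    """Fraction of cells that are not missing (assumed measure, not the paper's)."""
    cells = [v for r in rows for v in r.values()]
    if not cells:
        return 1.0
    return sum(v is not None for v in cells) / len(cells)


# Hypothetical relation: None marks a missing cell.
rows = [
    {"zip": "150001", "city": "Harbin", "phone": "111"},
    {"zip": "150001", "city": None, "phone": None},
    {"zip": None, "city": "Beijing", "phone": "222"},
]
fds = [(("zip",), "city")]  # assumed dependency: zip determines city

filled = fill_by_fds(rows, fds)
naive = completeness(rows)      # counts every None as missing
real = completeness(filled)     # counts only cells no FD could recover
```

In this sample, the second tuple's `city` is recovered from the first tuple through `zip -> city`, while its `phone` and the third tuple's `zip` stay missing because no dependency determines them. The sketch assumes the known values are consistent with the dependencies and rescans the whole relation on every pass, so it makes no claim to the time bounds discussed in the abstract.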
Pages: 720-740
Page count: 21
Source
Journal of Computer Science and Technology, 2016, 31: 720-740