Determining the Real Data Completeness of a Relational Dataset

Cited by: 6
Authors
Liu, Yong-Nan [1 ]
Li, Jian-Zhong [1 ]
Zou, Zhao-Nian [1 ]
Affiliation
[1] Harbin Inst Technol, Sch Comp Sci & Engn, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
data quality; data completeness; functional dependency; data completeness model; optimal algorithm;
DOI
10.1007/s11390-016-1659-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Low data quality is a serious problem in the new era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis, and mining, and lead to huge losses. Incompleteness is common in low-quality data, so it is necessary to determine the completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and the work that does treats all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and to identify the attribute cells that are truly missing. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. We propose two optimal algorithms that determine the data completeness of a dataset for different cases. We empirically show the effectiveness and scalability of our algorithms on both real-world and synthetic data.
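The abstract does not spell out the completeness model or the two optimal algorithms. As a rough illustration of the underlying idea only, the Python sketch below makes several assumptions. A functional dependency X → A lets a missing A value be copied from another tuple that agrees on all of X. This rule is applied repeatedly until nothing changes, and completeness is reported as the fraction of non-null cells. The names `fill_by_fds`, `completeness`, `Row`, and `FD`, the dict-per-tuple representation, and the completeness ratio are all hypothetical. This naive fixpoint loop is not the authors' algorithm and makes no claim to their optimal time bound.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: a tuple is a dict from attribute name to value,
# with None marking a missing cell.
Row = Dict[str, Optional[str]]
# Hypothetical FD representation: (LHS attributes, RHS attribute), i.e. X -> A.
FD = Tuple[Sequence[str], str]


def fill_by_fds(rows: List[Row], fds: List[FD]) -> List[Row]:
    """Fill missing RHS cells from other tuples that agree on the FD's LHS.

    Repeats until no cell changes, because a value filled through one FD can
    complete the LHS of another. Returns new rows; the input is not modified.
    """
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Map each fully known LHS combination to the first known RHS value.
            known: Dict[Tuple[Optional[str], ...], str] = {}
            for r in rows:
                key = tuple(r[a] for a in lhs)
                value = r[rhs]
                if None not in key and value is not None:
                    known.setdefault(key, value)
            # Copy the RHS into tuples whose LHS is known but whose RHS is missing.
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and key in known:
                    r[rhs] = known[key]
                    changed = True
    return rows


def completeness(rows: List[Row]) -> float:
    """Hypothetical measure: fraction of non-missing cells (1.0 if empty)."""
    cells = [v for r in rows for v in r.values()]
    if not cells:
        return 1.0
    return sum(v is not None for v in cells) / len(cells)


if __name__ == "__main__":
    rows: List[Row] = [
        {"zip": "150001", "city": "Harbin", "name": "A"},
        {"zip": "150001", "city": None, "name": "B"},  # derivable via zip -> city
        {"zip": None, "city": None, "name": "C"},      # truly missing: zip unknown
    ]
    fds: List[FD] = [(["zip"], "city")]
    filled = fill_by_fds(rows, fds)
    print(f"{completeness(rows):.3f}")    # 6 of 9 cells known
    print(f"{completeness(filled):.3f}")  # 7 of 9 cells known
```

In this example, tuple B's missing city can be recovered from tuple A, while tuple C's city stays missing because its zip is also unknown. This is the distinction between a determinable cell and a truly missing one. When the data violates an FD, `setdefault` simply keeps the first right-hand value it sees. That is a simplification a real completeness model would have to handle more carefully.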
Pages: 720-740
Number of pages: 21
Related papers (50 in total)
  • [31] SQL query to increase data accuracy and completeness in PATSTAT
    Pasimeni, Francesco
    WORLD PATENT INFORMATION, 2019, 57 : 1 - 7
  • [32] Evaluation of thyroid cancer data completeness and quality at a population-based cancer registry, Algeria
    Boukheris, Houda
    Brakni, Lila
    Boubezari, Reda Fihri
    Bettayeb, Arslan
    Bouaidjra, Noureddine Bachir
    Houari, Amina Bensetti
    Brahim, Farouk Mohamed
    Simerabet, Azeddine
    Achour, Zineb
    Attar, Sara
    Saim, Hafida
    Berber, Necib
    BULLETIN DU CANCER, 2023, 110 (09) : 873 - 882
  • [33] Determining the Currency of Data
    Fan, Wenfei
    Geerts, Floris
    Wijsen, Jef
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2012, 37 (04):
  • [34] Enrichment of OpenStreetMap Data Completeness with Sidewalk Geometries Using Data Mining Techniques
    Mobasheri, Amin
    Huang, Haosheng
    Degrossi, Livia Castro
    Zipf, Alexander
    SENSORS, 2018, 18 (02):
  • [35] Data completeness in the Finnish Intensive Care Quality Consortium database
    Mussalo, P.
    Tenhunen, J.
    CRITICAL CARE, 11 (Suppl 2):
  • [36] Enabling Fine-Grained RDF Data Completeness Assessment
    Darari, Fariz
    Razniewski, Simon
    Prasojo, Radityo Eko
    Nutt, Werner
    WEB ENGINEERING (ICWE 2016), 2016, 9671 : 170 - 187
  • [37] Reference Architectures to Measure Data Completeness across Integrated Databases
    Emran, Nurul A.
    Embury, Suzanne
    Missier, Paolo
    Ahmad, Norashikin
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2013), PT I, 2013, 7802 : 216 - 225
  • [38] Evaluating the Quality of databases Instances based on Completeness and Accuracy of Data
    Al Khwlani, Mohammed N.
    Shmsan, Mariam
    Al-akhram, Nadia
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2013, 13 (01) : 35 - 38
  • [39] An interactive fitness-for-use data completeness tool to assess activity tracker data
    Cho, Sylvia
    Ensari, Ipek
    Elhadad, Noemie
    Weng, Chunhua
    Radin, Jennifer M.
    Bent, Brinnae
    Desai, Pooja
    Natarajan, Karthik
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2022, 29 (12) : 2032 - 2040
  • [40] Establishing a Multicentre Trauma Registry in India: An Evaluation of Data Completeness
    Shivasabesan, Gowri
    O'Reilly, Gerard M.
    Mathew, Joseph
    Fitzgerald, Mark C.
    Gupta, Amit
    Roy, Nobhojit
    Joshipura, Manjul
    Sharma, Naveen
    Cameron, Peter
    Fahey, Madonna
    Howard, Teresa
    Cheung, Zoe
    Kumar, Vineet
    Jarwani, Bhavesh
    Soni, Kapil Dev
    Patel, Pankaj
    Thakor, Advait
    Misra, Mahesh
    Gruen, Russell L.
    Mitra, Biswadev
    WORLD JOURNAL OF SURGERY, 2019, 43 (10) : 2426 - 2437