TODQA: Efficient Task-Oriented Data Quality Assessment

Cited by: 9
Authors
Li, Anran [1]
Zhang, Lan [1]
Qian, Jianwen [2]
Xiao, Xiang [1]
Li, Xiang-Yang [1]
Xie, Yunting [1]
Affiliations
[1] Univ Sci & Technol China, Dept Comp Sci & Technol, Hefei, Peoples R China
[2] IIT, Dept Comp Sci & Technol, Chicago, IL 60616 USA
Source
2019 15TH INTERNATIONAL CONFERENCE ON MOBILE AD-HOC AND SENSOR NETWORKS (MSN 2019) | 2019
Funding
National Key Research and Development Program;
Keywords
Data Quality Assessment; Sampling; Locality Sensitive Hashing; Rank Aggregation;
DOI
10.1109/MSN48538.2019.00028
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data quality assessment is vital for many information services, ranging from sensor networks to smart city systems. Current data quality assessments, however, are often derived from intrinsic data characteristics, are disconnected from specific application contexts, or are not applicable or efficient for large datasets. In this work, we propose a novel task-oriented data quality assessment framework that balances intrinsic and contextual quality. We carefully craft the assessment metrics, quantify them, and fuse them to rank candidate datasets by quality for a given task. To improve system efficiency, we design two fast calculation algorithms: one quantifies the relationship between the datasets and the task, and the other quantifies the distribution of data items. We conduct extensive evaluations on six public image datasets (with 460, 2/17 images in total) and four text document datasets (with 37,372 documents in total) to evaluate the efficacy and efficiency of our design. Experimental results show that our algorithms save about 90% of the computing time with little accuracy loss, which validates the feasibility and effectiveness of our framework for large datasets.
Pages: 81-88
Number of pages: 8