Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes

Cited by: 22
Authors
Asudeh, Abolfazl [1]
Shahbazi, Nima [1]
Jin, Zhongjun [2]
Jagadish, H. V. [2]
Affiliations
[1] Univ Illinois, Chicago, IL 60680 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Source
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data | 2021
Keywords
Responsible Data Science; Trustworthy AI; Fairness in Machine Learning; Bias Detection; Voronoi Diagrams; Algorithm
DOI
10.1145/3448016.3457315
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Appropriate training data is a requirement for building good machine-learned models. In this paper, we study the notion of coverage for ordinal and continuous-valued attributes, by formalizing the intuition that the learned model can accurately predict only at data points for which there are "enough" similar data points in the training data set. We develop an efficient algorithm to identify uncovered regions in low-dimensional attribute feature space, by making a connection to Voronoi diagrams. We also develop a randomized approximation algorithm for use in high-dimensional attribute space. We evaluate our algorithms through extensive experiments on real datasets.
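The coverage intuition in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that "similar" means within a fixed Euclidean distance and that "enough" means at least k such neighbors; the names `is_covered`, `radius`, and `k` are hypothetical and not taken from the paper. This brute-force check is not the Voronoi-based or randomized algorithm the abstract mentions.

```python
import math

def is_covered(query, data, radius, k):
    """Return True if at least k points in `data` lie within `radius` of `query`.

    `radius` and `k` are assumed parameters standing in for "similar" and
    "enough"; this is a naive linear scan, not the paper's algorithm.
    """
    neighbors = sum(1 for p in data if math.dist(query, p) <= radius)
    return neighbors >= k

# Toy training set: three points near the origin and one far away.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

near_origin = is_covered((0.05, 0.05), train, radius=0.2, k=3)
near_outlier = is_covered((4.0, 4.0), train, radius=0.2, k=3)
print(near_origin, near_outlier)
```

Under these assumptions, a query near the dense cluster counts as covered, while a query in a sparse region does not.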
Pages: 129-141
Number of pages: 13