Error correction for massive datasets

Times cited: 10
Author
Bruni, R. [1]
Affiliation
[1] Univ Roma La Sapienza, DIS, I-00185 Rome, Italy
Keywords
data correction; inconsistency localization; massive datasets
DOI
10.1080/10556780512331318281
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
The paper is concerned with the automatic detection and correction of errors in massive datasets. As is customary, erroneous data records are detected by formulating a set of rules. Here, these rules are encoded as linear inequalities. This makes it possible to check the rule set for inconsistencies and redundancies using a polyhedral mathematics approach. Moreover, it makes it possible to correct erroneous data records by introducing the minimum changes through an integer linear programming approach. Results of applying the proposed procedure to a real-world case of census data correction are reported.
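The core idea of the abstract (rules written as linear inequalities, a rule-set consistency and redundancy check, and a minimum-change correction of a failing record) can be illustrated with a toy sketch. This is not the paper's method: the polyhedral analysis and the integer linear program are both replaced here by exhaustive enumeration over small integer domains, which is only viable for a handful of fields. The fields (age, married, spouse_present), the two rules, and all helper names are hypothetical, and every changed field is assumed to cost 1.

```python
from itertools import product

# A rule (a, b) encodes the linear inequality sum(a[i] * x[i]) <= b.
Rule = tuple[tuple[int, ...], int]

# Hypothetical census-like fields: (age, married, spouse_present).
DOMAINS = [range(0, 101), (0, 1), (0, 1)]
RULES: list[Rule] = [
    ((-1, 16, 0), 0),  # married -> age >= 16:        16*married - age <= 0
    ((0, -1, 1), 0),   # spouse_present -> married:  spouse - married <= 0
]


def violated(record, rules):
    """Return the indices of the rules that `record` fails."""
    return [
        k for k, (a, b) in enumerate(rules)
        if sum(ai * xi for ai, xi in zip(a, record)) > b
    ]


def is_consistent(rules, domains):
    """True if at least one record in the domain satisfies every rule."""
    return any(not violated(r, rules) for r in product(*domains))


def is_redundant(k, rules, domains):
    """True if rule k holds for every record that satisfies all other rules."""
    others = rules[:k] + rules[k + 1:]
    return all(
        not violated(r, [rules[k]])
        for r in product(*domains)
        if not violated(r, others)
    )


def min_change_correction(record, rules, domains):
    """Enumerate feasible records and keep the first one that changes the fewest fields.

    Brute force stands in for the integer linear program. Returns
    (number_of_changes, corrected_record), or None if no record is feasible.
    """
    best = None
    for cand in product(*domains):
        if violated(cand, rules):
            continue
        changes = sum(c != v for c, v in zip(cand, record))
        if best is None or changes < best[0]:
            best = (changes, cand)
    return best


if __name__ == "__main__":
    record = (10, 1, 1)  # a 10-year-old recorded as married with a spouse present
    print(violated(record, RULES))                        # [0]
    print(min_change_correction(record, RULES, DOMAINS))  # (1, (16, 1, 1))
```

Enumeration grows exponentially with the number of fields, which is why a solver-based formulation is needed for real records. Weighting fields differently (for example, making age cheaper to change than marital status) would only change how `changes` is computed.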
Pages: 291-310
Number of pages: 20