Error correction for massive datasets

Cited by: 10
Authors
Bruni, R [1]
Affiliations
[1] Univ Roma La Sapienza, DIS, I-00185 Rome, Italy
Keywords
data correction; inconsistency localization; massive datasets
DOI
10.1080/10556780512331318281
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
The paper addresses the automatic detection and correction of errors in massive datasets. As is customary, erroneous data records are detected by formulating a set of rules. Here these rules are encoded as linear inequalities. This makes it possible to check the set of rules for inconsistencies and redundancies using a polyhedral mathematics approach. It also makes it possible to correct erroneous data records by introducing the minimum changes through an integer linear programming approach. Results from applying the proposed procedure to a real-world case of census data correction are reported.
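The idea of encoding rules as linear inequalities and repairing a record with the fewest changes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the field names, the domains, the three rules, and the convention that a consistent record satisfies every inequality. An exhaustive search over the small domains stands in for the integer linear programming solver, and each change is counted with equal weight.

```python
from itertools import product

# Hypothetical toy schema: field names and finite domains are assumptions.
DOMAINS = {
    "age": range(0, 101),
    "married": (0, 1),
    "years_married": range(0, 101),
}
FIELDS = list(DOMAINS)

# Each rule is (coefficients, bound) and means sum(coef * value) <= bound.
RULES = [
    ({"married": 16, "age": -1}, 0),             # married implies age >= 16
    ({"years_married": 1, "married": -100}, 0),  # unmarried implies 0 years married
    ({"years_married": 1, "age": -1}, 0),        # years married cannot exceed age
]


def violated_rules(record):
    """Return the indices of the rules that the record breaks."""
    return [
        i
        for i, (coefs, bound) in enumerate(RULES)
        if sum(c * record[f] for f, c in coefs.items()) > bound
    ]


def n_changes(record, candidate):
    """Count the fields in which the candidate differs from the record."""
    return sum(record[f] != candidate[f] for f in FIELDS)


def correct(record):
    """Exhaustively search for a consistent record differing in the fewest fields.

    Brute force replaces the ILP solver here; it is only viable for tiny domains.
    """
    candidates = (
        dict(zip(FIELDS, values))
        for values in product(*(DOMAINS[f] for f in FIELDS))
    )
    feasible = (c for c in candidates if not violated_rules(c))
    return min(feasible, key=lambda c: n_changes(record, c))


record = {"age": 10, "married": 1, "years_married": 5}
fixed = correct(record)
print(violated_rules(record), fixed, n_changes(record, fixed))
```

In this toy case the record breaks only the first rule, and a single change to `age` restores consistency. On realistic data the enumeration would be replaced by an integer programming solver, with one binary variable per possible field change.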
Pages: 291-310
Number of pages: 20