Error correction for massive datasets

Cited by: 10
Authors
Bruni, R [1]
Affiliations
[1] Univ Roma La Sapienza, DIS, I-00185 Rome, Italy
Keywords
data correction; inconsistency localization; massive datasets
DOI
10.1080/10556780512331318281
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
The paper is concerned with the problem of automatically detecting and correcting errors in massive datasets. As is customary, erroneous data records are detected by formulating a set of rules. Here, these rules are encoded as linear inequalities. This encoding makes it possible to check the set of rules for inconsistencies and redundancies using a polyhedral mathematics approach. Moreover, it makes it possible to correct erroneous data records by introducing the minimum number of changes through an integer linear programming approach. Results from applying a specialized version of the proposed procedure to a real-world case of census data correction are reported.
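As a rough illustration of the idea in the abstract, the following is a minimal sketch and not the paper's procedure. It assumes a hypothetical three-field record (age, married, years_married), three made-up rules written as linear inequalities that a consistent record must satisfy, and small integer domains. An exhaustive search over field subsets stands in for the integer linear programming solver that the paper uses; all names (`DOMAINS`, `RULES`, `is_consistent`, `correct`) are illustrative assumptions.

```python
from itertools import combinations, product

# Hypothetical record layout (an assumption): (age, married, years_married).
DOMAINS = [range(0, 101), range(0, 2), range(0, 85)]

# Each rule is (coefficients, bound), read as sum(c_i * x_i) <= bound.
# These rules are invented for illustration only.
RULES = [
    ((-1, 16, 0), 0),   # married = 1 implies age >= 16
    ((0, -84, 1), 0),   # years_married > 0 implies married = 1
    ((-1, 16, 1), 0),   # if married, years_married <= age - 16
]


def is_consistent(record):
    """Return True when the record satisfies every linear inequality."""
    return all(
        sum(c * x for c, x in zip(coeffs, record)) <= bound
        for coeffs, bound in RULES
    )


def correct(record):
    """Find a consistent record that differs from the input in as few
    fields as possible.

    Brute-force stand-in for an integer linear program: try changing
    0 fields, then 1, then 2, and so on, and return the first consistent
    candidate together with the number of changed fields.
    """
    n = len(record)
    for k in range(n + 1):
        for fields in combinations(range(n), k):
            for values in product(*(DOMAINS[i] for i in fields)):
                candidate = list(record)
                for i, v in zip(fields, values):
                    candidate[i] = v
                if is_consistent(candidate):
                    return tuple(candidate), k
    return None, None  # the rule set admits no consistent record
```

For example, `correct((10, 1, 0))` repairs a record describing a married ten-year-old by changing a single field. When several single-field repairs exist, this sketch simply returns the first one in iteration order; a practical system would weight the fields by reliability and pass the problem to a real solver.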
Pages: 291-310
Number of pages: 20