Techniques for dealing with incomplete data: a tutorial and survey

Cited by: 27
Authors
Aste, Marco [1]
Boninsegna, Massimo [1]
Freno, Antonino [2]
Trentin, Edmondo [3]
Affiliations
[1] EyePro Syst, I-38121 Trento, Italy
[2] Amazon DCG GmbH, D-10707 Berlin, Germany
[3] Univ Siena, DIISM, I-53100 Siena, Italy
Keywords
Statistical pattern recognition; Incomplete data; Noisy data; Missing data; Density estimation; Neural network; MISSING VALUE ESTIMATION; MAXIMUM-LIKELIHOOD; NEURAL-NETWORKS; PATTERN-CLASSIFICATION; CONVERGENCE PROPERTIES; ALTERNATIVE METHODS; EM ALGORITHM; IMPUTATION; MODELS; RECOGNITION;
DOI
10.1007/s10044-014-0411-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-world applications of pattern recognition, or machine learning algorithms, often present situations where the data are partly missing, corrupted by noise, or otherwise incomplete. In spite of that, developments in the machine learning community in the last decade have mostly focused on mathematical analysis of learning machines, making it difficult for practitioners to recollect an overview of major approaches to this issue. Paradoxically, as a consequence, even established methodologies rooted in statistics appear to have long been forgotten. Although the relevant literature on the topic is so wide that no exhaustive coverage is nowadays possible, the first goal of this paper is to provide the reader with a nonetheless significant survey of major, or utterly sound, techniques for dealing with the tasks of pattern recognition, machine learning, and density estimation from incomplete data. Secondly, the paper aims at representing a viable tutorial tool for the interested practitioner, by allowing for self-contained, step-by-step understanding of several approaches. An effort is made to categorize the different techniques as follows: (1) heuristic methods; (2) statistical approaches; (3) connectionist-oriented techniques; (4) other approaches (dynamical systems, adversarial deletion of features, etc.).
Pages: 1-29
Number of pages: 29