Imputation of missing values for compositional data using classical and robust methods

Cited by: 204
Authors
Hron, K. [2]
Templ, M. [1, 3]
Filzmoser, P. [1]
Affiliations
[1] Vienna Univ Technol, Dept Stat & Probabil Theory, A-1040 Vienna, Austria
[2] Palacky Univ, Fac Sci, Dept Math Anal & Applicat Math, Olomouc 77146, Czech Republic
[3] Stat Austria, A-1110 Vienna, Austria
Keywords
OUTLIER DETECTION; ALGORITHM; ZEROS;
DOI
10.1016/j.csda.2009.11.023
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
New imputation algorithms for estimating missing values in compositional data are introduced. A first proposal uses the k-nearest neighbor procedure based on the Aitchison distance, a distance measure especially designed for compositional data. It is important to adjust the estimated missing values to the overall size of the compositional parts of the neighbors. As a second proposal an iterative model-based imputation technique is introduced which initially starts from the result of the proposed k-nearest neighbor procedure. The method is based on iterative regressions, thereby accounting for the whole multivariate data information. The regressions have to be performed in a transformed space, and depending on the data quality classical or robust regression techniques can be employed. The proposed methods are tested on a real and on simulated data sets. The results show that the proposed methods outperform standard imputation methods. In the presence of outliers, the model-based method with robust regressions is preferable. (C) 2009 Elsevier B.V. All rights reserved.
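The first proposal in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so several details here are assumptions rather than the authors' implementation: the Aitchison distance is computed as the Euclidean distance between centred log-ratio coordinates of the parts observed in the incomplete row, donors are restricted to complete rows, the neighbor values are aggregated with the median, and the size adjustment rescales each neighbor by the ratio of the sums of the observed parts. The names `aitchison_distance` and `knn_impute` are hypothetical.

```python
import numpy as np

def aitchison_distance(x, y):
    """Aitchison distance between two compositions with strictly positive parts."""
    lx, ly = np.log(x), np.log(y)
    # Euclidean distance between centred log-ratio (clr) coordinates.
    return float(np.linalg.norm((lx - lx.mean()) - (ly - ly.mean())))

def knn_impute(X, k=3):
    """Fill NaN parts from the k nearest complete rows (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]  # assumption: only complete rows act as neighbors
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        # Distance on the subcomposition observed in row i
        # (needs at least two observed parts to be informative).
        d = np.array([aitchison_distance(X[i, obs], row[obs]) for row in donors])
        nearest = donors[np.argsort(d)[:k]]
        # Assumed size adjustment: bring each neighbor to the scale of row i.
        scale = X[i, obs].sum() / nearest[:, obs].sum(axis=1)
        for j in np.where(~obs)[0]:
            out[i, j] = np.median(scale * nearest[:, j])
    return out

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [10.0, 20.0, np.nan]])
imputed = knn_impute(X, k=2)
```

Because the distance depends only on ratios between parts, rows that differ by a constant factor are at distance zero, which is why the rescaling step is needed before neighbor values are combined. The second, model-based proposal would start from a result like `imputed` and refine it by iterative regressions in a transformed space; that step is not sketched here.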
Pages: 3095-3107
Page count: 13