Some statistical analysis techniques require complete data matrices, but a frequent problem in the construction of databases is that information is collected incompletely, for a variety of reasons. One option to tackle the problem is to estimate and impute the missing data. This paper describes a form of imputation that mixes regression with lower rank approximations. To improve the quality of the imputations, a generalisation is proposed that replaces the singular value decomposition (SVD) of the matrix with a regularised SVD in which the regularisation parameter is estimated by cross-validation. To evaluate the performance of the proposal, ten sets of real data from multienvironment trials were used. Missing values were created in each set at four percentages of values missing not at random, and three criteria were then considered to investigate the effectiveness of the proposal. The results show that the regularised method proves very competitive when compared to the original method, beating it in several of the considered scenarios. As it is a very general system, its application can be extended to all multivariate data matrices.

• The imputation method is modified through the inclusion of a stable and efficient computational algorithm that replaces the classical SVD least squares criterion by a penalised criterion. This penalty produces smoothed eigenvectors and eigenvalues that avoid overfitting problems, improving the performance of the method when the penalty is necessary. The size of the penalty can be determined by minimising one of the following criteria: the prediction errors, the Procrustes similarity statistic or the critical angles between subspaces of principal components.
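The general idea of penalised low-rank imputation with a cross-validated penalty can be illustrated with a minimal sketch. This is not the algorithm of the paper: it omits the regression step and assumes a simple soft-threshold penalty on the singular values, with the penalty size chosen by minimising prediction error on held-out observed cells (only the first of the three criteria named above). The function names `shrunk_svd_impute` and `choose_lambda`, and the parameters `rank`, `lam` and `holdout_frac`, are hypothetical and introduced only for illustration.

```python
import numpy as np

def shrunk_svd_impute(X, rank, lam, n_iter=200, tol=1e-8):
    """Fill NaNs in X by iterating a rank-limited SVD with shrunk singular values.

    Assumption: the penalty is a soft threshold `lam` on the singular values;
    the paper's regularised SVD may use a different penalty.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    observed_count = (~missing).sum(axis=0)
    observed_sum = np.where(missing, 0.0, X).sum(axis=0)
    overall_mean = observed_sum.sum() / max(observed_count.sum(), 1)
    # Start from column means; fall back to the overall mean for empty columns.
    col_means = np.where(observed_count > 0,
                         observed_sum / np.maximum(observed_count, 1),
                         overall_mean)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s[:rank] - lam, 0.0)  # penalised singular values
        approx = (U[:, :rank] * s_shrunk) @ Vt[:rank]
        # Observed cells are always kept; only missing cells are updated.
        new_filled = np.where(missing, approx, X)
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol:
            break
    return filled

def choose_lambda(X, rank, lambdas, holdout_frac=0.1, seed=0):
    """Pick the penalty that minimises RMSE on a random hold-out of observed cells."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = np.argwhere(~np.isnan(X))
    n_hold = max(1, int(holdout_frac * len(observed)))
    held = observed[rng.choice(len(observed), size=n_hold, replace=False)]
    rows, cols = held[:, 0], held[:, 1]
    X_train = X.copy()
    X_train[rows, cols] = np.nan  # hide the held-out cells
    errors = []
    for lam in lambdas:
        filled = shrunk_svd_impute(X_train, rank, lam)
        diff = filled[rows, cols] - X[rows, cols]
        errors.append(float(np.sqrt(np.mean(diff ** 2))))
    return lambdas[int(np.argmin(errors))], errors
```

A call such as `choose_lambda(X, rank=2, lambdas=[0.0, 0.5, 2.0])` returns the selected penalty together with the hold-out errors; setting `lam=0` recovers an unpenalised iterative SVD imputation, which gives a baseline against which the effect of the penalty can be judged.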