BI-CROSS-VALIDATION OF THE SVD AND THE NONNEGATIVE MATRIX FACTORIZATION

Cited by: 149
Authors
Owen, Art B. [1]
Perry, Patrick O. [1]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
Cross-validation; principal components; random matrix theory; sample reuse; weak factor model; SINGULAR-VALUE DECOMPOSITION; EXPRESSION DATA; FACTOR MODELS; COMPONENTS; NUMBER; DIMENSION; SELECTION;
DOI
10.1214/08-AOAS227
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
This article presents a form of bi-cross-validation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of columns, and then predict the left out entries by low rank operations on the retained data. We prove a self-consistency result expressing the prediction error as a residual from a low rank approximation. Random matrix theory and some empirical results suggest that smaller hold-out sets lead to more over-fitting, while larger ones are more prone to under-fitting. In simulated examples we find that a method leaving out half the rows and half the columns performs well.
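The hold-out scheme in the abstract can be illustrated with a minimal Python sketch. It assumes the data matrix is partitioned into a held-out block A (held-out rows and columns), blocks B and C (sharing only rows or only columns with A), and a fully retained block D, and that A is predicted as B D_k^+ C, where D_k is the rank-k truncated SVD of D. The function names `truncated_pinv` and `bcv_error`, and the use of squared Frobenius error, are illustrative assumptions and are not taken from this record.

```python
import numpy as np


def truncated_pinv(D, k):
    """Pseudo-inverse of the rank-k truncated SVD of D (illustrative helper)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Assumes the k leading singular values are nonzero.
    # For k = 0 the empty slices multiply out to an all-zero matrix.
    return (Vt[:k].T / s[:k]) @ U[:, :k].T


def bcv_error(X, held_rows, held_cols, k):
    """Squared error when predicting the held-out block from the retained data."""
    row_mask = np.zeros(X.shape[0], dtype=bool)
    col_mask = np.zeros(X.shape[1], dtype=bool)
    row_mask[held_rows] = True
    col_mask[held_cols] = True

    A = X[np.ix_(row_mask, col_mask)]      # held-out rows and held-out columns
    B = X[np.ix_(row_mask, ~col_mask)]     # held-out rows, retained columns
    C = X[np.ix_(~row_mask, col_mask)]     # retained rows, held-out columns
    D = X[np.ix_(~row_mask, ~col_mask)]    # retained rows and retained columns

    A_hat = B @ truncated_pinv(D, k) @ C   # low rank prediction of A
    return float(np.sum((A - A_hat) ** 2))
```

One way to use this sketch is to hold out half the rows and half the columns, evaluate `bcv_error` for each candidate rank k, average over several hold-out choices, and pick the rank with the smallest average error.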
Pages: 564-594
Page count: 31