NON-NEGATIVE MATRIX FACTORIZATION OF CLUSTERED DATA WITH MISSING VALUES

Cited by: 0
Authors
Chen, Rebecca [1]
Varshney, Lav R.
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Champaign, IL 61820 USA
Source
2019 IEEE DATA SCIENCE WORKSHOP (DSW) | 2019
Keywords
imputation; missing values; non-negative matrix factorization; optimal recovery; gene-expression data
DOI
10.1109/dsw.2019.8755555
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We propose the approximation-theoretic technique of optimal recovery for imputing missing values in clustered data, specifically for non-negative matrix factorization (NMF), and develop an algorithm to implement it. Under certain geometric conditions, we prove tight upper bounds on the NMF relative error; these are the first bounds of this type for data with missing values. Experiments on image data and biological data show that this technique performs as well as or better than other imputation techniques that account for local structure.
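For orientation, the general setting of NMF on data with missing entries can be sketched as follows. This is not the optimal-recovery algorithm the abstract describes, whose details are not given in this record; it is a generic masked (weighted) NMF baseline with multiplicative updates, included only to illustrate what imputing missing values through a non-negative factorization looks like. The function names `masked_nmf` and `impute`, the chosen rank, and the iteration count are all hypothetical choices for this illustration.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Generic masked NMF baseline (an assumption, not the paper's method).

    Fits W @ H to the observed entries of X only.
    X    : (m, n) non-negative array; values at unobserved entries are ignored
    mask : (m, n) boolean array, True where X is observed
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    M = mask.astype(float)
    Xo = np.where(mask, X, 0.0)  # zero out unobserved entries (NaN-safe)
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the masked squared-error objective;
        # they keep W and H non-negative at every step.
        H *= (W.T @ Xo) / (W.T @ (M * (W @ H)) + eps)
        W *= (Xo @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

def impute(X, mask, rank, **kwargs):
    """Keep observed entries and fill unobserved ones from W @ H."""
    W, H = masked_nmf(X, mask, rank, **kwargs)
    return np.where(mask, X, W @ H)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_true = rng.random((20, 2)) @ rng.random((2, 15))  # synthetic rank-2 data
    mask = rng.random(X_true.shape) > 0.2               # roughly 20% missing
    X_obs = np.where(mask, X_true, np.nan)
    X_hat = impute(X_obs, mask, rank=2)
    print("max abs error on missing entries:",
          np.abs(X_hat - X_true)[~mask].max())
```

The mask restricts the fit to observed entries, so the placeholder values at missing positions never influence the factors. A method that exploits cluster structure, as the abstract proposes, would replace this uniform treatment of missing entries.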
Pages: 180-184
Page count: 5