Multi-Source Learning with Block-wise Missing Data for Alzheimer's Disease Prediction

Times cited: 0
Authors
Xiang, Shuo [1, 2]
Yuan, Lei [1, 2]
Fan, Wei [3]
Wang, Yalin [1]
Thompson, Paul M. [4]
Ye, Jieping [1, 2]
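Affiliations
[1] Arizona State University, Computer Science and Engineering, Tempe, AZ 85287, USA
[2] Arizona State University, Biodesign Institute, Center for Evolutionary Medicine and Informatics, Tempe, AZ 85287, USA
[3] Huawei Noah's Ark Lab, Hong Kong, People's Republic of China
[4] University of California, Los Angeles, Department of Neurology, Laboratory of Neuro Imaging, Los Angeles, CA 90095, USA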
Source
19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'13) | 2013
Keywords
Alzheimer's disease; multi-source; block-wise missing data; optimization; regression; framework; shrinkage; selection
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With advances and increasing sophistication in data collection techniques, many applications now face large amounts of data collected from multiple heterogeneous sources. For example, in the study of Alzheimer's Disease (AD), different types of measurements, such as neuroimages, gene/protein expression data, and genetic data, are often collected and analyzed together for improved predictive power. Joint learning from multiple data sources is believed to be beneficial because different sources may contain complementary information, and feature pruning and data source selection are critical for learning interpretable models from high-dimensional data. Very often the collected data has block-wise missing entries; for example, a patient without an MRI scan has no information in the MRI data block, which makes his or her overall record incomplete. There has been growing interest in the data mining community in extending traditional techniques for single-source complete data analysis to multi-source incomplete data. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of block-wise missing data. In this paper, we first investigate the complete-data setting and present a unified "bilevel" learning model for multi-source data. We then give a natural extension of this model to the more challenging case of incomplete data. Our major contributions are threefold: (1) the proposed models handle both feature-level and source-level analysis in a unified formulation and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids direct imputation of the missing elements and thus provides superior performance, and it can easily be generalized to other applications with block-wise missing data sources; (3) efficient optimization algorithms are presented for both the complete and incomplete models.
We have performed comprehensive evaluations of the proposed models on AD diagnosis, and they compare favorably against existing approaches.
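The abstract names two ideas without giving formulas. The first is a prediction that combines source-level weights with feature-level weights. The second is handling block-wise missing data by grouping samples according to which sources they have, rather than imputing the missing blocks. What follows is a minimal Python sketch under our own assumptions. The bilevel form sum_i alpha_i * <x_i, beta_i>, the helper names (`missing_profile`, `group_by_profile`, `bilevel_predict`, `selected_features`) and the toy data are all hypothetical. They are not the authors' formulation or code, and no model fitting or optimization is shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# One sample = a list of per-source feature blocks; None marks a block that is
# missing entirely (e.g. no MRI scan for this patient). Hypothetical encoding.
Block = Optional[List[float]]
Sample = List[Block]
Profile = Tuple[bool, ...]


def missing_profile(sample: Sample) -> Profile:
    """True for each source that is observed in this sample."""
    return tuple(block is not None for block in sample)


def group_by_profile(samples: Sequence[Sample]) -> Dict[Profile, List[int]]:
    """Partition sample indices by block-wise missing pattern (no imputation)."""
    groups: Dict[Profile, List[int]] = defaultdict(list)
    for idx, sample in enumerate(samples):
        groups[missing_profile(sample)].append(idx)
    return dict(groups)


def bilevel_predict(sample: Sample, alpha: Sequence[float],
                    beta: Sequence[Sequence[float]]) -> float:
    """Assumed bilevel form: sum_i alpha[i] * <x_i, beta[i]> over observed sources.

    alpha holds one weight per source (source level); beta[i] holds the
    feature weights within source i (feature level). Missing blocks are
    skipped, not filled in.
    """
    score = 0.0
    for x_i, a_i, b_i in zip(sample, alpha, beta):
        if x_i is None:
            continue
        score += a_i * sum(x * b for x, b in zip(x_i, b_i))
    return score


def selected_features(alpha: Sequence[float],
                      beta: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """(source, feature) pairs that survive both levels of sparsity."""
    return [(i, j)
            for i, (a_i, b_i) in enumerate(zip(alpha, beta)) if a_i != 0
            for j, b in enumerate(b_i) if b != 0]


if __name__ == "__main__":
    samples = [
        [[1.0, 2.0], [0.5]],   # both sources observed
        [[3.0, 0.0], None],    # second source missing
        [None, [2.0]],         # first source missing
        [[0.0, 1.0], [1.0]],   # both sources observed
    ]
    print(group_by_profile(samples))
    # {(True, True): [0, 3], (True, False): [1], (False, True): [2]}
    alpha, beta = [0.5, 2.0], [[1.0, 0.0], [1.5]]
    print([bilevel_predict(s, alpha, beta) for s in samples])
    # [2.0, 1.5, 6.0, 3.0]
    print(selected_features(alpha, beta))
    # [(0, 0), (1, 0)]
```

In this sketch, a zero source weight drops an entire source (source selection), while zeros inside `beta` prune individual features (feature pruning). The profile groups are where, under our assumption, one could fit separate source weights for each missing-data pattern while sharing the feature weights across patterns.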
Pages: 185-193
Page count: 9