Discriminating sample groups with multi-way data

Times cited: 18
Authors
Lyu, Tianmeng [1]
Lock, Eric F. [1]
Eberly, Lynn E. [1]
Affiliations
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
Funding
National Institutes of Health (USA)
Keywords
Classification; Distance weighted discrimination; Gene time-course; Magnetic resonance spectroscopy; Support vector machine; Tensors; TENSOR; CLASSIFICATION; REGRESSION;
DOI
10.1093/biostatistics/kxw057
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
High-dimensional linear classifiers, such as distance weighted discrimination (DWD) and versions of the support vector machine (SVM), are commonly used in biomedical research to distinguish groups of subjects based on a large number of features. However, their use is limited to applications where a single vector of features is measured for each subject. In practice, data are often multi-way, or measured over multiple dimensions. For example, metabolite abundance may be measured over multiple regions or tissues, or gene expression may be measured over multiple time points, for the same subjects. We propose a framework for linear classification of high-dimensional multi-way data, in which coefficients can be factorized into weights that are specific to each dimension. More generally, the coefficients for each measurement in a multi-way dataset are assumed to have low-rank structure. This framework extends existing classification techniques from single vector to multi-way features, and we have implemented multi-way versions of SVM and DWD. We describe informative simulation results, and apply multi-way DWD to data for two very different clinical research studies. The first study uses magnetic resonance spectroscopy metabolite data over multiple brain regions to compare participants with and without spinocerebellar ataxia; the second uses publicly available gene expression time-course data to compare degrees of treatment response among patients with multiple sclerosis. Our multi-way method can improve performance and simplify interpretation over naive applications of full rank linear and non-linear classification to multi-way data.
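The factorization described in the abstract can be illustrated with a minimal sketch for two-way data (a features-by-conditions matrix per subject). This is not the authors' implementation: the function name `rank1_score`, the dimensions, and the random weights are all hypothetical, and no DWD or SVM fitting is performed. The sketch only shows that a rank-1 coefficient matrix gives the same linear score as its two dimension-specific weight vectors, with far fewer free parameters.

```python
import numpy as np

def rank1_score(X, w1, w2, b=0.0):
    """Linear score for one subject's p1 x p2 feature matrix X, assuming the
    coefficient matrix factorizes as the outer product of w1 and w2."""
    return float(w1 @ X @ w2 + b)

rng = np.random.default_rng(0)
p1, p2 = 5, 3                      # illustrative sizes, e.g. 5 metabolites in 3 regions
X = rng.normal(size=(p1, p2))      # one subject's two-way measurements (synthetic)
w1 = rng.normal(size=p1)           # hypothetical weights for the first dimension
w2 = rng.normal(size=p2)           # hypothetical weights for the second dimension

B = np.outer(w1, w2)               # full coefficient matrix implied by the factorization
full_score = float(np.sum(B * X))  # ordinary linear score using all p1 * p2 coefficients
factored_score = rank1_score(X, w1, w2)

n_full = p1 * p2                   # coefficients in an unstructured linear classifier
n_factored = p1 + p2               # weights stored by the rank-1 factorization
```

In this sketch the sign of the score would assign the subject to one of two groups. A higher-rank model, as the abstract's "low-rank structure" suggests, could be written as a sum of several such outer products.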
Pages: 434-450
Number of pages: 17