Classification of spectral data using fused lasso logistic regression

Cited by: 20
Authors
Yu, Donghyeon [1]
Lee, Seul Ji [3]
Lee, Won Jun [3]
Kim, Sang Cheol [4]
Lim, Johan [2]
Kwon, Sung Won [3]
Affiliations
[1] Keimyung Univ, Dept Stat, Daegu, South Korea
[2] Seoul Natl Univ, Dept Stat, Seoul, South Korea
[3] Seoul Natl Univ, Coll Pharm, Seoul, South Korea
[4] Samsung Genome Inst, Samsung Med Ctr, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Classification; Fused lasso regression; Mass spectral data; l(1)-regularization; Penalized logistic regression; METABOLOMIC DATA-ANALYSIS; MOLECULAR PROFILE DATA; VARIABLE SELECTION; REGULARIZATION; IDENTIFICATION; METABOANALYST; SHRINKAGE; SERVER; MZMINE;
DOI
10.1016/j.chemolab.2015.01.006
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Spectral data contain powerful information that can be used to identify unknown compounds and their chemical structures. In this paper, we study fused lasso logistic regression (FLLR) to classify spectral data into two groups. We show that the FLLR has a grouping property on regression coefficients, which selects groups of highly correlated variables together. Both the sparsity and the grouping property of the FLLR provide great advantages in the analysis of spectral data. In particular, the FLLR resolves the well-known peak misalignment problem of spectral data by providing data-dependent binning, and it provides a more interpretable classifier than other l(1)-regularization methods. We also analyze gas chromatography/mass spectrometry data to classify the origin of herbal medicines, and illustrate the advantages of the FLLR over other existing l(1)-regularized methods. (C) 2015 Elsevier B.V. All rights reserved.
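As a rough illustration of the method named in the abstract, the sketch below evaluates a fused lasso penalized logistic loss in Python. The abstract does not give the paper's exact formulation, so everything here is an assumption: the function name fllr_objective, the tuning parameters lam1 and lam2, the 0/1 coding of the class labels, and the 1/n scaling of the log-likelihood are all hypothetical choices. The penalty used is the standard fused lasso form, an l(1) term on the coefficients plus an l(1) term on differences between adjacent coefficients. The second term is what pulls neighbouring spectral bins toward a common coefficient, which is one way to read the "data dependent binning" described above.

```python
import numpy as np


def fllr_objective(beta0, beta, X, y, lam1, lam2):
    """Fused lasso penalized logistic loss (illustrative sketch only).

    Assumptions, not taken from the paper:
      - y holds class labels coded as 0 or 1
      - columns of X are ordered along the spectral axis
      - the negative log-likelihood is averaged over the n samples
    """
    # Linear predictor for each sample
    eta = beta0 + X @ beta

    # Logistic negative log-likelihood: sum_i [log(1 + exp(eta_i)) - y_i * eta_i]
    # np.logaddexp(0, eta) computes log(1 + exp(eta)) without overflow
    nll = np.sum(np.logaddexp(0.0, eta) - y * eta)

    # l1 penalty on the coefficients (sparsity)
    sparsity = lam1 * np.sum(np.abs(beta))

    # l1 penalty on differences of adjacent coefficients (fusion / grouping)
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))

    return nll / len(y) + sparsity + fusion
```

A coefficient vector that is constant over a run of adjacent variables incurs no fusion penalty inside that run, so minimizing this objective tends to produce piecewise-constant coefficient profiles. Minimization itself is not shown here; it would need a solver suited to non-smooth penalties.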
Pages: 70-77
Page count: 8