Controlling the false discovery rate for feature selection in high-resolution NMR spectra

Cited by: 18
Authors
Kim, Seoung Bum [1]
Chen, Victoria C. P. [1]
Park, Youngja [2]
Ziegler, Thomas R. [2]
Jones, Dean P. [2]
Affiliations
[1] Department of Industrial and Manufacturing Systems Engineering, University of Texas at Arlington, Arlington, TX
[2] Clinical Biomarker Laboratory, Center for Clinical and Molecular Nutrition, Department of Medicine, Emory University, Atlanta, GA
Source
Statistical Analysis and Data Mining | 2008 / Vol. 1 / Issue 02
Keywords
False discovery rate; Feature selection; Metabolomics; Nuclear magnetic resonance; Orthogonal signal correction
DOI
10.1002/sam.10005
Abstract
Successful implementation of feature selection in nuclear magnetic resonance (NMR) spectra not only improves classification ability, but also simplifies the entire modeling process and, thus, reduces computational and analytical efforts. Principal component analysis (PCA) and partial least squares (PLS) have been widely used for feature selection in NMR spectra. However, extracting meaningful metabolite features from the reduced dimensions obtained through PCA or PLS is complicated because these reduced dimensions are linear combinations of a large number of the original features. In this paper, we propose a multiple testing procedure controlling false discovery rate (FDR) as an efficient method for feature selection in NMR spectra. The procedure clearly compensates for the limitation of PCA and PLS and identifies individual metabolite features necessary for classification. In addition, we present orthogonal signal correction to improve classification and visualization by removing unnecessary variations in NMR spectra. Our experimental results with real NMR spectra showed that classification models constructed with the features selected by our proposed procedure yielded smaller misclassification rates than those with all features. © 2008 Wiley Periodicals, Inc.
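As a minimal sketch of what FDR-controlled feature selection can look like, the example below applies the Benjamini-Hochberg step-up rule to a list of per-feature p-values and returns the indices of the features it keeps. The abstract does not name the specific procedure used, so the choice of Benjamini-Hochberg is an assumption here, as are the function name `benjamini_hochberg`, the level `q = 0.05`, and the p-values, which are made up for illustration and are not results from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of the hypotheses rejected at FDR level q.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest rank k with p_(k) <= k * q / m, and reject the k smallest.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Feature indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes
    return sorted(order[:k_max])


# Hypothetical p-values, one per spectral feature (illustration only).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p, q=0.05)
print(selected)
```

Because the rule is step-up, a feature whose p-value misses its own threshold is still selected when a larger p-value further down the sorted list passes. The selected indices refer to individual original features, which is the property the abstract contrasts with the linear combinations produced by PCA or PLS.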
Pages: 57 - 66
Number of pages: 9
Related papers
50 in total
  • [1] Discovery of metabolite features for the modelling and analysis of high-resolution NMR spectra
    Cho, Hyun-Woo
    Kim, Seoung Bum
    Jeong, Myong K.
    Park, Youngja
    Miller, Nana Gletsu
    Ziegler, Thomas R.
    Jones, Dean P.
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2008, 2 (02) : 176 - 192
  • [2] Genetic algorithm-based feature selection in high-resolution NMR spectra
    Cho, Hyun-Woo
    Kim, Seoung Bum
    Jeong, Myong K.
    Park, Youngja
    Ziegler, Thomas R.
    Jones, Dean P.
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 35 (03) : 967 - 975
  • [3] Feature selection and classification of high-resolution NMR spectra in the complex wavelet transform domain
    Kim, Seoung Bum
    Wang, Zhou
    Oraintara, Soontorn
    Temiyasathit, Chivalai
    Wongsawat, Yodchanan
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2008, 90 (02) : 161 - 168
  • [4] Classification of High-Resolution NMR Spectra Based on Complex Wavelet Domain Feature Selection and Kernel-Induced Random Forest
    Fan, Guangzhe
    Wang, Zhou
    Kim, Seoung Bum
    Temiyasathit, Chivalai
    IMAGE AND SIGNAL PROCESSING, PROCEEDINGS, 2010, 6134 : 593 - +
  • [5] Linear-mixed effects models for feature selection in high-dimensional NMR spectra
    Mei, Yajun
    Kim, Seoung Bum
    Tsui, Kwok-Leung
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 4703 - 4708
  • [6] A clarifying comparison of methods for controlling the false discovery rate
    Yin, Yaling
    Soteros, Christine E.
    Bickis, Mikelis G.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2009, 139 (07) : 2126 - 2137
  • [7] Hierarchical false discovery rate-controlling methodology
    Yekutieli, Daniel
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2008, 103 (481) : 309 - 316
  • [8] Controlling False Discovery Rate Using Gaussian Mirrors
    Xing, Xin
    Zhao, Zhigen
    Liu, Jun S.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (541) : 222 - 241
  • [9] Benefiting feature selection by the discovery of false irrelevant attributes
    Chao, Lidia S.
    Wong, Derek F.
    Chen, Philip C. L.
    Ng, Wing W. Y.
    Yeung, Daniel S.
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2015, 13 (04)
  • [10] A Process Monitoring Scheme Controlling False Discovery Rate
    Lee, Sang-Ho
    Jun, Chi-Hyuck
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2012, 41 (10) : 1912 - 1920