Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data

Times cited: 138
Authors
Jarvis, RM [1]
Goodacre, R [1]
Affiliation
[1] Univ Manchester, Dept Chem, Manchester M60 1QD, Lancs, England
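Funding
Engineering and Physical Sciences Research Council (EPSRC), UK; Biotechnology and Biological Sciences Research Council (BBSRC), UK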
DOI
10.1093/bioinformatics/bti102
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The major difficulties in the mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black-box nature of the modelling techniques. For the analysis of biological samples, the first problem arises from biological, experimental and machine variability, which can lead to sample size differences and unavoidable baseline shifts. Consequently, mathematical corrections often have to be applied to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results, because the variables that contribute most to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost.
Methods: We used genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We also demonstrate a novel approach in which a GA selects the most important discriminatory variables from FT-IR spectra for multi-class identification by discriminant function analysis (DFA).
Results: The GA selects sensible pre-processing steps from a total of approximately 10^10 possible mathematical transformations. Applying these algorithms reduces the model error by 16% compared with the model built on the raw data. GA-DFA recovers six variables, out of the full set of 882 spectral variables, with which a satisfactory DFA model can be formed; inferences can therefore be made about the biochemical differences reflected by these spectral bands.
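The GA-driven variable selection summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a binary chromosome with one bit per spectral variable, tournament selection, one-point crossover, bit-flip mutation and elitism, and it swaps in a leave-one-out nearest-centroid classifier as a cheap stand-in for DFA. The function names `loo_centroid_error`, `ga_select` and `make_toy_spectra`, the parsimony `penalty`, and the synthetic three-class "spectra" are all hypothetical.

```python
import random


def loo_centroid_error(X, y, cols):
    """Leave-one-out error rate of a nearest-class-centroid classifier using
    only the variable indices in `cols`. A stand-in for DFA (assumption),
    not a reimplementation of it."""
    if not cols:
        return 1.0
    labels = sorted(set(y))
    sums = {c: [0.0] * len(cols) for c in labels}
    counts = {c: 0 for c in labels}
    for row, c in zip(X, y):
        counts[c] += 1
        for k, j in enumerate(cols):
            sums[c][k] += row[j]
    errors = 0
    for row, c in zip(X, y):
        best, best_d = None, float("inf")
        for lab in labels:
            own = lab == c
            n = counts[lab] - own  # remove the held-out sample from its own class
            if n == 0:
                continue
            d = 0.0
            for k, j in enumerate(cols):
                centroid = (sums[lab][k] - (row[j] if own else 0.0)) / n
                d += (row[j] - centroid) ** 2
            if d < best_d:
                best, best_d = lab, d
        errors += best != c
    return errors / len(X)


def ga_select(X, y, pop_size=30, generations=40, penalty=0.01, seed=0):
    """Evolve binary masks over the variables. Lower fitness is better:
    LOO error plus a hypothetical `penalty` per selected variable."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    p_mut = 1.0 / n_vars  # about one bit flip per child on average
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cols = [j for j, bit in enumerate(mask) if bit]
            cache[key] = loo_centroid_error(X, y, cols) + penalty * len(cols)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    # sparse random start: each variable switched on with probability 0.2
    pop = [[int(rng.random() < 0.2) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [min(pop, key=fitness)[:]]  # elitism: best mask survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)


def make_toy_spectra(n_per_class=10, n_vars=20, informative=(3, 11), seed=1):
    """Hypothetical 3-class data: Gaussian noise on every variable plus a
    class-dependent shift on the `informative` variables only."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            for j in informative:
                row[j] += 5.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_spectra()
    best = ga_select(X, y)
    print("selected variables:", [j for j, bit in enumerate(best) if bit])
```

On this toy data the returned mask should concentrate on the shifted variables (3 and 11). The per-variable penalty is what pushes the search toward a small subset, similar in spirit to recovering a handful of bands from a much larger spectrum; in practice the fitness would be computed from the actual discriminant model and a proper validation scheme.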
Pages: 860-868
Number of pages: 9
Number of references: 52