Selection of discriminant mid-infrared wavenumbers by combining a naive Bayesian classifier and a genetic algorithm: Application to the evaluation of lignocellulosic biomass biodegradation

Cited by: 9
Authors
Rammal, Abbas [1]
Perrin, Eric [1]
Vrabie, Valeriu [1]
Assaf, Rabih [1]
Fenniri, Hassan [1]
Affiliations
[1] Univ Reims, CReSTIC Chalons, F-51000 Chaussee Du Port, Chalons En Cham, France
Keywords
Genetic algorithm; Naive Bayesian classifier; Fitness function; Linear discriminant analysis; Biodegradation of lignocellulosic biomass; Mid-Infrared Spectroscopy; SPECTROSCOPY; PREDICTION; SOIL
DOI
10.1016/j.mbs.2017.05.002
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Infrared spectroscopy provides useful information on the molecular composition of biological systems through molecular vibrations, overtones, and combinations of fundamental vibrations. Mid-infrared (MIR) spectroscopy is sensitive to organic and mineral components and has attracted growing interest for developing biomarkers related to intrinsic characteristics of lignocellulosic biomass. However, not all spectral information is valuable for biomarker construction or for analysis methods such as classification. Better processing and interpretation can be achieved by identifying discriminating wavenumbers. Wavenumber selection has been addressed through several variable- or feature-selection methods. Some of them have not been adapted for large data sets or are difficult to tune, and others require additional information, such as concentrations. This paper proposes a new approach that combines a naive Bayesian classifier with a genetic algorithm to identify discriminating spectral wavenumbers. The genetic algorithm uses a linear combination of an a posteriori probability and the Bayes error rate as the fitness function for optimization. Such a function improves both the compactness and the separation of classes. The approach was tested by classifying a small set of maize roots in soil according to their biodegradation process, based on their MIR spectra. The results show that this optimization method discriminates the biodegradation process better than three alternatives: using the entire MIR spectrum; using the wavenumbers selected by a genetic algorithm based on a classical validity index; or using the wavenumbers selected by combining a genetic algorithm with other methods, such as linear discriminant analysis.
The proposed method selects wavenumbers that correspond to principal vibrations of chemical functional groups of compounds that undergo degradation/conversion during the biodegradation of lignocellulosic biomass. (C) 2017 Elsevier Inc. All rights reserved.
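The selection scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact fitness formula, so the weighting `alpha`, the use of training-set error as a stand-in for the Bayes error rate, the Gaussian class-conditional model, and all function names and genetic-algorithm settings below are assumptions made for illustration only.

```python
import numpy as np

def gaussian_nb_posteriors(X, y):
    """Fit a Gaussian naive Bayes model on (X, y); return class labels and posteriors for X."""
    classes = np.unique(y)
    log_post = np.empty((X.shape[0], classes.size))
    for k, c in enumerate(classes):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-6  # variance floor avoids division by zero
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)
        log_post[:, k] = np.log(Xc.shape[0] / X.shape[0]) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return classes, post

def fitness(mask, X, y, alpha=0.5):
    """ASSUMED form: alpha * mean true-class posterior + (1 - alpha) * (1 - error rate)."""
    if not mask.any():
        return 0.0  # an empty wavenumber subset cannot classify anything
    classes, post = gaussian_nb_posteriors(X[:, mask], y)
    true_idx = np.searchsorted(classes, y)
    mean_post = post[np.arange(y.size), true_idx].mean()
    error = np.mean(classes[post.argmax(axis=1)] != y)  # training error as a proxy
    return alpha * mean_post + (1 - alpha) * (1 - error)

def select_wavenumbers(X, y, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Simple generational GA over boolean masks (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, X.shape[1])) < 0.2  # sparse random initial masks
    best_mask, best_fit = pop[0].copy(), -np.inf
    for _ in range(generations):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        i = fits.argmax()
        if fits[i] > best_fit:
            best_mask, best_fit = pop[i].copy(), fits[i]
        # Binary tournament selection
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover between each parent and its neighbour
        partners = np.roll(parents, 1, axis=0)
        children = np.where(rng.random(parents.shape) < 0.5, parents, partners)
        # Bit-flip mutation, then elitism: keep the best mask found so far
        children ^= rng.random(children.shape) < p_mut
        children[0] = best_mask
        pop = children
    return best_mask, float(best_fit)

# Synthetic stand-in for spectra: 60 samples x 20 "wavenumbers", columns 3 and 7 informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 20))
X[y == 1, 3] += 5.0
X[y == 1, 7] += 5.0
mask, score = select_wavenumbers(X, y)
print(np.flatnonzero(mask), round(score, 3))
```

In this sketch the posterior term rewards subsets for which samples are assigned to their true class with high confidence, while the error term rewards correct decisions; with real spectra, the error would more reasonably be estimated by cross-validation than on the training set.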
Pages: 153-161
Number of pages: 9