Selection of discriminant mid-infrared wavenumbers by combining a naive Bayesian classifier and a genetic algorithm: Application to the evaluation of lignocellulosic biomass biodegradation

Cited by: 9
Authors
Rammal, Abbas [1 ]
Perrin, Eric [1 ]
Vrabie, Valeriu [1 ]
Assaf, Rabih [1 ]
Fenniri, Hassan [1 ]
Affiliations
[1] Univ Reims, CReSTIC Chalons, F-51000 Chaussee Du Port, Chalons En Cham, France
Keywords
Genetic algorithm; Naive Bayesian classifier; Fitness function; Linear discriminant analysis; Biodegradation of lignocellulosic biomass; Mid-Infrared Spectroscopy; SPECTROSCOPY; PREDICTION; SOIL;
DOI
10.1016/j.mbs.2017.05.002
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Infrared spectroscopy provides useful information on the molecular compositions of biological systems related to molecular vibrations, overtones, and combinations of fundamental vibrations. Mid-infrared (MIR) spectroscopy is sensitive to organic and mineral components and has attracted growing interest in the development of biomarkers related to intrinsic characteristics of lignocellulose biomass. However, not all spectral information is valuable for biomarker construction or for applying analysis methods such as classification. Better processing and interpretation can be achieved by identifying discriminating wavenumbers. The selection of wavenumbers has been addressed through several variable- or feature-selection methods. Some of them have not been adapted for use in large data sets or are difficult to tune, and others require additional information, such as concentrations. This paper proposes a new approach by combining a naive Bayesian classifier with a genetic algorithm to identify discriminating spectral wavenumbers. The genetic algorithm uses a linear combination of an a posteriori probability and the Bayes error rate as the fitness function for optimization. Such a function allows the improvement of both the compactness and the separation of classes. This approach was tested to classify a small set of maize roots in soil according to their biodegradation process based on their MIR spectra. The results show that this optimization method allows better discrimination of the biodegradation process, compared with using the information of the entire MIR spectrum, the use of the spectral information at wavenumbers selected by a genetic algorithm based on a classical validity index or the use of the spectral information selected by combining a genetic algorithm with other methods, such as Linear Discriminant Analysis. 
The proposed method selects wavenumbers that correspond to principal vibrations of chemical functional groups of compounds that undergo degradation/conversion during the biodegradation of lignocellulosic biomass. © 2017 Elsevier Inc. All rights reserved.
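The abstract describes the fitness function only in words, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes a Gaussian naive Bayes model, substitutes the empirical misclassification rate for the Bayes error rate, scores candidates on the training spectra themselves, and uses a hypothetical weight `alpha`. The genetic operators (binary tournament, uniform crossover, bit-flip mutation, elitism) and the names `gnb_posteriors`, `fitness`, and `select_wavenumbers` are likewise illustrative choices, and the data are synthetic.

```python
import numpy as np

def gnb_posteriors(X_train, y_train, X_eval):
    """Gaussian naive Bayes posteriors P(class | x) for each row of X_eval."""
    classes = np.unique(y_train)
    log_post = np.empty((X_eval.shape[0], classes.size))
    for k, c in enumerate(classes):
        Xc = X_train[y_train == c]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-9  # small floor avoids division by zero
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X_eval - mean) ** 2 / var).sum(axis=1)
        log_post[:, k] = np.log(Xc.shape[0] / X_train.shape[0]) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(log_post)
    return classes, post / post.sum(axis=1, keepdims=True)

def fitness(mask, X, y, alpha=0.5):
    """Assumed form: alpha * mean true-class posterior - (1 - alpha) * error rate."""
    if not mask.any():
        return -1.0  # an empty selection scores below any non-empty one
    classes, post = gnb_posteriors(X[:, mask], y, X[:, mask])
    true_idx = np.searchsorted(classes, y)
    mean_true_post = post[np.arange(y.size), true_idx].mean()
    # Empirical error rate, used here as a stand-in for the Bayes error rate.
    error_rate = (classes[post.argmax(axis=1)] != y).mean()
    return alpha * mean_true_post - (1 - alpha) * error_rate

def select_wavenumbers(X, y, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Search boolean masks over the columns of X with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.2  # sparse random initial masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        elite = pop[scores.argmax()].copy()
        # Binary tournament selection.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[a] >= scores[b])[:, None], pop[a], pop[b])
        # Uniform crossover with a shuffled partner, then bit-flip mutation.
        partners = parents[rng.permutation(pop_size)]
        children = np.where(rng.random((pop_size, n)) < 0.5, parents, partners)
        children ^= rng.random((pop_size, n)) < p_mut
        children[0] = elite  # elitism: the best mask always survives
        pop = children
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()], scores.max()

# Synthetic stand-in for spectra: 60 samples, 20 "wavenumbers", two classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 20))
X[y == 1, 3] += 4.0  # only columns 3 and 7 carry class information
X[y == 1, 7] += 4.0
mask, score = select_wavenumbers(X, y)
print("selected columns:", np.flatnonzero(mask), "fitness:", round(float(score), 3))
```

With `alpha = 0.5` the score of a non-empty mask lies between -0.5 and 0.5. The first term rewards confident, compact class assignments and the second penalises overlap between classes, which mirrors the compactness and separation goals stated in the abstract. A faithful reproduction would need the paper's exact fitness definition and a proper validation scheme.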
Pages: 153-161
Page count: 9