Cyanotoxin level prediction in a reservoir using gradient boosted regression trees: a case study

Cited by: 12
Authors
Garcia Nieto, Paulino Jose [1]
Garcia-Gonzalo, Esperanza [1]
Sanchez Lasheras, Fernando [2]
Alonso Fernandez, Jose Ramon [3]
Diaz Muniz, Cristina [3]
de Cos Juez, Francisco Javier [4]
Affiliations
[1] Univ Oviedo, Fac Sci, Dept Math, Oviedo 33007, Spain
[2] Univ Oviedo, Dept Construct & Mfg Engn, Gijon 33204, Spain
[3] Spanish Minist Agr Food & Environm, Cantabrian Basin Author, Oviedo 33071, Spain
[4] Univ Oviedo, Exploitat & Prospecting Dept, Oviedo 33004, Spain
Keywords
Statistical machine learning techniques; Regression trees; Gradient boosting; Cyanotoxins; Cyanobacteria; Harmful algal blooms (HABs); BALTIC SEA; PH; BLOOMS; CONTAMINATION; MODEL
DOI
10.1007/s11356-018-2219-4
Chinese Library Classification
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Cyanotoxins are poisonous compounds produced by cyanobacteria, and they pose a health threat in waters that may be used for drinking or recreation. It is therefore necessary to predict their presence to avoid risks. This paper presents a nonparametric machine learning approach that uses a gradient boosted regression tree (GBRT) model to predict cyanotoxin content from cyanobacterial concentrations determined experimentally in a reservoir in the north of Spain. GBRT models can obtain good predictions in highly nonlinear problems such as the one treated here, where the studied variable combines low cyanotoxin concentrations with high concentration peaks. Two types of results have been obtained. Firstly, the model allows the input variables to be ranked according to their importance in the model. Secondly, the high performance and simplicity of the model make the gradient boosted tree method attractive compared with conventional forecasting techniques.
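The idea summarized in the abstract (boosting small regression trees on residuals, then ranking the inputs by importance) can be illustrated with a minimal sketch. This is not the authors' model or data: the depth-1 trees, the squared-error loss, the gain-based importance measure, the synthetic data set, and all names (`fit_stump`, `fit_gbrt`, `predict`, `names`) are assumptions made only for illustration.

```python
import random


def fit_stump(X, r):
    """Return the best single split (gain, feature, threshold, left_mean, right_mean)
    for the residuals r, where gain is the reduction in squared error."""
    n, d = len(X), len(X[0])
    total = sum(r)
    best = None
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]          # left side holds k samples
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:                         # cannot split between equal values
                continue
            right_sum = total - left_sum
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit_gbrt(X, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each stump fits the current residuals."""
    f0 = sum(y) / len(y)                         # initial constant prediction
    pred = [f0] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        importance[j] += gain                    # accumulate split gain per feature
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for x, p in zip(X, pred)]
    norm = sum(importance) or 1.0
    return {"f0": f0, "stumps": stumps, "learning_rate": learning_rate,
            "importance": [g / norm for g in importance]}


def predict(model, x):
    out = model["f0"]
    for j, thr, left, right in model["stumps"]:
        out += model["learning_rate"] * (left if x[j] <= thr else right)
    return out


# Synthetic data (an assumption, not the reservoir measurements): the target is
# mostly low with a sharp peak driven by feature 0, a weak effect from feature 1,
# and no effect from feature 2.
random.seed(0)
names = ["driver", "weak_effect", "unrelated"]   # hypothetical variable names
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [(10.0 if x[0] > 0.8 else 1.0) + 0.5 * x[1] + random.gauss(0, 0.1) for x in X]

model = fit_gbrt(X, y)
ranking = sorted(range(3), key=lambda j: -model["importance"][j])
mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse = sum((yi - predict(model, x)) ** 2 for x, yi in zip(X, y)) / len(y)
print([names[j] for j in ranking], mse, baseline_mse)
```

A production model would normally use deeper trees, a held-out validation set, and a library implementation rather than this hand-written loop.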
Pages: 22658-22671
Number of pages: 14