Predicting the future is hard and other lessons from a population time series data science competition

Cited by: 12
Authors
Humphries, G. R. W. [1]
Che-Castaldo, C. [2]
Bull, P. J. [3]
Lipstein, G. [3]
Ravia, A. [4, 5]
Carrion, B. [6]
Bolton, T. [7]
Ganguly, A. [8]
Lynch, H. J. [2, 9]
Affiliations
[1] Black Bawks Data Sci Ltd, 24 Abertarff Pl, Ft Augustus PH32 4DR, Scotland
[2] SUNY Stony Brook, Dept Ecol & Evolut, Stony Brook, NY 11794 USA
[3] DrivenData Inc, Denver, CO USA
[4] Weizmann Inst Sci, Dept Neurobiol, Rehovot, Israel
[5] Weizmann Inst Sci, Dept Comp Sci & Appl Math, Rehovot, Israel
[6] PRDW Consulting Port & Coastal Engn, Santiago, Chile
[7] Univ Oxford, Dept Atmospher Ocean & Planetary Phys, Oxford, England
[8] Sect 3, Kolkata, India
[9] SUNY Stony Brook, Inst Adv Computat Sci, Stony Brook, NY 11794 USA
Funding
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF)
Keywords
Penguins; Antarctica; MAPPPD; Forecasting; Uncertainty; ADELIE PENGUIN POPULATION; SEA-ICE EXTENT; ENVIRONMENTAL VARIABILITY; RANDOM FORESTS; CLASSIFICATION; SURVIVAL; REVEALS; WEATHER; MODELS; KRILL;
DOI
10.1016/j.ecoinf.2018.07.004
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification code
071012; 0713
Abstract
Population forecasting, in which past dynamics are used to make predictions of future state, has many real-world applications. While time series of animal abundance are often modeled in ways that aim to capture the underlying biological processes involved, doing so is neither necessary nor sufficient for making good predictions. Here we report on a data science competition focused on modelling time series of Antarctic penguin abundance. We describe the best performing submitted models and compare them to a Bayesian model previously developed by domain experts and build an ensemble model that outperforms the individual component models in prediction accuracy. The top performing models varied tremendously in model complexity, ranging from very simple forward extrapolations of average growth rate to ensembles of models integrating recently developed machine learning techniques. Despite the short time frame for the competition, four of the submitted models outperformed the model previously created by the team of domain experts. We discuss the structure of the best performing models and components therein that might be useful for other ecological applications, the benefit of creating ensembles of models for ecological prediction, and the costs and benefits of including detailed domain expertise in ecological modelling. Additionally, we discuss the benefits of data science competitions, among which are increased visibility for challenging science questions, the generation of new techniques not yet adopted within the ecological community, and the ability to generate ensemble model forecasts that directly address model uncertainty.
Pages: 1-11
Number of pages: 11
Related papers
64 in total
  • [1] Ainley D.G., 2002, ADELIE PENGUIN BELLW, DOI 10.7312/ainl12306
  • [2] Antarctic penguin response to habitat change as Earth's troposphere reaches 2°C above preindustrial levels
    Ainley, David
    Russell, Joellen
    Jenouvrier, Stephanie
    Woehler, Eric
    Lyver, Philip O'B
    Fraser, William R.
    Kooyman, Gerald L.
    [J]. ECOLOGICAL MONOGRAPHS, 2010, 80 (01) : 49 - 66
  • [3] FITTING AUTOREGRESSIVE MODELS FOR PREDICTION
    AKAIKE, H
    [J]. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 1969, 21 (02) : 243 - &
  • [4] [Anonymous], MATH APPL
  • [5] Nonlinear effects of winter sea ice on the survival probabilities of Adelie penguins
    Ballerini, Tosca
    Tavecchia, Giacomo
    Olmastroni, Silvia
    Pezzo, Francesco
    Focardi, Silvano
    [J]. OECOLOGIA, 2009, 161 (02) : 253 - 265
  • [6] A gradient boosting approach to the Kaggle load forecasting competition
    Ben Taieb, Souhaib
    Hyndman, Rob J.
    [J]. INTERNATIONAL JOURNAL OF FORECASTING, 2014, 30 (02) : 382 - 394
  • [7] Penguins as marine sentinels
    Boersma, P. Dee
    [J]. BIOSCIENCE, 2008, 58 (07) : 597 - 607
  • [8] Bowerman B.L., 2004, FORECASTING TIME SER
  • [9] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [10] Bull P., 2016, P ICML WORKSH DATA4G, P31