A review of supervised learning methods for classifying animal behavioural states from environmental features

Cited by: 9
Authors
Bergen, Silas [1]
Huso, Manuela M. [2, 3]
Duerr, Adam E. [4, 5, 6]
Braham, Melissa A. [6]
Schmuecker, Sara [7]
Miller, Tricia A. [5, 6]
Katzner, Todd E. [8]
Affiliations
[1] Winona State Univ, Dept Math & Stat, Winona, MN 55987 USA
[2] US Geol Survey, Forest & Rangeland Ecosyst Sci Ctr, Corvallis, OR USA
[3] Oregon State Univ, Stat Dept, Corvallis, OR 97331 USA
[4] Bloom Res Inc, Los Angeles, CA USA
[5] West Virginia Univ, Morgantown, WV 26506 USA
[6] Conservat Sci Global Inc, West Cape May, NJ USA
[7] US Fish & Wildlife Serv, Illinois Iowa Field Off, Moline, IL USA
[8] US Geol Survey, Forest & Rangeland Ecosyst Sci Ctr, Boise, ID USA
Source
METHODS IN ECOLOGY AND EVOLUTION | 2023 / Vol. 14 / Issue 01
Keywords
behavioural classification; boosted classification tree; neural networks; random forest; supervised learning; weighted k-nearest neighbour; XGBoost;
DOI
10.1111/2041-210X.14019
Chinese Library Classification number
Q14 [Ecology (Bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
Accurately predicting behavioural modes of animals in response to environmental features is important for ecology and conservation. Supervised learning (SL) methods are increasingly common in animal movement ecology for classifying behavioural modes. However, few examples exist of applying SL to classify polytomous animal behaviour from environmental features, especially in the context of millions of animal observations. We review SL methods (weighted k-nearest neighbours; neural nets; random forests; and boosted classification trees with XGBoost) for classifying polytomous animal behaviour from environmental predictors. We also describe tuning parameter selection and assessment strategies, approaches for visualizing relationships between predictors and class outputs, and computational considerations. We demonstrate these methods by predicting three categories of risk to bald eagles from colliding with wind turbines using, as predictors, 12 environmental state features associated with 1.7 million GPS telemetry data points from 57 eagles. Of the SL methods we considered, XGBoost yielded the most accurate model with 86.2% classification accuracy and pairwise-averaged area under the ROC curve of 90.6. Computational time of XGBoost scaled better to large data than any other SL method. We also show how SHAP values integrated in the R package (xgboost) facilitate investigation of variable relationships and importance. For big data applications, XGBoost appears to provide superior classification accuracy and computational efficiency. Our results suggest XGBoost should be considered as an early modelling option in situations where the intent is to classify millions of animal behaviour observations from environmental predictors and to understand relationships between those predictors and movement behaviours. We also offer a tutorial to assist researchers in implementing this method.
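The "pairwise-averaged area under the ROC curve" in the abstract appears to correspond to the multiclass generalisation of AUC described by Hand and Till (2001). The following is a minimal Python sketch of that measure, not the authors' implementation (the abstract mentions the R package xgboost). The class labels, the toy probabilities, and the function names `auc_one_vs_other` and `hand_till_auc` are hypothetical and chosen only for illustration. The result is on the 0 to 1 scale.

```python
from itertools import combinations


def auc_one_vs_other(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive
    observation receives the higher score; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hand_till_auc(y_true, proba, classes):
    """Pairwise-averaged multiclass AUC.

    proba[k][c] is the predicted probability that observation k
    belongs to classes[c]. For each pair of classes (i, j), only the
    observations truly in class i or j are used.
    """
    pair_aucs = []
    for i, j in combinations(range(len(classes)), 2):
        rows_i = [p for y, p in zip(y_true, proba) if y == classes[i]]
        rows_j = [p for y, p in zip(y_true, proba) if y == classes[j]]
        # A(i|j): separate class i from class j using P(class i)
        a_ij = auc_one_vs_other([p[i] for p in rows_i], [p[i] for p in rows_j])
        # A(j|i): separate class j from class i using P(class j)
        a_ji = auc_one_vs_other([p[j] for p in rows_j], [p[j] for p in rows_i])
        pair_aucs.append((a_ij + a_ji) / 2)
    return sum(pair_aucs) / len(pair_aucs)


# Hypothetical three-level risk labels and predicted probabilities.
classes = ["low", "moderate", "high"]
y_true = ["low", "low", "moderate", "moderate", "high", "high"]
proba = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
]

m = hand_till_auc(y_true, proba, classes)
print(round(m, 4))
```

The double loop in `auc_one_vs_other` is quadratic in the number of observations, so it only suits small examples; for data on the scale described in the abstract, a rank-based computation or an existing library routine would be the practical choice.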
Pages: 189-202
Number of pages: 14