Action Detection for Wildlife Monitoring with Camera Traps Based on Segmentation with Filtering of Tracklets (SWIFT) and Mask-Guided Action Recognition (MAROON)

Cited by: 6
Authors
Schindler, Frank [1]
Steinhage, Volker [1]
van Beeck Calkoen, Suzanne T. S. [2, 3, 4]
Heurich, Marco [2, 5, 6]
Affiliations
[1] Univ Bonn, Dept Comp Sci 4, Friedrich Hirzebruch Allee 8, D-53115 Bonn, Germany
[2] Bavarian Forest Natl Pk, Dept Natl Pk Monitoring & Anim Management, Freyunger Str 2, D-94481 Grafenau, Germany
[3] Tech Univ Dresden, Inst Forest Bot & Forest Zool, Forest Zool, Pienner Str 7, D-01737 Tharandt, Germany
[4] Univ Goettingen, Fac Forest Sci & Forest Ecol, Wildlife Sci, Buesgenweg 3, D-37077 Gottingen, Germany
[5] Univ Freiburg, Fac Environm & Nat Resources, Tennenbacher Str 4, D-79106 Freiburg, Germany
[6] Inland Norway Univ Appl Sci, Inst Forestry & Wildlife Management, NO-2480 Koppang, Norway
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 02
Keywords
wildlife monitoring; deep learning; video instance segmentation; mask-supported action recognition; triple-stream convolutional neural network; action detection for deer; BEHAVIOR;
DOI
10.3390/app14020514
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Behavioral analysis of animals in the wild plays an important role in ecological research and conservation and has mostly been performed manually by researchers. We introduce an action detection approach that automates this process by detecting animals and performing action recognition on the detected animals in camera trap videos. Our action detection approach is based on SWIFT (segmentation with filtering of tracklets), which we have already shown to successfully detect and track animals in wildlife videos, and MAROON (mask-guided action recognition), an action recognition network that we introduce here. The basic ideas of MAROON are the exploitation of the instance masks detected by SWIFT and a triple-stream network. The instance masks enable more accurate action recognition, especially if multiple animals appear in a video at the same time. The triple-stream approach extracts features for the motion and appearance of the animal. We evaluate the quality of our action recognition on two self-generated datasets, from an animal enclosure and from the wild. These datasets contain videos of red deer, fallow deer and roe deer, recorded both during the day and at night. MAROON improves the action recognition accuracy compared to other state-of-the-art approaches by an average of 10 percentage points on all analyzed datasets and achieves an accuracy of 69.16% on the Rolandseck Daylight dataset, in which 11 different action classes occur. Our action detection system makes it possible to drastically reduce the manual work of ecologists and at the same time gain new insights through standardized results.
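The abstract's point that instance masks help when several animals share a frame can be illustrated with a small sketch. This is not the MAROON implementation. It only shows the general idea of suppressing every pixel outside one tracked animal's mask before a clip is passed on to a recognizer. The function names `apply_instance_mask` and `mask_clip`, the grayscale nested-list frame layout, and the zero fill value are all assumptions made for this illustration.

```python
from typing import List

Frame = List[List[int]]   # grayscale frame as rows of pixel intensities (assumed layout)
Mask = List[List[bool]]   # True where a pixel belongs to the tracked animal


def apply_instance_mask(frame: Frame, mask: Mask, fill: int = 0) -> Frame:
    """Keep pixels inside the instance mask; replace all others with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


def mask_clip(clip: List[Frame], masks: List[Mask]) -> List[Frame]:
    """Apply the per-frame masks of one tracked animal to every frame of a clip."""
    if len(clip) != len(masks):
        raise ValueError("clip and masks must have the same number of frames")
    return [apply_instance_mask(f, m) for f, m in zip(clip, masks)]


# Toy clip: two 3x3 frames showing two animals (intensities 9 and 5).
clip = [
    [[9, 9, 0], [9, 9, 0], [0, 0, 5]],
    [[0, 9, 9], [0, 9, 9], [5, 0, 0]],
]
# Hypothetical masks for the first animal only, one per frame.
masks = [
    [[True, True, False], [True, True, False], [False, False, False]],
    [[False, True, True], [False, True, True], [False, False, False]],
]
masked = mask_clip(clip, masks)
```

In the toy clip the second animal (intensity 5) disappears from both frames, so a downstream classifier would see only the motion and appearance of the selected animal.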
Pages: 17