Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes

Times cited: 9
Authors
Dang, Min [1, 3]
Liu, Gang [1, 2, 3]
Li, Hao [1, 3]
Xu, Qijie [1, 3]
Wang, Xu [1, 3]
Pan, Rong [1, 3]
Affiliations
[1] Xidian Univ, Sch Life Sci & Technol, 266, Xinglong Sect, Xifeng Rd, Xian 710126, Shaanxi, Peoples R China
[2] Xidian Univ, Guangzhou Inst Technol, 83 Zhiming, Guangzhou 510555, Guangdong, Peoples R China
[3] 266, Xinglong Sect, Xifeng Rd, Xian 710126, Shaanxi, Peoples R China
Keywords
Classroom scene; Behaviour recognition; Object detection; Image classification; Vision transformer
DOI
10.1007/s10489-024-05409-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For multi-object behaviour recognition in classroom scenes, crowded student objects suffer from heavy occlusion, invisible keypoints, and scale variation, all of which directly degrade recognition performance. Because students are densely packed and their behaviours look similar, multi-object behaviour recognition is highly challenging. We therefore propose multi-object behaviour recognition based on object detection cascaded with image classification. Specifically, object detection first extracts the student objects, and a Vision Transformer (ViT) then classifies each student's behaviour. Since accurate behaviour recognition depends on accurate detection, we first improve the object detector. We propose a Shallow Auxiliary Module that assists the backbone network in extracting hybrid multi-scale feature information; fusing the multi-scale and multi-channel features alleviates object overlap and scale variation. We also propose a Scale Assignment Fusion Mechanism that guides each object to learn from its optimal feature layer in a non-heuristic manner. Furthermore, an Anchor-free Dynamic Label Assignment suppresses the prediction of low-quality bounding boxes, stabilising training and improving detection performance. The proposed student object detector achieves an mAP_50 of 88.03 and an AP_l of 57.64, outperforming state-of-the-art object detection methods.
Our multi-object behaviour recognition method recognises four behaviour classes and performs significantly better than the compared methods.
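The two-stage cascade described in the abstract (detect each student, then classify the behaviour inside each detected region) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `detect` and `classify` are hypothetical callables standing in for the paper's student detector and ViT classifier, images are plain nested lists of pixel values, and the 0.5 score threshold and the placeholder labels `behaviour_A`/`behaviour_B` are illustrative only (the abstract does not name the four classes).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: an image is a row-major grid of pixel values (stand-in for a real tensor).
Image = List[List[int]]
# Assumption: boxes are (x1, y1, x2, y2) pixel coordinates, with x2/y2 exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class StudentBehaviour:
    box: Box      # detected student region
    score: float  # detector confidence
    label: str    # behaviour class predicted for the cropped region


def crop(image: Image, box: Box) -> Image:
    """Return the region inside `box`, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    height = len(image)
    width = len(image[0]) if height else 0
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def recognise_behaviours(
    image: Image,
    detect: Callable[[Image], Sequence[Tuple[Box, float]]],  # hypothetical stage 1
    classify: Callable[[Image], str],                         # hypothetical stage 2
    score_thresh: float = 0.5,                                # illustrative value
) -> List[StudentBehaviour]:
    """Stage 1 detects students; stage 2 classifies each cropped student independently."""
    results = []
    for box, score in detect(image):
        if score < score_thresh:
            continue  # drop low-confidence detections before classification
        patch = crop(image, box)
        if not patch or not patch[0]:
            continue  # skip boxes that fall entirely outside the image
        results.append(StudentBehaviour(box, score, classify(patch)))
    return results


if __name__ == "__main__":
    # Toy 4x6 "image" with pixel values 0..23.
    img = [[r * 6 + c for c in range(6)] for r in range(4)]
    # Hypothetical detector output: (box, confidence) pairs.
    fake_detections = [((0, 0, 3, 2), 0.9), ((3, 0, 6, 4), 0.3), ((2, 2, 6, 4), 0.8)]

    def toy_classifier(patch: Image) -> str:
        # Placeholder rule standing in for a ViT: threshold the mean intensity.
        values = [v for row in patch for v in row]
        return "behaviour_B" if sum(values) / len(values) >= 10 else "behaviour_A"

    for det in recognise_behaviours(img, lambda _: fake_detections, toy_classifier):
        print(det.box, det.score, det.label)
```

Running the script keeps the two detections above the threshold: box (0, 0, 3, 2) is labelled `behaviour_A` and box (2, 2, 6, 4) is labelled `behaviour_B`, while the 0.3-confidence box is discarded. Because stage 2 only sees what stage 1 crops, a missed or poorly localised detection cannot be recovered by the classifier, which is consistent with the abstract's point that detection quality must be improved first.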
Pages: 4935-4951
Number of pages: 17