EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment

Cited by: 11

Authors:

Sharma, Vijeta ^{[1,2]}

Gupta, Manjari ^{[2]}

Kumar, Ajai ^{[1]}

Mishra, Deepti ^{[3]}

Affiliations:

[1] Ctr Dev Adv Comp C DAC, Pune 411008, Maharashtra, India

[2] Banaras Hindu Univ, DST Ctr Interdisciplinary Math Sci, Inst Sci, Varanasi 221005, Uttar Pradesh, India

[3] NTNU Norwegian Univ Sci & Technol, Dept Comp Sci IDI, N-2815 Gjovik, Norway

Source:

SENSORS | 2021 / Vol. 21 / Issue 17

Keywords:

artificial intelligence; classroom activity recognition; classroom monitoring; EduNet dataset; education;

DOI:

10.3390/s21175699

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Human action recognition in videos has become a popular research area in artificial intelligence (AI) technology. In the past few years, this research has accelerated in areas such as sports, daily activities, kitchen activities, etc., due to developments in the benchmarks proposed for human action recognition datasets in these areas. However, there is little research in the benchmarking datasets for human activity recognition in educational environments. Therefore, we developed a dataset of teacher and student activities to expand the research in the education domain. This paper proposes a new dataset, called EduNet, for a novel approach towards developing human action recognition datasets in classroom environments. EduNet has 20 action classes, containing around 7851 manually annotated clips extracted from YouTube videos, and recorded in an actual classroom environment. Each action category has a minimum of 200 clips, and the total duration is approximately 12 h. To the best of our knowledge, EduNet is the first dataset specially prepared for classroom monitoring for both teacher and student activities. It is also a challenging dataset of actions as it has many clips (and due to the unconstrained nature of the clips). We compared the performance of the EduNet dataset with benchmark video datasets UCF101 and HMDB51 on a standard I3D-ResNet-50 model, which resulted in 72.3% accuracy. The development of a new benchmark dataset for the education domain will benefit future research concerning classroom monitoring systems. The EduNet dataset is a collection of classroom activities from 1 to 12 standard schools.


Pages: 18
