AGENT-ENVIRONMENT NETWORK FOR TEMPORAL ACTION PROPOSAL GENERATION

被引:11
作者
Viet-Khoa Vo-Ho [1 ,2 ,3 ]
Le, Ngan [3 ]
Kamazaki, Kashu [3 ]
Sugimoto, Akihiro [4 ]
Minh-Triet Tran [1 ,2 ]
机构
[1] Univ Sci, Fac Informat Technol, VNU HCM, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
[3] Univ Arkansas, Dept Comp Sci, Fayetteville, AR 72701 USA
[4] Natl Inst Informat, Tokyo, Japan
来源
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021年
关键词
Action Proposal Generation; Contextual Agent-Environment Network;
D O I
10.1109/ICASSP39728.2021.9415101
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Temporal action proposal generation is an essential and challenging task that aims at localizing temporal intervals containing human actions in untrimmed videos. Most of existing approaches are unable to follow the human cognitive process of understanding the video context due to lack of attention mechanism to express the concept of an action or an agent who performs the action or the interaction between the agent and the environment. Based on the action definition that a human, known as an agent, interacts with the environment and performs an action that affects the environment, we propose a contextual Agent-Environment Network. Our proposed contextual AEN involves (i) agent pathway, operating at a local level to tell about which humans/agents are acting and (ii) environment pathway operating at a global level to tell about how the agents interact with the environment. Comprehensive evaluations on 20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone networks, i.e C3D and SlowFast, show that our method robustly exhibits outperformance against state-of-the-art methods regardless of the employed backbone network.
引用
收藏
页码:2160 / 2164
页数:5
相关论文
共 34 条
[1]  
[Anonymous], 2017, P IEEE C COMP VIS PA
[2]   Soft-NMS - Improving Object Detection With One Line of Code [J].
Bodla, Navaneeth ;
Singh, Bharat ;
Chellappa, Rama ;
Davis, Larry S. .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5562-5570
[3]   SST: Single-Stream Temporal Action Proposals [J].
Buch, Shyamal ;
Escorcia, Victor ;
Shen, Chuanqi ;
Ghanem, Bernard ;
Niebles, Juan Carlos .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6373-6382
[4]  
Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
[5]   Rethinking the Faster R-CNN Architecture for Temporal Action Localization [J].
Chao, Yu-Wei ;
Vijayanarasimhan, Sudheendra ;
Seybold, Bryan ;
Ross, David A. ;
Deng, Jia ;
Sukthankar, Rahul .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :1130-1139
[6]   Temporal Context Network for Activity Localization in Videos [J].
Dai, Xiyang ;
Singh, Bharat ;
Zhang, Guyue ;
Davis, Larry S. ;
Chen, Yan Qiu .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5727-5736
[7]  
Eun H., 2019, IEEE T CIRC SYST VID, P1
[8]   SlowFast Networks for Video Recognition [J].
Feichtenhofer, Christoph ;
Fan, Haoqi ;
Malik, Jitendra ;
He, Kaiming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6201-6210
[9]   Convolutional Two-Stream Network Fusion for Video Action Recognition [J].
Feichtenhofer, Christoph ;
Pinz, Axel ;
Zisserman, Andrew .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1933-1941
[10]   CTAP: Complementary Temporal Action Proposal Generation [J].
Gao, Jiyang ;
Chen, Kan ;
Nevatia, Ram .
COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 :70-85