A Two-stream Network with Spatial Long-range Modeling for Weakly-supervised Temporal Action Localization

Cited by: 0
Authors
Bu, Aojie [1 ]
Zhang, Han [1 ]
Li, Jun [1 ]
Shi, Zhiping [1 ]
Affiliations
[1] Information Engineering College, Capital Normal University, Beijing, China
Source
2023 IEEE International Conferences on Internet of Things (iThings), IEEE Green Computing and Communications (GreenCom), IEEE Cyber, Physical and Social Computing (CPSCom), IEEE Smart Data (SmartData), and IEEE Congress on Cybermatics | 2024
Keywords
Weak supervision; Temporal action localization; Spatial attention
DOI
10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics60724.2023.00043
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on weakly supervised temporal action localization, a critical problem for Artificial Intelligence (AI) in the context of the Internet of Things (IoT). The goal is to identify and localize action segments in untrimmed videos using only video-level action labels for training. Current visual perception and analysis techniques in the IoT face substantial challenges, including heavy noise in videos, complex backgrounds, and the absence of clear motion boundaries in long video sequences. Recognizing that the spatial content of each frame carries rich information for both classification and localization, we propose a method that models features using the spatial information within the frames of long videos, aiming to locate action boundaries more accurately and to enable perception of complex data in IoT environments. In addition, we incorporate traditional features into the model to alleviate the noise problems typical of long videos. Traditional and deep features are processed separately, forming a two-stream network. Comprehensive experiments on the widely used THUMOS14 dataset demonstrate significant improvements over state-of-the-art methods. We also evaluated on the ActivityNet v1.2 dataset with the traditional features omitted; even without them, our module and the two-stream strategy remain effective in IoT environments.
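The abstract describes the architecture only at a high level. As a rough illustration, here is a minimal PyTorch sketch of a two-stream design with spatial long-range modeling under weak supervision. All names, shapes, and design choices below (SpatialAttention, TwoStreamWTAL, the additive fusion, the top-k pooling) are assumptions for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Self-attention across the spatial positions of a frame feature map,
    giving each position a long-range view of the whole frame (assumed design)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, HW, dim) -- flattened spatial grid of N frames
        out, _ = self.attn(x, x, x)
        return (out + x).mean(dim=1)   # residual, then pool spatially -> (N, dim)

class TwoStreamWTAL(nn.Module):
    """Illustrative two-stream head: deep features pass through spatial
    attention, traditional (hand-crafted) features take a lighter path, and
    the fused snippet features feed a class-activation-sequence classifier."""
    def __init__(self, deep_dim=1024, trad_dim=512, embed_dim=512, num_classes=20):
        super().__init__()
        self.spatial = SpatialAttention(deep_dim)
        self.deep_proj = nn.Linear(deep_dim, embed_dim)
        self.trad_proj = nn.Linear(trad_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes + 1)  # +1 for background

    def forward(self, deep_feat, trad_feat):
        # deep_feat: (B, T, HW, deep_dim); trad_feat: (B, T, trad_dim)
        B, T, HW, D = deep_feat.shape
        pooled = self.spatial(deep_feat.reshape(B * T, HW, D)).reshape(B, T, D)
        fused = torch.relu(self.deep_proj(pooled)) + torch.relu(self.trad_proj(trad_feat))
        cas = self.classifier(fused)                       # (B, T, C+1) activation sequence
        k = max(1, T // 8)                                 # top-k temporal pooling
        video_logits = cas.topk(k, dim=1).values.mean(1)   # video-level class scores
        return cas, video_logits

# Usage sketch: only video-level labels supervise training (weak supervision)
model = TwoStreamWTAL()
deep = torch.randn(2, 100, 49, 1024)   # e.g., a 7x7 spatial grid per snippet
trad = torch.randn(2, 100, 512)
cas, logits = model(deep, trad)
```

Top-k pooling over the class activation sequence is a common way to turn per-snippet scores into a video-level prediction trainable from video-level labels alone, which matches the weak-supervision setting the abstract describes.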
Pages: 115-120
Page count: 6