Spatio-Temporal Attention Networks for Action Recognition and Detection
Cited by: 117
Authors:
Li, Jun [1]
Liu, Xianglong [1,2]
Zhang, Wenxuan [1]
Zhang, Mingyuan [1]
Song, Jingkuan [3]
Sebe, Nicu [4]
Affiliations:
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 10000, Peoples R China
[2] Beihang Univ, Beijing Adv Innovat Ctr Big Data Based Precis Med, Beijing 10000, Peoples R China
[3] Univ Elect Sci & Technol China, Innovat Ctr, Chengdu 610051, Peoples R China
[4] Univ Trento, Dept Informat Engn & Comp Sci, I-38122 Trento, Italy
Recently, 3D Convolutional Neural Network (3D CNN) models have been widely studied for video sequences and have achieved satisfying performance in action recognition and detection tasks. However, most existing 3D CNNs treat all input video frames equally, ignoring the spatial and temporal differences across frames. To address this problem, we propose a spatio-temporal attention (STA) network that learns discriminative feature representations for actions by characterizing the beneficial information at both the frame level and the channel level. By simultaneously exploiting the differences in the spatial and temporal dimensions, our STA module enhances the learning capability of 3D convolutions when handling complex videos. The proposed STA method can be wrapped as a generic module that is easily plugged into state-of-the-art 3D CNN architectures for video action detection and recognition. We extensively evaluate our method on action recognition and detection tasks over three popular datasets (UCF-101, HMDB-51 and THUMOS 2014). The experimental results demonstrate that adding our STA module achieves state-of-the-art performance on UCF-101 and HMDB-51, with top-1 accuracies of 98.4% and 81.4% respectively, and yields a significant improvement on the THUMOS 2014 dataset over the original models.
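The abstract gives no architectural details, so the following is only a minimal sketch of the general idea of reweighting features at the frame level and the channel level. It is not the authors' STA module. The function name `sta_reweight`, the use of mean pooling to score frames and channels, and the softmax and sigmoid gates are all assumptions made for illustration; a real module would presumably compute these weights with learned layers inside a 3D CNN.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sta_reweight(features):
    """Reweight a clip's features by frame-level and channel-level weights.

    features: list of T frames, each a list of C channel activations
    (assumed to be spatially pooled already). Hypothetical helper, not
    taken from the paper.
    """
    num_frames = len(features)
    num_channels = len(features[0])

    # Frame-level weights: score each frame by its mean activation,
    # then normalize across frames (assumption: softmax over time).
    frame_scores = [sum(frame) / num_channels for frame in features]
    frame_w = softmax(frame_scores)

    # Channel-level weights: score each channel by its mean over time,
    # then squash to (0, 1) (assumption: independent sigmoid gates).
    channel_means = [
        sum(features[t][c] for t in range(num_frames)) / num_frames
        for c in range(num_channels)
    ]
    channel_w = [sigmoid(m) for m in channel_means]

    # Apply both weights to every activation.
    out = [
        [features[t][c] * frame_w[t] * channel_w[c] for c in range(num_channels)]
        for t in range(num_frames)
    ]
    return out, frame_w, channel_w


if __name__ == "__main__":
    clip = [[1.0, 0.0], [3.0, 2.0]]  # 2 frames, 2 channels (toy values)
    reweighted, frame_w, channel_w = sta_reweight(clip)
    print(frame_w, channel_w)
```

In this toy setting the frame weights sum to one, so frames with stronger activations contribute more, while each channel is gated independently. This mirrors the "frame level and channel level" wording of the abstract without claiming to reproduce the published design.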
Authors and affiliations:
Li, Dong: Univ Sci & Technol China, Hefei 230000, Anhui, Peoples R China; Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230000, Anhui, Peoples R China
Yao, Ting: Microsoft Res, Multimedia Search & Mining Grp, Beijing 100080, Peoples R China
Duan, Ling-Yu: Peking Univ, Natl Engn Lab Video Technol, Sch Elect Engn & Comp Sci, Beijing 100080, Peoples R China
Mei, Tao: JD AI Res, Beijing 100101, Peoples R China; JD AI Res, Comp Vis & Multimedia Lab, Beijing 100101, Peoples R China
Rui, Yong: Lenovo, Beijing 100085, Peoples R China
Authors and affiliations:
Song, Sijie: Peking Univ, Inst Comp Sci & Technol, Beijing 100080, Peoples R China
Lan, Cuiling: Microsoft Res Asia, Beijing 100080, Peoples R China
Xing, Junliang: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100080, Peoples R China
Zeng, Wenjun: Microsoft Res Asia, Beijing 100080, Peoples R China; Microsoft Res Asia, Senior Leadership Team, Beijing 100080, Peoples R China
Liu, Jiaying: Peking Univ, Inst Comp Sci & Technol, Beijing 100080, Peoples R China
Authors and affiliations:
Keisham, Kanchan: Kyungpook Natl Univ, Grad Sch Artificial Intelligence, Daegu 41566, South Korea
Jalali, Amin: Kyungpook Natl Univ, AI Inst Technol, KNU LG Elect Convergence Res Ctr, Daegu 41566, South Korea