Domain learning joint with semantic adaptation for human action recognition

Cited by: 15
Authors
Zhang, Junxuan [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge adaptation; Two-stream network; Video representation; Action recognition; Cascaded convolution fusion strategy; Representation; Features;
DOI
10.1016/j.patcog.2019.01.027
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition is a challenging task in computer vision, and the shortage of training samples is a bottleneck in current action recognition research. With the explosive growth of Internet data, some researchers use prior knowledge learned from diverse video sources to assist in recognizing action videos of a target domain, an approach called knowledge adaptation. Based on this idea, we propose a novel framework for action recognition, called Semantic Adaptation based on the Vector of Locally Max-Pooled deep learned Features (SA-VLMPF). The proposed framework consists of three parts: a Two-Stream Fusion Network (TSFN), the Vector of Locally Max-Pooled deep learned Features (VLMPF), and a Semantic Adaptation Model (SAM). TSFN adopts a cascaded convolution fusion strategy to combine the convolutional features extracted from the two-stream network. VLMPF retains long-term information in videos and discards irrelevant information by capturing multiple local features and keeping those with the highest response to the action category. SAM first maps the data of the auxiliary domain and the target domain into high-level semantic representations through a deep network; the representations obtained from the auxiliary domain are then adapted to the target domain to optimize the target classifier. Compared with existing methods, the proposed framework exploits the strength of deep learning in obtaining high-level semantic information to improve knowledge adaptation, and SA-VLMPF makes full use of auxiliary data to compensate for the insufficiency of training samples. Experiments conducted on several pairs of datasets validate the effectiveness of the framework: SA-VLMPF outperforms state-of-the-art knowledge adaptation methods. (C) 2019 Elsevier Ltd. All rights reserved.
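As a rough illustration of the abstract, the sketch below approximates two of the described ideas in PyTorch: cascaded convolutional fusion of two-stream feature maps (TSFN) and local max pooling of clip-level features (VLMPF). It is a minimal sketch based only on the abstract, not the authors' implementation; the module names, layer choices, tensor shapes, and segment count are all assumptions.

```python
# Minimal sketch (not the authors' code) of the two fusion ideas described
# in the abstract: cascaded convolutional fusion of two-stream features and
# local max pooling over clip-level features. Shapes and layer choices are
# assumptions made for illustration.
import torch
import torch.nn as nn

class CascadedConvFusion(nn.Module):
    """One plausible reading of the 'cascaded convolution fusion strategy':
    concatenate the spatial- and temporal-stream feature maps, then fuse
    them through a cascade of 1x1 and 3x3 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        # spatial, temporal: (B, C, H, W) feature maps from the two streams
        return self.fuse(torch.cat([spatial, temporal], dim=1))

def vlmpf(clip_features: torch.Tensor, num_segments: int = 8) -> torch.Tensor:
    """Vector of Locally Max-Pooled Features: split the T clip-level feature
    vectors of one video into temporal segments, max-pool within each segment
    (keeping the highest-response activations), and concatenate the results.

    clip_features: (T, D) deep features for T sampled clips of one video.
    Returns a (num_segments * D,) video-level descriptor."""
    segments = torch.chunk(clip_features, num_segments, dim=0)
    pooled = [seg.max(dim=0).values for seg in segments]
    return torch.cat(pooled)

if __name__ == "__main__":
    fusion = CascadedConvFusion(channels=512)
    fused = fusion(torch.randn(2, 512, 7, 7), torch.randn(2, 512, 7, 7))
    video_vec = vlmpf(torch.randn(32, 512), num_segments=8)
    print(fused.shape, video_vec.shape)  # (2, 512, 7, 7) and (4096,)
```

Max pooling within each temporal segment keeps only the strongest activation per feature dimension, which matches the abstract's claim of "extracting the features with the highest response to the action category" while suppressing low-response, irrelevant clips.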
Pages: 196-209
Number of pages: 14