Domain learning joint with semantic adaptation for human action recognition

Cited by: 15
Authors
Zhang, Junxuan [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge adaptation; Two-stream network; Video representation; Action recognition; Cascaded convolution fusion strategy; REPRESENTATION; FEATURES;
DOI
10.1016/j.patcog.2019.01.027
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition is a challenging task in computer vision, and the shortage of training samples is a bottleneck in current action recognition research. With the explosive growth of Internet data, some researchers have tried to use prior knowledge learned from various video sources to assist in recognizing action videos in the target domain, an approach known as knowledge adaptation. Based on this idea, we propose a novel framework for action recognition, called Semantic Adaptation based on the Vector of Locally Max-Pooled deep learned Features (SA-VLMPF). The proposed framework consists of three parts: a Two-Stream Fusion Network (TSFN), the Vector of Locally Max-Pooled deep learned Features (VLMPF), and a Semantic Adaptation Model (SAM). TSFN adopts a cascaded convolution fusion strategy to combine the convolutional features extracted from the two-stream network. VLMPF retains long-term information in videos and removes irrelevant information by capturing multiple local features and keeping those with the highest response to the action category. SAM first maps the data of the auxiliary domain and the target domain into high-level semantic representations through a deep network; the high-level semantic representations obtained from the auxiliary domain are then adapted to the target domain in order to optimize the target classifier. Compared with existing methods, the proposed approach exploits the strength of deep learning in obtaining high-level semantic information to improve the performance of knowledge adaptation. At the same time, SA-VLMPF makes full use of the auxiliary data to compensate for the insufficiency of training samples. Experiments conducted on several pairs of datasets validate the effectiveness of the proposed framework and show that SA-VLMPF outperforms state-of-the-art knowledge adaptation methods. (C) 2019 Elsevier Ltd. All rights reserved.
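The VLMPF aggregation described in the abstract (capturing multiple local features and keeping those with the highest response) can be sketched as element-wise max pooling over per-segment feature vectors. This is a minimal illustration, not the authors' implementation: the function name `vlmpf` and the assumption that each video segment yields a fixed-length deep feature vector are ours.

```python
import numpy as np

def vlmpf(segment_features):
    """Aggregate per-segment deep features into one video-level
    descriptor by element-wise max pooling: for each feature
    dimension, keep the strongest local response across segments."""
    feats = np.asarray(segment_features, dtype=float)  # (num_segments, feat_dim)
    return feats.max(axis=0)

# Example: 4 local segments, each with a 5-dimensional deep feature
rng = np.random.default_rng(0)
segments = rng.random((4, 5))
video_descriptor = vlmpf(segments)
print(video_descriptor.shape)  # (5,)
```

Max pooling (rather than averaging) discards weak, likely irrelevant responses while preserving the per-dimension peaks, which matches the stated goal of removing irrelevant information from long videos.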
Pages: 196-209
Number of pages: 14
Related Papers
50 records in total
[21] Imtiaz, Hafiz; Mahbub, Upal; Schaefer, Gerald; Zhu, Shao Ying; Ahad, Md. Atiqur Rahman. Human Action Recognition based on Spectral Domain Features [J]. Knowledge-Based and Intelligent Information & Engineering Systems, 19th Annual Conference (KES-2015), 2015, 60: 430-437.
[22] Luo, Guan; Hu, Weiming. Learning Silhouette Dynamics for Human Action Recognition [J]. 2013 20th IEEE International Conference on Image Processing (ICIP 2013), 2013: 2827-2831.
[23] Rahimi, Sahere; Aghagolzadeh, Ali; Ezoji, Mehdi. Human action recognition by Grassmann manifold learning [J]. 2015 9th Iranian Conference on Machine Vision and Image Processing (MVIP), 2015: 61-64.
[24] Qin, Hao; Chen, Luyuan; Kong, Ming; Zhao, Zhuoran; Zeng, Xianzhou; Lu, Mengxu; Zhu, Qiang. Progressive semantic learning for unsupervised skeleton-based action recognition [J]. Machine Learning, 2025, 114 (03).
[25] Qin, Xiaolei; Ge, Yongxin; Zhan, Liuwei; Li, Guangrui; Huang, Sheng; Wang, Hongxing; Chen, Feiyu. Joint Deep Learning for RGB-D Action Recognition [J]. 2018 IEEE International Conference on Visual Communications and Image Processing (IEEE VCIP), 2018.
[26] Yi, Yun; Wang, Hanli; Zhang, Bowen. Learning correlations for human action recognition in videos [J]. Multimedia Tools and Applications, 2017, 76 (18): 18891-18913.
[27] Guha, Tanaya; Ward, Rabab Kreidieh. Learning Sparse Representations for Human Action Recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34 (08): 1576-1588.
[28] Chen, Peipeng; Ma, Andy J. Source-free Temporal Attentive Domain Adaptation for Video Action Recognition [J]. Proceedings of the 2022 International Conference on Multimedia Retrieval (ICMR 2022), 2022: 489-497.
[29] He, Chen; Zhang, Jing; Chen, Lin; Zhang, Hui; Zhuo, Li. Domain adaptation with optimized feature distribution for streamer action recognition in live video [J]. International Journal of Machine Learning and Cybernetics, 2025, 16 (01): 107-125.
[30] Dinh-Tan Pham; Tien-Nam Nguyen; Thi-Lan Le; Hai Vu. Analyzing Role of Joint Subset Selection in Human Action Recognition [J]. Proceedings of 2019 6th National Foundation for Science and Technology Development (NAFOSTED) Conference on Information and Computer Science (NICS), 2019: 61-66.