Adaptive Fusion and Category-Level Dictionary Learning Model for Multiview Human Action Recognition

Cited by: 153
Authors
Gao, Zan [1]
Xuan, Hai-Zhen [2, 3]
Zhang, Hua [2, 3]
Wan, Shaohua [4]
Choo, Kim-Kwang Raymond [5]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Shandong Artif Intelligence Inst, Natl Supercomp C, Jinan 250014, Shandong, Peoples R China
[2] Tianjin Univ Technol, Minist Educ, Key Lab Comp Vis & Syst, Tianjin 300384, Peoples R China
[3] Tianjin Univ Technol, Tianjin Key Lab Intelligence Comp & Novel Softwar, Tianjin 300384, Peoples R China
[4] Zhongnan Univ Econ & Law, Sch Informat & Safety Engn, Wuhan 430073, Hubei, Peoples R China
[5] Univ Texas San Antonio, Dept Informat Syst & Cyber Secur, San Antonio, TX 78249 USA
Funding
National Natural Science Foundation of China;
Keywords
Adaptive fusion; category-level dictionary learning; improved dense trajectory (iDT); induced set; multiview action recognition; ALTERNATING MINIMIZATION; FACE RECOGNITION; K-SVD; SPARSITY; DISTANCE;
DOI
10.1109/JIOT.2019.2911669
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Human actions are often captured by multiple cameras (or sensors) to overcome the significant variations in viewpoint, background clutter, object speed, and motion patterns found in video surveillance, and action recognition systems often benefit from fusing multiple types of cameras (sensors). Adaptive fusion of information from multiple domains is therefore essential for multiview human action recognition. Two widely applied fusion schemes are feature-level fusion and score-level fusion. We point out that these schemes still have limitations and leave considerable room for improvement, such as the separate computation of feature fusion and action recognition and the fixed weights used for each action and each camera. Previous fusion methods cannot overcome these limitations. In this paper, inspired by nature, we address them for multiview action recognition by developing a novel adaptive fusion and category-level dictionary learning model (AFCDL). The model jointly learns an adaptive weight for each camera and optimizes the reconstruction of samples toward the action recognition task. To induce the dictionary learning and the reconstruction of the query set (the test samples), an induced set is built for each category, and a corresponding induced regularization term is designed for the objective function. Extensive experiments on four public multiview action benchmarks show that AFCDL significantly outperforms state-of-the-art methods, with a 3% to 10% improvement in recognition accuracy.
Pages: 9280-9293
Page count: 14