Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition

Cited by: 0
Authors
Liu, Hongda [1, 2]
Wang, Yunlong [1]
Ren, Min [4]
Hu, Junxing [1, 3]
Luo, Zhengquan [1, 5]
Hou, Guangqi [1]
Sun, Zhenan [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, New Lab Pattern Recognit NLPR, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
[4] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[5] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; skeleton sequence; long-tailed visual recognition; imbalance learning;
DOI
10.1007/s11633-023-1487-8
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Skeleton-based action recognition has recently made significant progress. However, data imbalance remains a major challenge in real-world scenarios. The performance of current action recognition algorithms declines sharply when the training data suffers from heavy class imbalance. Imbalanced data degrades the representations learned by these methods and becomes the bottleneck for action recognition. Learning unbiased representations from imbalanced action data is therefore the key to long-tailed action recognition. In this paper, we propose a novel balanced representation learning method to address the long-tailed problem in action recognition. Firstly, a spatial-temporal action exploration strategy is presented to expand the sample space effectively, generating more valuable samples in a rebalanced manner. Secondly, we design a detached action-aware learning schedule to further mitigate the bias in the representation space. The schedule detaches the representation learning of tail classes from training and introduces an action-aware loss to impose more effective constraints. Additionally, a skip-type representation is proposed to provide complementary structural information. The proposed method is validated on four skeleton datasets: NTU RGB+D 60, NTU RGB+D 120, NW-UCLA and Kinetics. It not only achieves consistently large improvements over state-of-the-art (SOTA) methods, but also demonstrates superior generalization capacity in extensive experiments. Our code is available at https://github.com/firework8/BRL.
Pages: 466-483
Page count: 18