Pruning-guided feature distillation for an efficient transformer-based pose estimation model

Cited by: 1

Authors:

Kim, Dong-hwi [1]

Lee, Dong-hun [1]

Kim, Aro [1]

Jeong, Jinwoo [2]

Lee, Jong Taek [1]

Kim, Sungjei [2]

Park, Sang-hyo [1]

Affiliations:

[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea

[2] Korea Elect Technol Inst, Seongnam Si, Gyeonggi Do, South Korea

Source:

IET COMPUTER VISION | 2024 / Vol. 18 / Issue 06

Keywords:

computational complexity; computer vision; learning (artificial intelligence); pose estimation;

DOI:

10.1049/cvi2.12277

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The authors propose a compression strategy for a transformer-based 3D human pose estimation model; the transformer yields high accuracy but increases the model size. The approach uses pruning to guide the determination of the search range, so that a lightweight pose estimation model can be obtained under limited training time and the optimal model size can be identified. In addition, the authors propose a transformer-based feature distillation (TFD) method that exploits the characteristics of the transformer architecture to obtain a pose estimation model that is efficient in terms of both model size and accuracy. To the best of the authors' knowledge, pruning-guided TFD is the first approach proposed for 3D human pose estimation that employs a transformer architecture. The proposed approach was tested on various large data sets, and the results show that it can reduce the model size by 30% compared to the state-of-the-art while ensuring high accuracy.


Pages: 745-758

Page count: 14

