Spatial-Temporal Knowledge Integration: Robust Self-Supervised Facial Landmark Tracking

Cited by: 5

Authors:

Zhu, Congcong [1]

Li, Xiaoqiang [1,2]

Li, Jide [1]

Ding, Guangtai [1]

Tong, Weiqin [1,2]

Affiliations:

[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China

[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Keywords:

Face Tracking; Self-supervised Learning; Knowledge Distillation; FACE ALIGNMENT;

DOI:

10.1145/3394171.3413993

Chinese Library Classification (CLC):

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The diversity of training data significantly affects a model's tracking robustness in unconstrained environments. However, existing labeled datasets for facial landmark tracking tend to be large but not diverse, and manually annotating massive numbers of new, diverse video clips is extremely expensive. To address these problems, we propose a Spatial-Temporal Knowledge Integration (STKI) approach. Unlike most existing methods, which rely heavily on labeled data, STKI exploits supervision from unlabeled data. Specifically, STKI integrates spatial-temporal knowledge from massive unlabeled videos, which are several orders of magnitude more diverse than existing labeled video data, for robust tracking. Our framework includes a self-supervised tracker and an image-based detector for tracking initialization. To avoid distorting the facial shape, the tracker leverages adversarial learning to introduce a facial structure prior and temporal knowledge into cycle-consistency tracking. Meanwhile, we design a graph-based knowledge distillation method, which distills knowledge from the tracking and detection results, to improve the generalization of the detector. The fine-tuned detector can then provide the tracker with high-quality initialization on unconstrained videos. Extensive experimental results show that the proposed method achieves state-of-the-art performance on comprehensive evaluation datasets.
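The cycle-consistency idea mentioned in the abstract can be illustrated with a minimal sketch: track landmarks forward through a clip, then backward to the first frame, and measure how far they land from where they started. This is not the STKI implementation; the `Tracker` signature, the toy trackers, and the representation of a frame as a plain (dx, dy) offset are all assumptions made for illustration, and the adversarial shape prior described in the abstract is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = List[Point]
# Hypothetical tracker signature: (landmarks, source frame, target frame) -> landmarks
Tracker = Callable[[Shape, object, object], Shape]


def cycle_consistency_error(track: Tracker, frames: Sequence[object], start: Shape) -> float:
    """Track forward through `frames`, then backward, and return the mean
    Euclidean distance between the starting and the returned landmarks."""
    shape = start
    # Forward pass: frame 0 -> 1 -> ... -> T
    for src, dst in zip(frames[:-1], frames[1:]):
        shape = track(shape, src, dst)
    # Backward pass: frame T -> T-1 -> ... -> 0
    rev = frames[::-1]
    for src, dst in zip(rev[:-1], rev[1:]):
        shape = track(shape, src, dst)
    dists = [math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(start, shape)]
    return sum(dists) / len(dists)


# Toy stand-in (assumption): each "frame" is just the face's (dx, dy) offset.
def ideal_tracker(shape, src, dst):
    return [(x + dst[0] - src[0], y + dst[1] - src[1]) for x, y in shape]


def drifting_tracker(shape, src, dst):
    # Adds a constant 0.5 px bias in x on every step, so the error accumulates.
    return [(x + 0.5, y) for x, y in ideal_tracker(shape, src, dst)]


frames = [(0.0, 0.0), (2.0, 1.0), (5.0, 3.0)]
start = [(10.0, 10.0), (20.0, 10.0), (15.0, 18.0)]
ideal_error = cycle_consistency_error(ideal_tracker, frames, start)
drift_error = cycle_consistency_error(drifting_tracker, frames, start)
```

In this toy setting the ideal tracker returns exactly to the starting landmarks, while the drifting tracker accumulates its bias over the four steps of the cycle. A quantity of this kind needs no manual annotation, which is why it can serve as a supervision signal on unlabeled video.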


Pages: 4135-4143

Page count: 9
