Spatial-Temporal Knowledge Integration: Robust Self-Supervised Facial Landmark Tracking

Times cited: 5

Authors:

Zhu, Congcong [1]

Li, Xiaoqiang [1, 2]

Li, Jide [1]

Ding, Guangtai [1]

Tong, Weiqin [1, 2]

Affiliations:

[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China

[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Keywords:

Face Tracking; Self-supervised Learning; Knowledge Distillation; FACE ALIGNMENT

DOI:

10.1145/3394171.3413993

CLC number (Chinese Library Classification):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The diversity of training data significantly affects a model's tracking robustness in unconstrained environments. However, existing labeled datasets for facial landmark tracking tend to be large but not diverse, and manually annotating the massive number of clips in new, diverse videos is extremely expensive. To address these problems, we propose a Spatial-Temporal Knowledge Integration (STKI) approach. Unlike most existing methods, which rely heavily on labeled data, STKI exploits supervision from unlabeled data. Specifically, for robust tracking, STKI integrates spatial-temporal knowledge from massive unlabeled videos, which are several orders of magnitude more diverse than existing labeled video data. Our framework includes a self-supervised tracker and an image-based detector for tracking initialization. To avoid distortion of the facial shape, the tracker uses adversarial learning to introduce a facial structure prior and temporal knowledge into cycle-consistency tracking. Meanwhile, we design a graph-based knowledge distillation method, which distills knowledge from the tracking and detection results, to improve the generalization of the detector. The fine-tuned detector can then provide the tracker with high-quality initialization on unconstrained videos. Extensive experimental results show that the proposed method achieves state-of-the-art performance on comprehensive evaluation datasets.
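
The abstract names cycle-consistency tracking as the tracker's self-supervised signal: landmarks tracked forward through a clip and then backward should return to their starting positions, so the remaining drift can act as a loss without labels. The Python sketch below illustrates only that generic idea under stated assumptions. `track_step` and `cycle_consistency_loss` are hypothetical names, landmarks are assumed to be (K, 2) arrays of (x, y) points, and the adversarial facial-structure prior and graph-based distillation described above are omitted. It is not the authors' implementation.

```python
import numpy as np


def cycle_consistency_loss(track_step, frames, init_landmarks):
    """Mean per-landmark drift after tracking forward through `frames` and back.

    `track_step(src_frame, dst_frame, landmarks)` is a hypothetical one-step
    tracker that maps (K, 2) landmark coordinates from src_frame to dst_frame.
    """
    init = np.asarray(init_landmarks, dtype=float)
    pts = init.copy()
    # Forward pass: frame 0 -> frame T-1.
    for t in range(len(frames) - 1):
        pts = track_step(frames[t], frames[t + 1], pts)
    # Backward pass: frame T-1 -> frame 0.
    for t in range(len(frames) - 1, 0, -1):
        pts = track_step(frames[t], frames[t - 1], pts)
    # A consistent tracker returns each landmark to its start; average the
    # Euclidean distance between start and end positions as the loss.
    return float(np.mean(np.linalg.norm(pts - init, axis=1)))


# Toy example (assumption): each "frame" is just a global 2-D translation.
frames = [np.array(v, dtype=float) for v in ([0, 0], [1, 2], [3, 1], [2, 4])]
landmarks = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 12.0]])


def perfect(src, dst, pts):
    # Moves landmarks by the exact frame-to-frame translation.
    return pts + (dst - src)


def drifting(src, dst, pts):
    # Same, plus a constant 0.5 px bias in x on every step.
    return pts + (dst - src) + np.array([0.5, 0.0])


print(cycle_consistency_loss(perfect, frames, landmarks))   # 0.0
print(cycle_consistency_loss(drifting, frames, landmarks))  # 3.0
```

With the exact toy tracker the cycle closes perfectly; the biased tracker accumulates 0.5 px on each of the six steps of a four-frame forward-backward cycle, so its loss is 3 px.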

Pages: 4135-4143

Number of pages: 9