Spatial-Temporal Knowledge Integration: Robust Self-Supervised Facial Landmark Tracking

Cited by: 5
Authors
Zhu, Congcong [1]
Li, Xiaoqiang [1, 2]
Li, Jide [1]
Ding, Guangtai [1]
Tong, Weiqin [1, 2]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
Face Tracking; Self-supervised Learning; Knowledge Distillation; Face Alignment
DOI
10.1145/3394171.3413993
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The diversity of training data significantly affects a model's tracking robustness in unconstrained environments. However, existing labeled datasets for facial landmark tracking tend to be large but not diverse, and manually annotating massive numbers of new, diverse video clips is extremely expensive. To address these problems, we propose a Spatial-Temporal Knowledge Integration (STKI) approach. Unlike most existing methods, which rely heavily on labeled data, STKI exploits supervision from unlabeled data. Specifically, STKI integrates spatial-temporal knowledge from massive unlabeled videos, which are several orders of magnitude more diverse than existing labeled video data, for robust tracking. Our framework includes a self-supervised tracker and an image-based detector for tracking initialization. To avoid distortion of the facial shape, the tracker leverages adversarial learning to introduce a facial structure prior and temporal knowledge into cycle-consistency tracking. Meanwhile, we design a graph-based knowledge distillation method, which distills knowledge from tracking and detection results, to improve the generalization of the detector. The fine-tuned detector can provide the tracker with high-quality initialization on unconstrained videos. Extensive experimental results show that the proposed method achieves state-of-the-art performance on comprehensive evaluation datasets.
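The abstract names cycle-consistency tracking without spelling it out. As a rough illustration only, and not the authors' implementation, the general idea is to track landmarks forward through a clip, track them back to the first frame, and penalize the distance between the starting and the returned positions. The sketch below assumes a hypothetical `TrackStep` interface and a toy tracker with a constant per-step drift; the adversarial structure prior and the graph-based distillation described in the abstract are not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = List[Point]
# Hypothetical tracker interface (an assumption for illustration):
# (landmarks, source frame index, target frame index) -> landmarks
TrackStep = Callable[[Shape, int, int], Shape]


def cycle_consistency_error(start: Shape, frames: Sequence[int], step: TrackStep) -> float:
    """Track forward through `frames`, then backward to the first frame, and
    return the mean Euclidean distance between start and returned landmarks."""
    shape = start
    # Forward pass: frame 0 -> 1 -> ... -> n
    for src, dst in zip(frames, frames[1:]):
        shape = step(shape, src, dst)
    # Backward pass: frame n -> ... -> 1 -> 0
    rev = list(reversed(frames))
    for src, dst in zip(rev, rev[1:]):
        shape = step(shape, src, dst)
    return sum(math.dist(p, q) for p, q in zip(start, shape)) / len(start)


def make_toy_tracker(drift: float) -> TrackStep:
    """Toy stand-in for a learned tracker: the face moves +1 px in x per frame,
    and the tracker adds a constant `drift` error on every step."""
    def step(shape: Shape, src: int, dst: int) -> Shape:
        dx = (dst - src) + drift
        return [(x + dx, y) for x, y in shape]
    return step


landmarks = [(10.0, 20.0), (30.0, 20.0), (20.0, 35.0)]
frames = [0, 1, 2, 3]
perfect = cycle_consistency_error(landmarks, frames, make_toy_tracker(0.0))
drifting = cycle_consistency_error(landmarks, frames, make_toy_tracker(0.5))
print(perfect, drifting)
```

A tracker that returns exactly to its starting landmarks yields zero error, while accumulated drift shows up as a positive error that could serve as a label-free training signal.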
Pages: 4135 - 4143
Page count: 9
Related papers
50 in total
  • [1] Multi-Sourced Knowledge Integration for Robust Self-Supervised Facial Landmark Tracking
    Zhu, Congcong
    Li, Xiaoqiang
    Li, Jide
    Dai, Songmin
    Tong, Weiqin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6616 - 6628
  • [2] Exploiting Self-Supervised and Semi-Supervised Learning for Facial Landmark Tracking with Unlabeled Data
    Yin, Shi
    Wang, Shangfei
    Chen, Xiaoping
    Chen, Enhong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2991 - 2998
  • [3] Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction
    Li, Zhonghang
    Huang, Chao
    Xia, Lianghao
    Xu, Yong
    Pei, Jian
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2984 - 2996
  • [4] Attentive spatial-temporal contrastive learning for self-supervised video representation
    Yang, Xingming
    Xiong, Sixuan
    Wu, Kewei
    Shan, Dongfeng
    Xie, Zhao
    IMAGE AND VISION COMPUTING, 2023, 137
  • [5] Self-supervised Pre-training for Robust and Generic Spatial-Temporal Representations
    Hu, Mingzhi
    Zhong, Zhuoyun
    Zhang, Xin
    Li, Yanhua
    Xie, Yiqun
    Jia, Xiaowei
    Zhou, Xun
    Luo, Jun
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 150 - 159
  • [6] Self-supervised Spatial-Temporal Normality Learning for Time Series Anomaly Detection
    Chen, Yutong
    Xu, Hongzuo
    Pang, Guansong
    Qiao, Hezhe
    Zhou, Yuan
    Shang, Mingsheng
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-RESEARCH TRACK, PT VI, ECML PKDD 2024, 2024, 14946 : 145 - 162
  • [7] Self-Supervised Representation Learning With Spatial-Temporal Consistency for Sign Language Recognition
    Zhao, Weichao
    Zhou, Wengang
    Hu, Hezhen
    Wang, Min
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 4188 - 4201
  • [8] SSRL: Self-Supervised Spatial-Temporal Representation Learning for 3D Action Recognition
    Jin, Zhihao
    Wang, Yifan
    Wang, Qicong
    Shen, Yehu
    Meng, Hongying
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 274 - 285
  • [9] Self-supervised spatial-temporal transformer fusion based federated framework for 4D cardiovascular
    Mazher, Moona
    Razzak, Imran
    Qayyum, Abdul
    Tanveer, M.
    Beier, Susann
    Khan, Tariq
    Niederer, Steven A.
    INFORMATION FUSION, 2024, 106
  • [10] Robust facial landmark tracking via cascade regression
    Liu, Qingshan
    Yang, Jing
    Deng, Jiankang
    Zhang, Kaihua
    PATTERN RECOGNITION, 2017, 66 : 53 - 62