Multi-Sourced Knowledge Integration for Robust Self-Supervised Facial Landmark Tracking

Cited by: 2

Authors:

Zhu, Congcong ^{[1,2]}

Li, Xiaoqiang ^{[2]}

Li, Jide ^{[2]}

Dai, Songmin ^{[2]}

Tong, Weiqin ^{[2]}

Affiliations:

[1] Yangzhou Univ, Coll Informat Engn, Yangzhou 225009, Jiangsu, Peoples R China

[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Keywords:

Videos; Shape; Detectors; Faces; Robustness; Annotations; Semantics; Biostatistics; facial landmark tracking; knowledge integration; self-supervised learning; FACE ALIGNMENT; NETWORK;

DOI:

10.1109/TMM.2022.3212265

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

High annotation costs significantly hinder the development of facial landmark tracking, because dense landmarks must be labeled frame by frame. The most promising approach to address this problem is to develop a self-supervised tracker for large-scale unlabeled videos. However, existing self-supervised trackers trained using single-sourced knowledge are unstable in unconstrained environments. Herein, we propose multi-sourced knowledge integration (MSKI), a robust self-supervised tracking method. It integrates knowledge from multiple sources to provide supervisory signals, thereby improving the stability of the self-supervised tracker. Specifically, the proposed MSKI comprises two complementary modules: a temporal knowledge reasoning (TempRes) module and an interactive knowledge distillation (KnowDist) module. The TempRes module constrains the tracker to track cycle-consistently, so that the tracker learns temporal correspondence from the cycle-consistency of time. To exploit facial geometry knowledge against various occlusions, our tracker imposes a multi-level shape constraint over the structure of facial landmarks by leveraging adversarial shape learning, thereby enabling the tracking of occluded faces. Moreover, the tracker interacts with an initialization detector to further develop complementary knowledge via KnowDist. The KnowDist module distills the spatial and temporal knowledge provided by the detector and tracker to generate plausible labels automatically. Finally, these generated labels are used to fine-tune the detector, so that it provides high-quality initial landmarks for the cycle-consistent tracking of the tracker on unlabeled videos. The experimental results show that the proposed MSKI can stabilize the tracking trajectory and improve the robustness against various occlusions.
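
The cycle-consistency idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `track_step` and `cycle_consistency_error`, the use of a single rigid offset per frame, and the mean Euclidean distance as the error measure are all assumptions made here for illustration. The sketch tracks landmarks forward through a clip and then backward, and measures how far the end of the cycle lands from the starting landmarks; a self-supervised tracker could use such a discrepancy as a training signal without any manual labels.

```python
import numpy as np

def track_step(landmarks, offset):
    """Hypothetical one-step tracker: shift every landmark by one 2-D offset.
    A real tracker would predict per-landmark motion from image features."""
    return landmarks + offset

def cycle_consistency_error(start, forward_offsets, backward_offsets):
    """Track forward through the clip, then backward, and return the mean
    Euclidean distance between the starting landmarks and the cycle's end."""
    pts = start
    for offset in forward_offsets:
        pts = track_step(pts, offset)
    for offset in backward_offsets:
        pts = track_step(pts, offset)
    return float(np.mean(np.linalg.norm(pts - start, axis=1)))

# Three toy landmarks as (x, y) coordinates (made-up values).
start = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
forward = [np.array([1.0, 0.5]), np.array([2.0, -1.0])]

# A backward pass that exactly undoes the forward pass closes the cycle.
perfect_backward = [-f for f in reversed(forward)]
# A backward pass that drifts leaves a residual error.
drifting_backward = [np.array([-2.0, 1.0]), np.array([-1.0, 3.5])]

print(cycle_consistency_error(start, forward, perfect_backward))   # 0.0
print(cycle_consistency_error(start, forward, drifting_backward))  # 4.0
```

In this toy setting the error is zero only when backward tracking returns every landmark to its starting position, which is the property a cycle-consistent tracker is trained to satisfy.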


Pages: 6616-6628

Page count: 13

