Precise facial landmark detection by Dynamic Semantic Aggregation Transformer

Cited by: 1
Authors
Wan, Jun [1, 2]
Liu, He [1]
Wu, Yujia [3]
Lai, Zhihui [2]
Min, Wenwen [4]
Liu, Jun [5]
Affiliations
[1] Zhongnan Univ Econ & Law, Sch Informat Engn, Wuhan 430073, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[3] Sanda Univ, Sch Informat Sci & Technol, Shanghai 201209, Peoples R China
[4] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650091, Yunnan, Peoples R China
[5] Singapore Univ Technol & Design, Informat Syst Technol & Design Pillar, Singapore 487372, Singapore
Keywords
Facial landmark detection; Dynamic network; Multi-scale feature; Heavy occlusions; Heatmap regression; Face recognition
DOI
10.1016/j.patcog.2024.110827
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep neural network methods currently play a dominant role in the face alignment field. However, they generally use predefined network structures to predict landmarks, which tend to learn general features and lead to mediocre performance; e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address these issues, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for learning more discriminative and representative (i.e., specialized) features. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales, eliminating semantic gaps and ambiguities and enhancing the representation ability. Finally, by integrating the DSA and DSS models into the proposed DSAT in both dynamic-architecture and dynamic-parameter manners, more specialized features can be learned to achieve more precise face alignment. Interestingly, harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that the proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://github.com/GERMINO-LiuHe/DSAT.
Pages: 11
Related papers
50 in total
  • [1] Precise Facial Landmark Detection by Reference Heatmap Transformer
    Wan, Jun
    Liu, Jun
    Zhou, Jie
    Lai, Zhihui
    Shen, Linlin
    Sun, Hang
    Xiong, Ping
    Min, Wenwen
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1966 - 1977
  • [2] Robust and Precise Facial Landmark Detection by Self-Calibrated Pose Attention Network
    Wan, Jun
    Xi, Hui
    Zhou, Jie
    Lai, Zhihui
    Pedrycz, Witold
    Wang, Xu
    Sun, Hang
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (06) : 3546 - 3560
  • [3] Design of a Facial Landmark Detection System Using a Dynamic Optical Flow Approach
    Wu, Bing-Fei
    Chen, Bo-Rui
    Hsu, Chun-Fei
    IEEE ACCESS, 2021, 9 : 68737 - 68745
  • [4] Robust Facial Landmark Detection by Multiorder Multiconstraint Deep Networks
    Wan, Jun
    Lai, Zhihui
    Li, Jing
    Zhou, Jie
    Gao, Can
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (05) : 2181 - 2194
  • [5] Accurate Facial Landmark Detector via Multi-scale Transformer
    Sha, Yuyang
    Meng, Weiyu
    Zhai, Xiaobing
    Xie, Can
    Li, Kefeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429 : 278 - 290
  • [6] Summary on Facial Landmark Detection
    Wen, Jinghao
    PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON MACHINERY, ELECTRONICS AND CONTROL SIMULATION (MECS 2017), 2017, 138 : 253 - 259
  • [7] A dynamic and multiresolution model of visual attention and its application to facial landmark detection
    Takacs, B
    Wechsler, H
    COMPUTER VISION AND IMAGE UNDERSTANDING, 1998, 70 (01) : 63 - 73
  • [8] Robust facial landmark detection by cross-order cross-semantic deep network
    Wan, Jun
    Lai, Zhihui
    Shen, Linlin
    Zhou, Jie
    Gao, Can
    Xiao, Gang
    Hou, Xianxu
    NEURAL NETWORKS, 2021, 136 : 233 - 243
  • [9] Facial Landmark Detection: A Literature Survey
    Wu, Yue
    Ji, Qiang
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (02) : 115 - 142