On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection

Cited by: 0
Authors
Nadeem, Maryam [1 ]
Nandakumar, Karthik [1 ]
Affiliations
[1] MBZUAI, Abu Dhabi, United Arab Emirates
Source
2023 IEEE International Joint Conference on Biometrics (IJCB) | 2023
DOI
10.1109/IJCB57857.2023.10448619
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Presentation attacks pose a serious threat to the integrity of fingerprint-based biometric systems. Existing methods for fingerprint presentation attack detection (FpPAD) suffer from a lack of generalizability across different sensors and attack instruments, especially those not encountered during training. Recently, deep neural networks based on the Vision Transformer (ViT) architecture have demonstrated impressive generalization performance across many image recognition tasks, owing to their ability to effectively model long-range dependencies between image patches through the self-attention mechanism. While ViT models have been considered for FpPAD, many practical intricacies involved in learning generalizable ViTs for this task have not been explored in depth. These include: (i) What is the best way to pre-process a fingerprint image into the patches required by a ViT? (ii) How should the ViT backbone used for FpPAD be pre-trained? (iii) How should the pre-trained ViT backbone be fine-tuned for the FpPAD task? (iv) What is the most effective classifier design for a ViT-based FpPAD system? We undertake a thorough empirical study on two public-domain datasets (LivDet 2015 and MSU-FPAD) to answer these questions. The key findings are as follows: (i) Minutiae-aligned local patches provide better PAD performance than partitioning the image into a fixed number of non-overlapping patches. (ii) Self-supervised pre-training based on Masked Image Modeling (MIM) leads to better generalization than multimodal approaches such as image-text alignment. (iii) Visual prompt learning is a more effective way to adapt a pre-trained ViT model for FpPAD than full fine-tuning. (iv) Learning a linear classification head together with the visual prompts outperforms both linear probing and alignment with fixed text prompts. We hope these findings will be useful to the biometrics community and accelerate the deployment of ViT models in practical FpPAD systems.
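
The prompt-tuning recipe behind findings (iii) and (iv) reduces to: freeze the pre-trained ViT backbone, prepend a small set of learnable prompt tokens to the patch-token sequence, and train only those prompts together with a linear classification head. The PyTorch sketch below illustrates this under stated assumptions; it is not the authors' released code, and the class name PromptedViT, the prompt count, the mean-pooled readout, and the stand-in nn.TransformerEncoder backbone (in place of a MIM-pre-trained ViT) are all illustrative choices.

import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    """Frozen ViT-style encoder + learnable prompts + linear head (VPT-style sketch)."""
    def __init__(self, backbone: nn.Module, embed_dim: int = 768,
                 num_prompts: int = 10, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone                 # pre-trained encoder, kept frozen
        for p in self.backbone.parameters():
            p.requires_grad = False              # only prompts + head are trained
        # Learnable prompt tokens prepended to the patch-token sequence.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        # Linear classification head learned jointly with the prompts.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) embeddings of fingerprint patches,
        # e.g. local patches cropped around detected minutiae.
        b = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        x = self.backbone(x)                     # frozen self-attention layers
        return self.head(x.mean(dim=1))          # logits: bonafide vs. attack

# Illustrative usage with a stand-in encoder (ViT-Base-like dimensions):
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12)
model = PromptedViT(encoder)
tokens = torch.randn(4, 196, 768)                # 4 images, 196 patch embeddings
logits = model(tokens)                           # shape (4, 2)

Because the backbone is frozen, the trainable parameter count is just num_prompts * embed_dim plus the head, which is what makes prompt tuning far cheaper than full fine-tuning while, per the abstract, generalizing better.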
Pages: 10