On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection

Cited: 0
Authors
Nadeem, Maryam [1 ]
Nandakumar, Karthik [1 ]
Affiliations
[1] MBZUAI, Abu Dhabi, United Arab Emirates
Source
2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB) | 2023
DOI
10.1109/IJCB57857.2023.10448619
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Presentation attacks pose a serious threat to the integrity of fingerprint-based biometric systems. Existing methods for fingerprint presentation attack detection (FpPAD) suffer from a lack of generalizability across different sensors and attack instruments, especially those that are not encountered during training. Recently, deep neural networks based on the Vision Transformer (ViT) architecture have demonstrated impressive generalization performance across many image recognition tasks due to their ability to effectively model long-range dependencies between image patches through the self-attention mechanism. While ViT models have been considered for FpPAD, many practical intricacies involved in learning generalizable ViTs for the FpPAD task have not been explored in depth. These include: (i) what is the best way to pre-process a fingerprint image to generate the patches required by a ViT?, (ii) how should the ViT backbone used in FpPAD be pre-trained?, (iii) how should the pre-trained ViT backbone be fine-tuned for the FpPAD task?, and (iv) what is the most effective classifier design for a ViT-based FpPAD system? In this study, we undertake a thorough empirical investigation based on two public-domain datasets (LivDet 2015 and MSU-FPAD) in search of answers to the above questions. The key findings of this study are as follows: (i) Minutia-aligned local patches provide better PAD performance than partitioning the image into a fixed number of non-overlapping patches. (ii) Self-supervised pre-training based on Masked Image Modeling (MIM) leads to better generalization performance than multimodal approaches such as image-text alignment. (iii) Visual prompt learning is a more effective way to adapt a pre-trained ViT model for FpPAD than full fine-tuning. (iv) Learning a linear classification head jointly with the visual prompts provides superior performance compared to linear probing and alignment with fixed text prompts. We hope that these findings will be useful to the biometrics community and accelerate the deployment of ViT models in practical FpPAD systems.
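Findings (iii) and (iv) of the abstract amount to a simple adaptation recipe: freeze the pre-trained backbone, prepend a small set of learnable prompt tokens to the patch embeddings, and train only the prompts together with a linear classification head. The PyTorch sketch below illustrates this recipe under stated assumptions; the module name `PromptTunedViT`, the dimensions, the prompt count, and the prompt-pooling strategy are illustrative choices rather than the authors' implementation, and a plain `nn.TransformerEncoder` stands in for an MIM pre-trained ViT checkpoint.

```python
# Minimal sketch of visual prompt tuning with a jointly learned linear head
# on a frozen ViT-style encoder. Names and dimensions are illustrative
# assumptions, not the paper's released code.
import torch
import torch.nn as nn

class PromptTunedViT(nn.Module):
    def __init__(self, patch_dim=768, n_prompts=10, n_classes=2,
                 depth=12, n_heads=12):
        super().__init__()
        # Stand-in for a pre-trained (e.g., MIM pre-trained) ViT encoder;
        # in practice this would be loaded from a checkpoint.
        layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.encoder.parameters():
            p.requires_grad = False  # the backbone stays frozen

        # Learnable prompt tokens, prepended to the patch embeddings
        # (finding (iii): prompt tuning over full fine-tuning).
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, patch_dim) * 0.02)
        # Linear head learned jointly with the prompts (finding (iv)).
        self.head = nn.Linear(patch_dim, n_classes)

    def forward(self, patch_embeddings):
        # patch_embeddings: (B, N, D) tokens, e.g., embeddings of
        # minutia-aligned local patches (finding (i)).
        b = patch_embeddings.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1),
                            patch_embeddings], dim=1)
        feats = self.encoder(tokens)
        # Pool the prompt positions and classify bona fide vs. attack.
        return self.head(feats[:, :self.prompts.size(1)].mean(dim=1))

model = PromptTunedViT()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # only prompts + head train
logits = model(torch.randn(4, 196, 768))  # 4 fingerprints, 196 patch tokens
print(logits.shape)                        # torch.Size([4, 2])
```

Because only the prompt tokens and the head receive gradients, the number of trainable parameters is a tiny fraction of the backbone's, which is what makes this form of adaptation attractive for cross-sensor deployment.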
Pages: 10
Related Papers
32 in total (10 listed)
  • [1] Self-Supervised Vision Transformers for Malware Detection
    Seneviratne, Sachith
    Shariffdeen, Ridwan
    Rasnayaka, Sanka
    Kasthuriarachchi, Nuran
    IEEE ACCESS, 2022, 10 : 103121 - 103135
  • [2] Self-supervised learning with randomized cross-sensor masked reconstruction for human activity recognition
    Logacjov, Aleksej
    Bach, Kerstin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 128
  • [3] Fingerprint Presentation Attack Detection with Supervised Contrastive Learning
    Huang, Chuanwei
    Fei, Hongyan
    Wu, Song
    Wang, Zheng
    Jia, Zexi
    Feng, Jufu
    2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS, IJCB, 2023,
  • [4] Jointly Optimal Incremental Learning with Self-Supervised Vision Transformers
    Witzgall, Hanna
    2024 IEEE AEROSPACE CONFERENCE, 2024,
  • [5] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [6] Patch-level Representation Learning for Self-supervised Vision Transformers
    Yun, Sukmin
    Lee, Hankook
    Kim, Jaehyung
    Shin, Jinwoo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8344 - 8353
  • [7] Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images
    Samele, Stefano
    Matteucci, Matteo
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024,
  • [8] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024,
  • [9] SELF-SUPERVISED LEARNING WITH CROSS-MODAL TRANSFORMERS FOR EMOTION RECOGNITION
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 381 - 388
  • [10] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)