On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection

Cited: 0
Authors
Nadeem, Maryam [1 ]
Nandakumar, Karthik [1 ]
Affiliations
[1] MBZUAI, Abu Dhabi, United Arab Emirates
Source
2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB) | 2023
DOI
10.1109/IJCB57857.2023.10448619
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Presentation attacks pose a serious threat to the integrity of fingerprint-based biometric systems. Existing methods for fingerprint presentation attack detection (FpPAD) suffer from a lack of generalizability across different sensors and attack instruments, especially those that are not encountered during training. Recently, deep neural networks based on the Vision Transformer (ViT) architecture have demonstrated impressive generalization performance across many image recognition tasks due to their ability to effectively model long-range dependencies between image patches through the self-attention mechanism. While ViT models have been considered for FpPAD, many practical intricacies involved in learning generalizable ViTs for the FpPAD task have not been explored in depth. These include: (i) what is the best way to pre-process a fingerprint image to generate the patches required by a ViT?, (ii) how should the ViT backbone used in FpPAD be pre-trained?, (iii) how should the pre-trained ViT backbone be fine-tuned for the FpPAD task?, and (iv) what is the most effective classifier design for a ViT-based FpPAD system? In this study, we undertake a thorough empirical investigation based on two public-domain datasets (LivDet 2015 and MSU-FPAD) in search of answers to the above questions. The key findings of this study are as follows: (i) Minutia-aligned local patches provide better PAD performance than partitioning the image into a fixed number of non-overlapping patches. (ii) Self-supervised pre-training based on Masked Image Modeling (MIM) leads to better generalization performance than multimodal approaches such as image-text alignment. (iii) Visual prompt learning is a more effective way to adapt a pre-trained ViT model for FpPAD than full fine-tuning. (iv) Learning a linear classification head jointly with the visual prompts provides superior performance compared to linear probing and alignment with fixed text prompts. We hope that these findings will be useful to the biometrics community and accelerate the deployment of ViT models in practical FpPAD systems.
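Findings (iii) and (iv) of the abstract amount to a simple adaptation recipe: freeze the pre-trained backbone, prepend a small set of learnable prompt tokens to the patch embeddings, and train only the prompts together with a linear classification head. The PyTorch sketch below illustrates this recipe under stated assumptions; the module name `PromptTunedViT`, the dimensions, the prompt count, and the prompt-pooling strategy are illustrative choices rather than the authors' implementation, and a plain `nn.TransformerEncoder` stands in for an MIM pre-trained ViT checkpoint.

```python
# Minimal sketch of visual prompt tuning with a jointly learned linear head
# on a frozen ViT-style encoder. Names and dimensions are illustrative
# assumptions, not the paper's released code.
import torch
import torch.nn as nn

class PromptTunedViT(nn.Module):
    def __init__(self, patch_dim=768, n_prompts=10, n_classes=2,
                 depth=12, n_heads=12):
        super().__init__()
        # Stand-in for a pre-trained (e.g., MIM pre-trained) ViT encoder;
        # in practice this would be loaded from a checkpoint.
        layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.encoder.parameters():
            p.requires_grad = False  # the backbone stays frozen

        # Learnable prompt tokens, prepended to the patch embeddings
        # (finding (iii): prompt tuning over full fine-tuning).
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, patch_dim) * 0.02)
        # Linear head learned jointly with the prompts (finding (iv)).
        self.head = nn.Linear(patch_dim, n_classes)

    def forward(self, patch_embeddings):
        # patch_embeddings: (B, N, D) tokens, e.g., embeddings of
        # minutia-aligned local patches (finding (i)).
        b = patch_embeddings.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1),
                            patch_embeddings], dim=1)
        feats = self.encoder(tokens)
        # Pool the prompt positions and classify bona fide vs. attack.
        return self.head(feats[:, :self.prompts.size(1)].mean(dim=1))

model = PromptTunedViT()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # only prompts + head train
logits = model(torch.randn(4, 196, 768))  # 4 fingerprints, 196 patch tokens
print(logits.shape)                        # torch.Size([4, 2])
```

Because only the prompt tokens and the head receive gradients, the number of trainable parameters is a tiny fraction of the backbone's, which is what makes this form of adaptation attractive for cross-sensor deployment.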
Pages: 10
Related Papers
32 in total (10 listed)
  • [1] Self-Supervised Vision Transformers for Malware Detection
    Seneviratne, Sachith
    Shariffdeen, Ridwan
    Rasnayaka, Sanka
    Kasthuriarachchi, Nuran
    IEEE ACCESS, 2022, 10 : 103121 - 103135
  • [2] Self-supervised learning with randomized cross-sensor masked reconstruction for human activity recognition
    Logacjov, Aleksej
    Bach, Kerstin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 128
  • [3] Fingerprint Presentation Attack Detection with Supervised Contrastive Learning
    Huang, Chuanwei
    Fei, Hongyan
    Wu, Song
    Wang, Zheng
    Jia, Zexi
    Feng, Jufu
    2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS, IJCB, 2023,
  • [4] Jointly Optimal Incremental Learning with Self-Supervised Vision Transformers
    Witzgall, Hanna
    2024 IEEE AEROSPACE CONFERENCE, 2024,
  • [5] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [6] Patch-level Representation Learning for Self-supervised Vision Transformers
    Yun, Sukmin
    Lee, Hankook
    Kim, Jaehyung
    Shin, Jinwoo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8344 - 8353
  • [7] Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images
    Samele, Stefano
    Matteucci, Matteo
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024,
  • [8] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024,
  • [9] SELF-SUPERVISED LEARNING WITH CROSS-MODAL TRANSFORMERS FOR EMOTION RECOGNITION
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 381 - 388
  • [10] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)