On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection

Cited by: 0
Authors
Nadeem, Maryam [1 ]
Nandakumar, Karthik [1 ]
Affiliations
[1] MBZUAI, Abu Dhabi, United Arab Emirates
Source
2023 IEEE International Joint Conference on Biometrics (IJCB) | 2023
DOI
10.1109/IJCB57857.2023.10448619
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Presentation attacks pose a serious threat to the integrity of fingerprint-based biometric systems. Existing methods for fingerprint presentation attack detection (FpPAD) suffer from a lack of generalizability across different sensors and attack instruments, especially those not encountered during training. Recently, deep neural networks based on the Vision Transformer (ViT) architecture have demonstrated impressive generalization performance across many image recognition tasks, owing to their ability to effectively model long-range dependencies between image patches through the self-attention mechanism. While ViT models have been considered for FpPAD, many practical intricacies involved in learning generalizable ViTs for this task have not been explored in depth. These include: (i) What is the best way to pre-process a fingerprint image into the patches required by a ViT? (ii) How should the ViT backbone used for FpPAD be pre-trained? (iii) How should the pre-trained ViT backbone be fine-tuned for the FpPAD task? (iv) What is the most effective classifier design for a ViT-based FpPAD system? We undertake a thorough empirical study on two public-domain datasets (LivDet 2015 and MSU-FPAD) to answer these questions. The key findings are as follows: (i) Minutiae-aligned local patches provide better PAD performance than partitioning the image into a fixed number of non-overlapping patches. (ii) Self-supervised pre-training based on Masked Image Modeling (MIM) leads to better generalization than multimodal approaches such as image-text alignment. (iii) Visual prompt learning is a more effective way to adapt a pre-trained ViT model for FpPAD than full fine-tuning. (iv) Learning a linear classification head together with the visual prompts outperforms both linear probing and alignment with fixed text prompts. We hope these findings will be useful to the biometrics community and accelerate the deployment of ViT models in practical FpPAD systems.
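
The prompt-tuning recipe behind findings (iii) and (iv) reduces to: freeze the pre-trained ViT backbone, prepend a small set of learnable prompt tokens to the patch-token sequence, and train only those prompts together with a linear classification head. The PyTorch sketch below illustrates this under stated assumptions; it is not the authors' released code, and the class name PromptedViT, the prompt count, the mean-pooled readout, and the stand-in nn.TransformerEncoder backbone (in place of a MIM-pre-trained ViT) are all illustrative choices.

import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    """Frozen ViT-style encoder + learnable prompts + linear head (VPT-style sketch)."""
    def __init__(self, backbone: nn.Module, embed_dim: int = 768,
                 num_prompts: int = 10, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone                 # pre-trained encoder, kept frozen
        for p in self.backbone.parameters():
            p.requires_grad = False              # only prompts + head are trained
        # Learnable prompt tokens prepended to the patch-token sequence.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        # Linear classification head learned jointly with the prompts.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) embeddings of fingerprint patches,
        # e.g. local patches cropped around detected minutiae.
        b = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        x = self.backbone(x)                     # frozen self-attention layers
        return self.head(x.mean(dim=1))          # logits: bonafide vs. attack

# Illustrative usage with a stand-in encoder (ViT-Base-like dimensions):
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12)
model = PromptedViT(encoder)
tokens = torch.randn(4, 196, 768)                # 4 images, 196 patch embeddings
logits = model(tokens)                           # shape (4, 2)

Because the backbone is frozen, the trainable parameter count is just num_prompts * embed_dim plus the head, which is what makes prompt tuning far cheaper than full fine-tuning while, per the abstract, generalizing better.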
Pages: 10