Logit prototype learning with active multimodal representation for robust open-set recognition

Cited by: 1
Authors
Fu, Yimin [1 ]
Liu, Zhunga [1 ]
Wang, Zicheng [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
logit prototype learning; multimodal perception; open-set recognition; uncertainty estimation; algorithms
DOI
10.1007/s11432-023-3924-x
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, existing OSR methods are primarily built on single-modal perception, and their performance is limited when single-modal data fail to provide a sufficient description of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling direct exploration of intermodal correlations without the impact of scale inconsistency. The fusion weight of each modality is then determined using an entropy-based uncertainty estimation method, which adaptively adjusts the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly and interactively optimized to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and to facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets demonstrate the effectiveness of the proposed method.
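The fusion and rejection steps summarized in the abstract can be sketched in a minimal form. This is an illustration of the general idea only: the exponential entropy weighting, the rejection threshold, and the function names (`fuse_logits`, `stepwise_recognize`) are assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy of each row of class probabilities.
    return -(p * np.log(p + eps)).sum(axis=-1)

def fuse_logits(modal_logits):
    """Entropy-based uncertainty weighting (illustrative): each modality's
    logits are weighted by a confidence that decreases with its predictive
    entropy, so less ambiguous modalities dominate the fused prediction."""
    probs = [softmax(z) for z in modal_logits]
    ents = np.stack([entropy(p) for p in probs], axis=0)   # (n_modalities, n_samples)
    conf = np.exp(-ents)                                   # assumed weighting scheme
    w = conf / conf.sum(axis=0, keepdims=True)             # normalize across modalities
    fused = sum(wi[..., None] * z for wi, z in zip(w, modal_logits))
    return fused, w

def stepwise_recognize(fused_logits, threshold=0.5):
    """Stepwise rule (illustrative): accept the arg-max known class only if
    its probability clears a threshold; otherwise reject as unknown (-1)."""
    p = softmax(fused_logits)
    labels = p.argmax(axis=-1)
    labels[p.max(axis=-1) < threshold] = -1
    return labels
```

For example, a modality producing a confident logit vector receives a larger fusion weight than one producing a near-uniform vector, and a near-uniform fused prediction is rejected as unknown.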
Pages: 16