A Quantum-Inspired Framework in Leader-Servant Mode for Large-Scale Multi-Modal Place Recognition

Cited: 0
Authors
Zhang, Ruonan [1 ]
Li, Ge [2 ]
Gao, Wei [3 ,4 ]
Liu, Shan [5 ]
Affiliations
[1] Ningxia Univ, Sch Adv Interdisciplinary Studies, Zhongwei 755000, Peoples R China
[2] Peking Univ, Sch Elect & Comp Engn SECE, Shenzhen Grad Sch, Guangdong Prov Key Lab Ultra High Definit Immers M, Shenzhen 518055, Peoples R China
[3] Peking Univ, Sch Elect & Comp Engn SECE, Shenzhen Grad Sch, Guangdong Prov Key Lab Ultra High Definit Immers M, Shenzhen 518055, Peoples R China
[4] Peng Cheng Natl Lab, Shenzhen 518066, Peoples R China
[5] Tencent, Media Lab, Palo Alto, CA 94301 USA
Funding
National Natural Science Foundation of China;
Keywords
Training; Point cloud compression; Feature extraction; Interference; Wave functions; Quantum mechanics; Image recognition; Fuses; Convolution; Three-dimensional displays; Multi-modal; place recognition; 3D point cloud; image; feature fusion;
DOI
10.1109/TITS.2024.3497574
CLC Number
TU [Architecture Science];
Discipline Code
0813;
Abstract
Multi-modal place recognition aims to exploit the diverse information carried by different modalities to revitalize place recognition tasks. The key challenges lie in the representation gap between modalities, the feature fusion method, and the relationships among them. Most existing methods are uni-modal and leave these challenges largely unresolved. To address them, inspired by the double-slit experiment in physics and by cooperative working modes, we introduce a quantum-theory-inspired leader-servant multi-modal framework for large-scale place recognition. Two key modules are designed: a quantum representation module and an interference-aware fusion module. The former captures the diversity of multi-modal data and bridges the representation gap, while the latter effectively fuses the multi-modal features under the guidance of quantum theory. In addition, we propose a leader-servant training strategy for stable training that considers three cases: the multi-modal loss acts as the leader to preserve overall characteristics, while the uni-modal losses act as servants to mitigate the influence of any single modality on the leader. Furthermore, the framework is compatible with uni-modal place recognition. Finally, experiments on three datasets demonstrate the efficiency, generalization, and robustness of the proposed method compared with existing methods.
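The leader-servant training strategy in the abstract can be sketched as a weighted loss composition: the multi-modal (fused) loss leads, and down-weighted uni-modal losses serve as auxiliaries. The function name, arguments, and the servant weight below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the leader-servant loss composition: the multi-modal
# loss is the leader; the image and point-cloud losses are servants.
# All names and the servant weight are assumptions for illustration only.

def leader_servant_loss(loss_multi: float,
                        loss_image: float,
                        loss_cloud: float,
                        servant_weight: float = 0.5) -> float:
    """Combine a leading multi-modal loss with servant uni-modal losses.

    The leader term preserves the overall fused characteristics; the
    down-weighted servant terms lessen each single modality's dominance.
    """
    return loss_multi + servant_weight * (loss_image + loss_cloud)

# Example: the leader dominates; the servants add a damped correction.
total = leader_servant_loss(loss_multi=1.2, loss_image=0.8, loss_cloud=0.6)
print(round(total, 6))  # 1.2 + 0.5 * (0.8 + 0.6) = 1.9
```

In practice each argument would be a differentiable loss tensor from its own branch, and the servant weight would be a tuned hyperparameter; scalars are used here only to keep the sketch self-contained.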
Pages: 2027-2039
Page count: 13
Related Papers
15 items in total
  • [1] Large-scale Multi-modal Search and QA at Alibaba
    Jin, Rong
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 8 - 8
  • [2] MMpedia: A Large-Scale Multi-modal Knowledge Graph
    Wu, Yinan
    Wu, Xiaowei
    Li, Junwen
    Zhang, Yue
    Wang, Haofen
    Du, Wen
    He, Zhidong
    Liu, Jingping
    Ruan, Tong
    SEMANTIC WEB, ISWC 2023, PT II, 2023, 14266 : 18 - 37
  • [3] Richpedia: A Large-Scale, Comprehensive Multi-Modal Knowledge Graph
    Wang, Meng
    Wang, Haofen
    Qi, Guilin
    Zheng, Qiushuo
    BIG DATA RESEARCH, 2020, 22 (22)
  • [4] MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild
    Liu, Yuanyuan
    Dai, Wei
    Feng, Chuanxu
    Wang, Wenbin
    Yin, Guanghao
    Zeng, Jiabei
    Shan, Shiguang
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [5] Semantic-Driven Interpretable Deep Multi-Modal Hashing for Large-Scale Multimedia Retrieval
    Lu, Xu
    Liu, Li
    Nie, Liqiang
    Chang, Xiaojun
    Zhang, Huaxiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 4541 - 4554
  • [6] A Hierarchical Framwork with Improved Loss for Large-scale Multi-modal Video Identification
    Zhang, Shichuan
    Tang, Zengming
    Pan, Hao
    Wei, Xinyu
    Huang, Jun
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2539 - 2542
  • [7] Fast Discrete Collaborative Multi-Modal Hashing for Large-Scale Multimedia Retrieval
    Zheng, Chaoqun
    Zhu, Lei
    Lu, Xu
    Li, Jingjing
    Cheng, Zhiyong
    Zhang, Hanwang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (11) : 2171 - 2184
  • [8] Large-Scale Bandwidth and Power Optimization for Multi-Modal Edge Intelligence Autonomous Driving
    Li, Xinrao
    Zhang, Tong
    Wang, Shuai
    Zhu, Guangxu
    Wang, Rui
    Chang, Tsung-Hui
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2023, 12 (06) : 1096 - 1100
  • [9] IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments
    Soliman, Abanob
    Bonardi, Fabien
    Sidibe, Desire
    Bouchafa, Samia
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2022, 106 (03)