CDAF3D: Cross-Dimensional Attention Fusion for Indoor 3D Object Detection

Citations: 0

Authors:

Wang, Shilin [1]

Huang, Hai [1]

Zhu, Yueyan [1]

Tang, Zhenqi [1]

Affiliations:

[1] Beijing University of Posts and Telecommunications, School of Information and Communication Engineering, Beijing 100876, People's Republic of China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PT XIII, PRCV 2024 | 2025, Vol. 15043

Funding:

National Key Research and Development Program of China

Keywords:

Indoor 3D Object Detection; Fusion Features; Point Cloud

DOI:

10.1007/978-981-97-8493-6_12

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

3D object detection is a crucial task in computer vision and autonomous systems, widely used in robotics, autonomous driving, and augmented reality. With the advancement of input devices, researchers have proposed using multimodal information to improve detection accuracy. However, effectively integrating 2D and 3D features to harness their complementary nature for detection remains a challenge. In this paper, we note that the complementary nature of geometric and visual texture information can effectively strengthen feature fusion, which plays a key role in detection. To this end, we propose the Cross-Dimensional Attention Fusion-based indoor 3D object detection method (CDAF3D). This method dynamically learns geometric information together with the corresponding 2D image texture details through a cross-dimensional attention mechanism, enabling the model to capture and integrate spatial and textural information effectively. Additionally, because intersecting entities with different labels are unrealistic in 3D object detection, we further propose the Preventive 3D Intersect Loss (P3DIL). This loss enhances detection accuracy by addressing intersections between objects of different labels. We evaluate the proposed CDAF3D on the SUN RGB-D and ScanNetV2 datasets. CDAF3D achieves 78.2 mAP@0.25 and 66.5 mAP@0.50 on ScanNetV2, and 70.3 mAP@0.25 and 54.1 mAP@0.50 on SUN RGB-D. The proposed CDAF3D outperforms all multi-sensor-based methods at 3D IoU thresholds of 0.25 and 0.5.
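The mAP@0.25 and mAP@0.50 figures above depend on a 3D IoU threshold, and P3DIL is concerned with intersections between boxes. The sketch below is only an illustration of those two quantities, assuming axis-aligned boxes given as `(xmin, ymin, zmin, xmax, ymax, zmax)`; the function names and box format are hypothetical and are not taken from the paper, and this is not the authors' P3DIL formulation.

```python
def intersection_volume(a, b):
    """Overlap volume of two axis-aligned boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    vol = 1.0
    for i in range(3):
        lo = max(a[i], b[i])          # start of the overlap on this axis
        hi = min(a[i + 3], b[i + 3])  # end of the overlap on this axis
        if hi <= lo:                  # no overlap on one axis means no overlap at all
            return 0.0
        vol *= hi - lo
    return vol


def volume(box):
    """Volume of a single axis-aligned box."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """3D intersection over union of two axis-aligned boxes."""
    inter = intersection_volume(a, b)
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes, for illustration only.
pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
gt = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)

iou = iou_3d(pred, gt)
matches_at_025 = iou >= 0.25  # would count as a match under the 0.25 threshold
matches_at_050 = iou >= 0.50  # would count as a match under the 0.50 threshold
```

In this example the overlap is 4 and the union is 12, so the IoU is 1/3: the prediction counts as a match at the 0.25 threshold but not at 0.50. The same `intersection_volume` quantity, computed between predicted boxes of different classes, is the kind of overlap an intersection-penalizing loss would target.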

Pages: 165-177

Page count: 13

50 items in total

[41] Characteristics of Earthquake Cycles: A Cross-Dimensional Comparison of 0D to 3D Numerical Models
Li, Meng
Pranger, Casper
van Dinther, Ylona
JOURNAL OF GEOPHYSICAL RESEARCH-SOLID EARTH, 2022, 127 (08)
[42] Towards Raw Sensor Fusion in 3D Object Detection
Rovid, Andras
Remeli, Viktor
2019 IEEE 17TH WORLD SYMPOSIUM ON APPLIED MACHINE INTELLIGENCE AND INFORMATICS (SAMI 2019), 2019 : 293 - 298
[43] TR3D: TOWARDS REAL-TIME INDOOR 3D OBJECT DETECTION
Rukhovich, Danila
Vorontsova, Anna
Konushin, Anton
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023 : 281 - 285
[44] Cross-dimensional adaptivity research on a 3D earth observation data cube model
Yu, Jinsongdi
Cui, Zhanying
Baumann, Peter
Tong, Ruiju
Wei, Dandan
Luo, Yuan
OPEN GEOSCIENCES, 2025, 17 (01)
[45] SOFW: A Synergistic Optimization Framework for Indoor 3D Object Detection
Dai, Kun
Jiang, Zhiqiang
Xie, Tao
Wang, Ke
Liu, Dedong
Fan, Zhendong
Li, Ruifeng
Zhao, Lijun
Omar, Mohamed
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 637 - 651
[46] Spatial and Semantic Information Enhancement for Indoor 3D Object Detection
Chen, Chunmei
Liang, Zhiqiang
Liu, Haitao
Liu, Xin
INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2023, 20 (05) : 831 - 839
[47] Real-Time 3D Visual Perception by Cross-Dimensional Refined Learning
Hong, Ziyang
Yue, C. Patrick
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 10326 - 10338
[48] 2D3D-DescNet: Jointly Learning 2D and 3D Local Feature Descriptors for Cross-Dimensional Matching
Chen, Shuting
Su, Yanfei
Lai, Baiqi
Cai, Luwei
Hong, Chengxi
Li, Li
Qiu, Xiuliang
Jia, Hong
Liu, Weiquan
REMOTE SENSING, 2024, 16 (13)
[49] 3D-MAN: 3D Multi-frame Attention Network for Object Detection
Yang, Zetong
Zhou, Yin
Chen, Zhifeng
Ngiam, Jiquan
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 1863 - 1872
[50] CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection
Hwang, Jyh-Jing
Kretzschmar, Henrik
Manela, Joshua
Rafferty, Sean
Armstrong-Crews, Nicholas
Chen, Tiffany
Anguelov, Dragomir
COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 388 - 405