Multimodal Object Detection via Probabilistic Ensembling

Cited by: 71
Authors
Chen, Yi-Ting [1, 2]
Shi, Jinghao [2]
Ye, Zelin [2]
Mertz, Christoph [2]
Ramanan, Deva [2, 3]
Kong, Shu [2, 4]
Affiliations
[1] Univ Maryland, College Pk, MD USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Argo AI, Pittsburgh, PA USA
[4] Texas A&M Univ, College Stn, TX 77843 USA
Source
COMPUTER VISION, ECCV 2022, PT IX | 2022 / Vol. 13669
Keywords
Object detection; Multimodal detection; Infrared; Thermal; Probabilistic model; Ensembling; Multimodal fusion; Uncertainty; NETWORK;
DOI
10.1007/978-3-031-20077-9_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study multimodal object detection with RGB and thermal cameras, since the latter provides much stronger object signatures under poor illumination. We explore strategies for fusing information from different modalities. Our key contribution is a probabilistic ensembling technique, ProbEn, a simple non-learned method that fuses together detections from multiple modalities. We derive ProbEn from Bayes' rule and first principles that assume conditional independence across modalities. Through probabilistic marginalization, ProbEn elegantly handles missing modalities when detectors do not fire on the same object. Importantly, ProbEn also notably improves multimodal detection even when the conditional independence assumption does not hold, e.g., fusing outputs from other fusion methods (both off-the-shelf and trained in-house). We validate ProbEn on two benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal images, showing that ProbEn outperforms prior work by more than 13% in relative performance!
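The score-fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it only applies Bayes' rule under the stated conditional-independence assumption, which gives p(y | x_1, ..., x_M) ∝ ∏_i p(y | x_i) / p(y)^(M-1). The function name `fuse_class_posteriors`, the uniform default prior, and the use of `None` to mark a modality whose detector did not fire are all assumptions made for illustration. Matching detections across modalities and fusing box coordinates are outside the scope of the sketch.

```python
import numpy as np


def fuse_class_posteriors(posteriors, prior=None, eps=1e-12):
    """Fuse per-modality class posteriors for one matched detection.

    posteriors: list with one entry per modality; each entry is a 1-D
        array of class probabilities, or None if that modality's
        detector did not fire (hypothetical convention for this sketch).
    prior: class prior p(y); assumed uniform when omitted.
    """
    # Marginalizing out a missing modality leaves only the available ones.
    available = [np.asarray(p, dtype=float) for p in posteriors if p is not None]
    if not available:
        raise ValueError("at least one modality must provide a posterior")

    num_classes = available[0].shape[0]
    if prior is None:
        prior = np.full(num_classes, 1.0 / num_classes)
    prior = np.asarray(prior, dtype=float)

    # log of prod_i p(y | x_i) / p(y)^(M - 1), computed in log space.
    log_fused = sum(np.log(p + eps) for p in available)
    log_fused = log_fused - (len(available) - 1) * np.log(prior + eps)

    # Normalize so the fused scores sum to one.
    log_fused = log_fused - log_fused.max()
    fused = np.exp(log_fused)
    return fused / fused.sum()


if __name__ == "__main__":
    rgb = [0.7, 0.3]      # e.g. p(person), p(background) from an RGB detector
    thermal = [0.8, 0.2]  # same classes from a thermal detector
    print(fuse_class_posteriors([rgb, thermal]))
    print(fuse_class_posteriors([rgb, None]))  # thermal detector did not fire
```

With a uniform prior, two agreeing detections reinforce each other (the fused score for the first class exceeds either input), while a detection seen by only one modality keeps its original score instead of being penalized.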
Pages: 139-158
Number of pages: 20