Lightweight and Effective Convolutional Neural Networks for Vehicle Viewpoint Estimation From Monocular Images

Cited by: 1

Authors:

Magistri, Simone ^{[1]}

Boschi, Marco ^{[2]}

Sambo, Francesco ^{[3]}

de Andrade, Douglas Coimbra ^{[3]}

Simoncini, Matteo ^{[1,3]}

Kubin, Luca ^{[3]}

Taccari, Leonardo ^{[3]}

De Luigi, Luca ^{[2]}

Salti, Samuele ^{[2]}

Affiliations:

[1] Univ Florence, Dept Informat Engn DINFO, I-50139 Florence, Italy

[2] Univ Bologna, Dept Comp Sci & Engn DISI, I-40136 Bologna, Italy

[3] Verizon Connect Res, I-50144 Florence, Italy

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2023 / Vol. 24 / Issue 01

Keywords:

Azimuth; convolutional neural networks; machine learning; monocular images; vehicles; yaw;

DOI:

10.1109/TITS.2022.3216359

Chinese Library Classification (CLC) number:

TU [Building Science]

Discipline classification code:

0813

Abstract:

Vehicle viewpoint estimation from monocular images is a crucial component for autonomous driving vehicles and for fleet management applications. In this paper, we make several contributions to advance the state-of-the-art on this problem. We show the effectiveness of applying a smoothing filter to the output neurons of a Convolutional Neural Network (CNN) when estimating vehicle viewpoint. We point out the overlooked fact that, under the same viewpoint, the appearance of a vehicle is strongly influenced by its position in the image plane, which renders viewpoint estimation from appearance an ill-posed problem. We show how, by inserting in the model a CoordConv layer to provide the coordinates of the vehicle, we are able to solve such ambiguity and greatly increase performance. Finally, we introduce a new data augmentation technique that improves viewpoint estimation on vehicles that are closer to the camera or partially occluded. All these improvements let a lightweight CNN reach optimal results while keeping inference time low. An extensive evaluation on a viewpoint estimation benchmark (Pascal3D+) and on actual vehicle camera data (nuScenes) shows that our method significantly outperforms the state-of-the-art in vehicle viewpoint estimation, both in terms of accuracy and memory footprint.
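The position ambiguity described in the abstract can be illustrated with a small sketch. The abstract only states that a CoordConv layer provides the coordinates of the vehicle to the model; the exact layer design is not given in this record. The code below is therefore an assumption-laden illustration of the general idea, not the authors' implementation: it appends two extra channels to a vehicle crop, holding each pixel's x and y position in the full image, scaled to [-1, 1]. The function name `add_coord_channels`, the `(x_min, y_min, x_max, y_max)` box convention, and the scaling range are all hypothetical choices.

```python
import numpy as np


def add_coord_channels(crop, box, image_size):
    """Append full-image x/y coordinate channels to a vehicle crop.

    Hypothetical helper for illustration only (not from the paper).
    crop:       float array of shape (H, W, C), the cropped vehicle.
    box:        (x_min, y_min, x_max, y_max) of the crop in full-image pixels.
    image_size: (width, height) of the full image in pixels.
    Returns an array of shape (H, W, C + 2).
    """
    h, w = crop.shape[:2]
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size

    # Pixel positions of the crop inside the full image, scaled to [-1, 1].
    xs = np.linspace(x_min, x_max, w) / img_w * 2.0 - 1.0
    ys = np.linspace(y_min, y_max, h) / img_h * 2.0 - 1.0

    # xx varies along columns, yy along rows; both have shape (H, W).
    xx, yy = np.meshgrid(xs, ys)

    # Stack the two coordinate maps after the original channels.
    return np.concatenate([crop, xx[..., None], yy[..., None]], axis=-1)


# Two crops with identical pixels but different image positions now
# produce different network inputs.
crop = np.zeros((4, 6, 3))
left = add_coord_channels(crop, (0, 0, 50, 50), (100, 50))
right = add_coord_channels(crop, (50, 0, 100, 50), (100, 50))
```

With the coordinate channels attached, a convolutional model can in principle distinguish a vehicle seen at the left edge of the frame from the same appearance at the right edge, which is the ambiguity the abstract refers to.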


Pages: 191-200

Page count: 10
