RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey

Cited by: 39

Authors:

Gao, Mingliang ^{[1,2]}

Jiang, Jun ^{[3,4]}

Zou, Guofeng ^{[1]}

John, Vijay ^{[5]}

Liu, Zheng ^{[2]}

Affiliations:

[1] Shandong Univ Technol, Sch Elect & Elect Engn, Zibo 255000, Peoples R China

[2] Univ British Columbia, Fac Appl Sci, Vancouver, BC V1V 1V7, Canada

[3] Southwest Petr Univ, Sch Comp Sci & Technol, Chengdu 610500, Sichuan, Peoples R China

[4] Sichuan Univ Sci & Engn, Sch Comp Sci, Zigong 643000, Peoples R China

[5] Toyota Technol Inst, Intelligent Informat Proc Lab, Nagoya, Aichi 4688511, Japan

Source:

IEEE ACCESS | 2019 / Vol. 7

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Convolutional neural network; multimodal fusion; object recognition; RGB-D; survey; EXTREME LEARNING-MACHINE; CANONICAL CORRELATION-ANALYSIS; DATA FUSION; REPRESENTATION; CLASSIFICATION; HISTOGRAMS; FEATURES; SCENE; REGRESSION; MATRIX;

DOI:

10.1109/ACCESS.2019.2907071

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Object recognition in real-world environments is a fundamental task in the computer vision and robotics communities. With advanced sensing technologies and low-cost depth sensors, high-quality RGB and depth images can be recorded synchronously, and object recognition performance can be improved by jointly exploiting them. RGB-D-based object recognition has evolved from early methods that use hand-crafted representations to the current state-of-the-art deep learning-based methods. With the undeniable success of deep learning, especially convolutional neural networks (CNNs), in the visual domain, the natural progression of deep learning research points to problems involving larger and more complex multimodal data. In this paper, we provide a comprehensive survey of recent multimodal CNN (MMCNN)-based approaches that have demonstrated significant improvements over previous methods. We highlight two key issues, namely, training data deficiency and multimodal fusion. In addition, we summarize and discuss the publicly available RGB-D object recognition datasets and present a comparative performance evaluation of the proposed methods on these benchmark datasets. Finally, we identify promising avenues of research in this rapidly evolving field. This survey will not only enable researchers to get a good overview of the state-of-the-art methods for RGB-D-based object recognition but also provide a reference for other multimodal machine learning applications, e.g., multimodal medical image fusion, audio-visual speech recognition, and multimedia retrieval and generation.


Pages: 43110-43136

Page count: 27
