Performance evaluation of deep feature learning for RGB-D image/video classification

Cited by: 65

Authors:

Shao, Ling [1,2]

Cai, Ziyun [3]

Liu, Li [2]

Lu, Ke [4,5]

Affiliations:

[1] Nanjing Univ Informat Sci & Technol, Coll Elect & Informat Engn, Nanjing 210044, Jiangsu, Peoples R China

[2] Univ East Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England

[3] Univ Sheffield, Dept Elect & Elect Engn, Mappin St, Sheffield S1 3JD, S Yorkshire, England

[4] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[5] Beijing Ctr Math & Informat Interdisciplinary Sci, Beijing, Peoples R China

Source:

INFORMATION SCIENCES | 2017 / Vol. 385

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China;

Keywords:

Deep neural networks; RGB-D data; Feature learning; Performance evaluation; RECOGNITION;

DOI:

10.1016/j.ins.2017.01.013

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep neural networks for image/video classification have achieved considerable success in various computer vision applications. Existing deep learning algorithms are widely applied to RGB image and video data. Meanwhile, with the development of low-cost RGB-D sensors (such as Microsoft Kinect and Xtion Pro-Live), high-quality RGB-D data can be easily acquired and used to enhance computer vision algorithms [14]. It is therefore worth investigating how deep learning can be employed to extract and fuse features from RGB-D data. In this paper, after briefly reviewing the basic concepts of RGB-D information and four prevalent deep learning models (i.e., Deep Belief Networks (DBNs), Stacked Denoising Auto-Encoders (SDAE), Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) neural networks), we conduct extensive experiments on five popular RGB-D datasets, comprising three image datasets and two video datasets. We then present a detailed comparative analysis of the feature representations learned by the four deep learning models. In addition, we offer a few suggestions on how to tune hyperparameters when training deep neural networks. Based on the extensive experimental results, we believe that this evaluation will provide insights into, and a deeper understanding of, different deep learning algorithms for RGB-D feature extraction and fusion. (C) 2017 Elsevier Inc. All rights reserved.

Pages: 266-283

Page count: 18

References (50 in total):

[1] Allen Peter K, 2012, ROBOTIC OBJECT RECOG, V34

[2] [Anonymous], 2012, P 26 ANN C NEUR PROC, DOI 10.1002/2014GB005021

[3] [Anonymous], 2013, IEEE T PATTERN ANAL, DOI 10.1109/TPAMI.2012.59

[4] [Anonymous], 2014, ARXIV14021128CSSTAT

[5] [Anonymous], 2006, NOTES CONVOLUTIONAL

[6] [Anonymous], ARXIV PREPRINT ARXIV

[7] [Anonymous], 2015, PROC CVPR IEEE, DOI 10.1109/CVPR.2015.7298801

[8] [Anonymous], 2013, P 23 INT JOINT C ART

[9] [Anonymous], 1997, Neural Computation

[10] [Anonymous], P WORKSH RGB D ADV R