An Improved SSD-Like Deep Network-Based Object Detection Method for Indoor Scenes

Cited by: 38

Authors:

Ni, Jianjun [1]

Shen, Kang [1]

Chen, Yan [1]

Yang, Simon X. [2]

Affiliations:

[1] Hohai Univ, Coll Internet Things Engn, Changzhou 213022, Jiangsu, Peoples R China

[2] Univ Guelph, Sch Engn, Adv Robot & Intelligent Syst ARIS Lab, Guelph, ON N1G 2W1, Canada

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2023 / Vol. 72

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Feature extraction; Robots; Deep learning; Task analysis; Lighting; Data mining; Deep network; indoor scene; object detection; ResNet50; network; single-shot multibox detector (SSD) algorithm; RECOGNITION; NAVIGATION;

DOI:

10.1109/TIM.2023.3244819

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Indoor scene object detection is an important and popular research topic in scene understanding for indoor robots. In recent years, deep learning-based solutions have achieved good results in object detection. However, some problems in indoor object detection still require further study, such as the lighting and occlusion problems caused by the complexity of the indoor environment. To address these problems, this article proposes an improved object detection method based on deep neural networks, which uses a framework similar to the single-shot multibox detector (SSD). In the proposed method, an improved ResNet50 network is used to enhance the transmission of information and to improve the feature expression capability of the feature extraction network. At the same time, a multiscale contextual information extraction (MCIE) module is used to extract the contextual information of the indoor scene and thereby improve indoor object detection performance. In addition, an improved dual-threshold non-maximum suppression (DT-NMS) algorithm is used to alleviate the occlusion problem in indoor scenes. Finally, the public dataset SUN2012 is further screened for the specific application of indoor scene object detection, and the proposed method is tested on this dataset. The experimental results show that the mean average precision (mAP) of the proposed method can reach 54.10%, which is higher than those of the state-of-the-art methods.

Pages: 15

References: 69 in total

