Object detection algorithm based on natural language expression

Authors
Tian G. [1]
Liu H. [1]
Bu J. [2]
Affiliations
[1] School of Control Science and Engineering, Shandong University, Ji'nan
[2] Electric Power Science Research Institute of State Grid Shandong Electric Power Company, Ji'nan
Source
Journal of Huazhong University of Science and Technology, 2017, Vol. 45
Keywords
Convolutional neural network (CNN); Man-machine interaction; Natural language processing; Object detection; Recurrent neural network (RNN)
DOI
10.13245/j.hust.171021
Abstract
To help a robot localize a target object from a natural language expression that describes it, a fast, end-to-end object detection algorithm based on natural language expressions is proposed. A convolutional neural network (CNN) and a recurrent neural network (RNN) are trained jointly to learn visual and linguistic information: the RNN encodes the natural language expression into a vector representation, and the CNN extracts features of image regions. Each region feature is compared with the language feature, and the region with the highest similarity is taken as the target object. The model was trained on the UNC-Ref and G-Ref datasets and showed superior speed and precision. © 2017, Editorial Board of Journal of Huazhong University of Science and Technology. All rights reserved.
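The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration. The abstract does not specify the recurrent cell, so a plain tanh RNN cell is used here. It names no similarity measure, so cosine similarity is assumed. It gives no training objective, so a margin ranking loss is shown as one plausible way to train the two branches jointly. All function names (`tanh_rnn_encode`, `cosine`, `pick_region`, `margin_ranking_loss`) are hypothetical. CNN region features are taken as given vectors that already share the language vector's dimensionality.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[Vector]


def tanh_rnn_encode(tokens: Sequence[str], embed: Dict[str, Vector],
                    w_xh: Matrix, w_hh: Matrix) -> Vector:
    """Encode a tokenized expression into one vector with a plain tanh RNN.

    Assumption: vanilla cell h_t = tanh(W_xh x_t + W_hh h_{t-1}), no bias;
    the abstract only says "recurrent neural network".
    """
    hidden = len(w_hh)
    h = [0.0] * hidden
    for tok in tokens:
        x = embed[tok]
        # The new state is computed entirely from the previous h before h is rebound.
        h = [math.tanh(sum(w * xj for w, xj in zip(w_xh[i], x))
                       + sum(w * hk for w, hk in zip(w_hh[i], h)))
             for i in range(hidden)]
    return h  # final hidden state serves as the expression embedding


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_region(lang_vec: Vector, region_feats: List[Vector]) -> int:
    """Return the index of the region whose feature is most similar to the expression."""
    scores = [cosine(lang_vec, r) for r in region_feats]
    return max(range(len(scores)), key=scores.__getitem__)


def margin_ranking_loss(lang_vec: Vector, pos_region: Vector,
                        neg_region: Vector, margin: float = 0.1) -> float:
    """Hypothetical joint-training objective: the described region should score
    at least `margin` higher than a non-matching region."""
    return max(0.0, margin - cosine(lang_vec, pos_region)
               + cosine(lang_vec, neg_region))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hid = 4, 3
    vocab = ["the", "red", "cup", "on", "left"]
    embed = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
    w_xh = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hid)]
    w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hid)] for _ in range(hid)]
    lang = tanh_rnn_encode("the red cup on the left".split(), embed, w_xh, w_hh)
    # Stand-ins for CNN region features; region 1 is made parallel to the query.
    regions = [[rng.uniform(-1, 1) for _ in range(hid)],
               [2.0 * v for v in lang],
               [rng.uniform(-1, 1) for _ in range(hid)]]
    print("selected region:", pick_region(lang, regions))
```

In a full model, the RNN weights, the CNN, and any projection into a shared embedding space would be learned together. Here the weights are random and fixed, so the demo only shows the data flow. It encodes the expression, scores every region, and selects the region with the highest score.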
Pages: 111–116