Microsoft COCO: Common Objects in Context

Cited by: 30766
Authors
Lin, Tsung-Yi [1]
Maire, Michael [2]
Belongie, Serge [1]
Hays, James [3]
Perona, Pietro [2]
Ramanan, Deva [4]
Dollar, Piotr [5]
Zitnick, C. Lawrence [5]
Affiliations
[1] Cornell, Ithaca, NY 14850 USA
[2] CALTECH, Pasadena, CA 91125 USA
[3] Brown Univ, Providence, RI 02912 USA
[4] Univ Calif Irvine, Irvine, CA 92717 USA
[5] Microsoft Res, New York, NY USA
Source
COMPUTER VISION - ECCV 2014, PT V | 2014 / Vol. 8693
DOI
10.1007/978-3-319-10602-1_48
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting, and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Pages: 740-755
Page count: 16