InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation

Cited by: 3
Authors
Sun, Jianhua [1]
Fang, Hao-Shu [1]
Li, Yuxuan [1]
Wang, Runzhong [1]
Gou, Minghao [1]
Lu, Cewu [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Dongchuan Rd, Shanghai 201100, Peoples R China
Keywords
Data augmentation; Visual coherence; Object detection; Instance segmentation; 3D detection; OBJECT; SEARCH;
DOI
10.1007/s11263-023-01807-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Instance-level perception tasks like object detection, instance segmentation, and 3D detection require many training samples to achieve satisfactory performance. The meticulous labels for these tasks are usually expensive to obtain, and data augmentation is a natural choice to tackle this problem. However, instance-level augmentation is less studied in previous research. In this paper, we present an effective, efficient and unified crop-paste mechanism to augment the training set utilizing existing instance-level annotations. Our design is derived from visual coherence and mines three inherent principles that widely exist in real-world data: (i) background coherence in local neighboring areas, (ii) appearance coherence for instance placement, and (iii) instance coherence within the same category. These methodologies are unified for various tasks including object detection, instance segmentation, and 3D detection. Extensive experiments demonstrate that our proposed approaches can successfully boost the performance of diverse frameworks on various datasets across multiple tasks, without modifying the network structure. Remarkable improvements are obtained: 5.1 mAP for object detection and 3.2 mAP for instance segmentation on the COCO dataset, and 6.9 mAP for 3D detection on the ScanNetV2 dataset. Our method can be easily integrated into different frameworks without affecting the training and inference efficiency.
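To make the crop-paste idea in the abstract concrete, the following is a minimal generic sketch and not the authors' implementation. It assumes a grayscale image stored as nested lists, a binary instance mask of the same size, and a hand-chosen offset; the names `crop_paste` and `mask_to_bbox` are hypothetical. It does not model any of the three coherence principles, which in the paper govern where and how an instance is placed.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values
Mask = List[List[int]]   # 1 where the instance is, 0 elsewhere


def crop_paste(image: Image, mask: Mask, dx: int, dy: int) -> Tuple[Image, Mask]:
    """Copy the masked instance and paste it shifted by (dx, dy).

    Pixels that would land outside the image are dropped. The original
    instance is left in place (no inpainting), which is a simplification.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]          # do not modify the input
    new_mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                ty, tx = y + dy, x + dx
                if 0 <= ty < h and 0 <= tx < w:
                    out[ty][tx] = image[y][x]  # read from the source image
                    new_mask[ty][tx] = 1
    return out, new_mask


def mask_to_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the mask, or None if empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Usage: a 4x4 image with a single-pixel "instance" at (x=1, y=1).
image = [[0] * 4 for _ in range(4)]
image[1][1] = 9
mask = [[0] * 4 for _ in range(4)]
mask[1][1] = 1
augmented, pasted_mask = crop_paste(image, mask, dx=2, dy=1)
```

Because the pasted mask is produced alongside the augmented image, the bounding box and segmentation labels for the new instance come from the existing annotation rather than from new labeling, which is the point of reusing instance-level annotations.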
Pages: 2665-2681
Number of pages: 17