Recurrent self-optimizing proposals for weakly supervised object detection

Cited by: 0
Authors
Zhang, Ming [1]
Zeng, Bing [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, 2006 Xiyuan Ave, Chengdu 610054, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Weakly supervised object detection; Recurrent self-optimizing proposals; Proposal self-transformation; Proposal self-sampling
DOI
10.1007/s00521-022-07818-w
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised object detection (WSOD) has attracted increasing attention in object detection because it requires only image-level annotations to train the detector. A typical WSOD paradigm first generates candidate region proposals for the training data and then treats each image as a bag of proposals, training the detector with multiple instance learning (MIL). Most methods focus on optimizing the training process but rarely consider the influence of the pre-generated proposals, which directly affect the learning of the detector because they contain an overwhelming number of noisy proposals (e.g., negative or background proposals) as well as positive proposals with inaccurate locations. In this paper, we focus on improving proposal quality and propose a recurrent self-optimizing proposal framework, a new paradigm for WSOD, that iteratively optimizes the pre-generated proposals. In each iteration, all detection results (i.e., the object-aware coordinate offsets and the confidence scores) are accumulated for proposal optimization. To achieve accurate object localization, we design a proposal self-transformation module that transforms the locations of the pre-generated proposals according to the coordinate offsets. To alleviate the impact of noisy proposals, we design a proposal self-sampling module that mines object instances using the confidence scores and filters out noisy proposals. The optimized proposals are then fed into a decoupled proposal learner, which contains two parallel proposal training branches. A MIL module and an instance refinement module are supervised by the image label and the mined object instances, respectively. In addition, the instance refinement module contains an instance regression refinement module, which generates the object-aware coordinate offsets. In turn, the decoupled proposal learner produces new detection results that are used to optimize the proposals in the next iteration. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method.
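The abstract describes one optimization iteration as two steps: moving each proposal by its predicted coordinate offsets, then using confidence scores to discard noisy proposals. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The box parameterization (R-CNN style `(dx, dy, dw, dh)` offsets), the IoU-based keep rule, the `min_iou` threshold, and all function names are assumptions introduced here, since the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Offset = Tuple[float, float, float, float]   # (dx, dy, dw, dh), assumed R-CNN style


def transform_proposal(box: Box, offset: Offset) -> Box:
    """Move and rescale one proposal by its offsets (assumed parameterization)."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offset
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h          # shift the center
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale the size
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_proposals(boxes: Sequence[Box], scores: Sequence[float],
                     min_iou: float = 0.5) -> List[int]:
    """Hypothetical sampling rule: treat the top-scoring box as the mined
    instance and keep only proposals that overlap it by at least min_iou."""
    seed = max(range(len(boxes)), key=lambda i: scores[i])
    return [i for i, b in enumerate(boxes) if iou(b, boxes[seed]) >= min_iou]


def optimize_proposals(boxes: Sequence[Box], offsets: Sequence[Offset],
                       scores: Sequence[float], min_iou: float = 0.5) -> List[Box]:
    """One iteration: transform every proposal, then filter the noisy ones."""
    moved = [transform_proposal(b, o) for b, o in zip(boxes, offsets)]
    keep = sample_proposals(moved, scores, min_iou)
    return [moved[i] for i in keep]
```

In a recurrent setting, the output of `optimize_proposals` would be passed back to the detector, whose new offsets and scores would drive the next call. This sketch handles a single object class per image; the selection rule would need to run per class to cover multi-label images.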
Pages: 757-771
Page count: 15