Weakly-Supervised TV Logo Detection

Cited by: 0

Authors:

Zhang, Yueying [1,2]

Cao, Xiaochun [1,2]

Wu, Dao [1,2]

Li, Tao [3]

Affiliations:

[1] Chinese Acad Sci, State Key Lab Informat Secur, Inst Informat Engn, Beijing 100093, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100049, Peoples R China

[3] Sichuan Univ, Coll Cybersecur, Chengdu 610207, Sichuan, Peoples R China

Source:

2017 32ND YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC) | 2017

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation;

Keywords:

TV logo detection; Weakly-supervised; Faster RCNN; RPN; Fast RCNN;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper proposes a deep-learning-based system for the specific task of TV logo detection. Training a robust object detector typically requires a large amount of manually annotated data, which is time-consuming to produce. To reduce this cost, we construct the TV logo detection system in a weakly-supervised framework consisting of a TV logo localization network based on the Region Proposal Network (RPN) and a classification network based on Fast RCNN. Based on observed priors of a typical TV logo in pictures and video frames, data preparation and processing are performed through keyframe extraction and data augmentation. Because the localization network is built on RPN, only a few bounding box annotations are needed to train it. The well-trained localization network can then produce numerous positive and negative proposals. These proposals, together with the logo class labels, are used to train the classification network. To generate reasonable anchor boxes, k-means clustering is used to infer the scales and aspect ratios. In addition, hard example mining is explored for efficient training and better generalization. Experimental results demonstrate that the proposed weakly-supervised TV logo detection system achieves superior performance compared to the baseline Faster RCNN approach, with an mAP of about 92% on our newly proposed dataset.
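The anchor-box step mentioned in the abstract can be illustrated with a minimal sketch. The record does not say which distance measure or how many clusters the authors used, so the code below assumes plain Euclidean k-means over (width, height) pairs; the function name `kmeans_anchors` and the sample boxes are hypothetical and not taken from the paper.

```python
import numpy as np

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster (width, height) pairs with plain Euclidean k-means.

    Assumption: Euclidean distance on raw pixel sizes; the paper's
    actual distance measure is not stated in this record.
    """
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct boxes picked at random.
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to its nearest center.
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            wh[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    scales = np.sqrt(centers[:, 0] * centers[:, 1])  # square root of anchor area
    ratios = centers[:, 0] / centers[:, 1]           # width / height
    return centers, scales, ratios

# Hypothetical annotated logo sizes in pixels: wide logos and square logos.
boxes_wh = [(60, 30), (62, 31), (58, 29), (20, 20), (22, 21), (19, 20)]
centers, scales, ratios = kmeans_anchors(boxes_wh, k=2)
```

Each cluster center yields one anchor: its scale is the square root of the box area and its aspect ratio is width divided by height. A detector configured this way would use anchors matched to the logo sizes seen in the annotations rather than generic defaults.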


Pages: 1031-1036

Page count: 6
