RILOD: Near Real-Time Incremental Learning for Object Detection at the Edge

Cited by: 60

Authors:

Li, Dawei [1]

Tasci, Serafettin [1]

Ghosh, Shalini [1]

Zhu, Jingwen [1,2]

Zhang, Junting [1,3]

Heck, Larry [1]

Affiliations:

[1] Samsung Res Amer, Mountain View, CA 94043 USA

[2] Apple Inc, Cupertino, CA 95014 USA

[3] Univ Southern Calif, Los Angeles, CA 90007 USA

Source:

SEC'19: PROCEEDINGS OF THE 4TH ACM/IEEE SYMPOSIUM ON EDGE COMPUTING | 2019

Keywords:

edge computing; incremental learning; object detection; deep neural networks

DOI:

10.1145/3318216.3363317

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Object detection models shipped with camera-equipped edge devices cannot cover the objects of interest for every user. Therefore, incremental learning capability is a critical feature for a robust and personalized object detection system that many applications would rely on. In this paper, we present an efficient yet practical system, RILOD, to incrementally train an existing object detection model such that it can detect new object classes without losing its capability to detect old classes. The key component of RILOD is a novel incremental learning algorithm that trains one-stage deep object detection models end-to-end using only training data of the new object classes. Specifically, to avoid catastrophic forgetting, the algorithm distills three types of knowledge from the old model to mimic the old model's behavior on object classification, bounding box regression, and feature extraction. In addition, since the training data for the new classes may not be available, a real-time dataset construction pipeline is designed to collect training images on-the-fly and automatically label the images with both category and bounding box annotations. We have implemented RILOD under both edge-cloud and edge-only setups. Experimental results show that the proposed system can learn to detect a new object class in just a few minutes, including both dataset construction and model training. In comparison, a traditional fine-tuning-based method may take a few hours for training and, in most cases, would also need a tedious and costly manual dataset labeling step.
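The three distillation terms named in the abstract (classification, bounding box regression, feature extraction) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names, the dictionary layout of the model outputs, the choice of mean squared error and smooth L1 as distance functions, and the equal default weights.

```python
from typing import Dict, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def smooth_l1(a: Sequence[float], b: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth L1 distance: quadratic below beta, linear above it."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(a)


def distillation_loss(
    old: Dict[str, Sequence[float]],
    new: Dict[str, Sequence[float]],
    w_cls: float = 1.0,   # hypothetical weight, not from the source
    w_box: float = 1.0,   # hypothetical weight, not from the source
    w_feat: float = 1.0,  # hypothetical weight, not from the source
) -> float:
    """Penalize the new model for drifting from the old model's outputs.

    Assumed layout: each dict holds 'cls_logits', 'box', and 'features'
    for the same input; the new model's logits list the old classes
    first, followed by the newly added classes.
    """
    n_old = len(old["cls_logits"])
    # Compare only the logits of classes the old model already knew.
    cls_term = mse(old["cls_logits"], new["cls_logits"][:n_old])
    box_term = smooth_l1(old["box"], new["box"])
    feat_term = smooth_l1(old["features"], new["features"])
    return w_cls * cls_term + w_box * box_term + w_feat * feat_term


old_out = {"cls_logits": [0.0, 0.0], "box": [0.0, 0.0, 0.0, 0.0], "features": [1.0, 2.0]}
new_out = {"cls_logits": [1.0, 1.0, 9.0], "box": [2.0, 0.0, 0.0, 0.0], "features": [1.0, 2.0]}
loss = distillation_loss(old_out, new_out)
```

In an actual training loop a term like this would presumably be added to the ordinary detection loss computed on the new-class data, so that the model learns the new class while staying close to the old model on the old ones.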

Citation

Pages: 113-126

Number of pages: 14

References (35 in total):

[1] Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265

[2] Aljundi R, 2017, ARXIV PREPRINT ARXIV

[3] [Anonymous], 2015, ABS151000149 CORR

[4] [Anonymous], 2017, PYTORCH TENSORS DYNA

[5] [Anonymous], 2016, ARXIV160207360

[6] Chen G., 2017, NEURIPS

[7] Wang C, 2009, PROC CVPR IEEE, P1903, DOI [10.1109/CVPRW.2009.5206800, 10.1109/CVPR.2009.5206800]

[8] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[9] Everingham M., The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results

[10] Everingham, Mark; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew. The Pascal Visual Object Classes (VOC) Challenge [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02): 303-338
