DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware

Cited by: 95
Authors
Mathur, Akhil [1 ]
Lane, Nicholas D. [1 ,2 ]
Bhattacharya, Sourav [1 ]
Boran, Aidan [1 ]
Forlivesi, Claudio [1 ]
Kawsar, Fahim [1 ]
Affiliations
[1] Nokia Bell Labs, Berkeley Hts, NJ 07922 USA
[2] UCL, London, England
Source
MOBISYS'17: PROCEEDINGS OF THE 15TH ANNUAL INTERNATIONAL CONFERENCE ON MOBILE SYSTEMS, APPLICATIONS, AND SERVICES | 2017
Keywords
Wearables; deep learning; embedded devices; computer vision; local execution;
D O I
10.1145/3081333.3081359
CLC number
TP301 [Theory and Methods];
Subject classification code
081202 ;
Abstract
Wearable devices with in-built cameras present interesting opportunities for users to capture various aspects of their daily life and are potentially also useful in supporting users with low vision in their everyday tasks. However, state-of-the-art image wearables available in the market are limited to capturing images periodically and do not provide any real-time analysis of the data that might be useful for the wearers. In this paper, we present DeepEye - a match-box sized wearable camera that is capable of running multiple cloud-scale deep learning models locally on the device, thereby enabling rich analysis of the captured images in near real-time without offloading them to the cloud. DeepEye is powered by a commodity wearable processor (Snapdragon 410), which ensures its wearable form factor. The software architecture for DeepEye addresses a key limitation of executing multiple deep learning models on constrained hardware: their limited runtime memory. We propose a novel inference software pipeline that targets the local execution of multiple deep vision models (specifically, CNNs) by interleaving the execution of computation-heavy convolutional layers with the loading of memory-heavy fully-connected layers. Beyond this core idea, the execution framework incorporates a memory caching scheme and a selective use of model compression techniques that further minimize memory bottlenecks. Through a series of experiments, we show that our execution framework significantly outperforms the baseline approaches in terms of inference latency, memory requirements and energy consumption.
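The core pipeline idea in the abstract - overlapping compute-bound convolutional execution with I/O-bound loading of fully-connected weights - can be illustrated with a minimal, hypothetical sketch. All function names, timings, and data shapes below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import threading
import time

def run_conv_layers(image):
    """Stand-in for executing the convolutional layers (compute-bound)."""
    time.sleep(0.05)  # simulate convolution cost on the wearable CPU
    return [0.0] * 256  # pretend feature vector

def load_fc_weights(model_name):
    """Stand-in for loading fully-connected weights from flash (I/O-bound)."""
    time.sleep(0.05)  # simulate reading large FC weight matrices
    return {"model": model_name, "weights": [0.1] * 256}

def run_fc_layers(features, fc_weights):
    """Stand-in for the fully-connected classification stage."""
    return sum(f * w for f, w in zip(features, fc_weights["weights"]))

def interleaved_inference(image, model_name):
    """Overlap FC weight loading with conv execution instead of
    loading the whole model into memory up front."""
    loaded = {}

    def loader():
        loaded["fc"] = load_fc_weights(model_name)

    # Start fetching FC weights concurrently with conv execution.
    t = threading.Thread(target=loader)
    t.start()
    features = run_conv_layers(image)
    t.join()  # FC weights are (ideally) resident by the time conv finishes
    return run_fc_layers(features, loaded["fc"])

score = interleaved_inference(image=None, model_name="face_recognizer")
```

Because only one stage's weights need to be resident at a time, the peak memory footprint stays closer to the larger of the two stages rather than their sum - the property that lets several CNNs share the device's limited runtime memory.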
Pages: 68-81
Page count: 14