Video Scene Segmentation Using Tensor-Train Faster-RCNN for Multimedia IoT Systems

Cited by: 18
Authors
Dai, Cheng [1 ,2 ]
Liu, Xingang [1 ]
Yang, Laurence T. [3 ]
Ni, Minghao [1 ]
Ma, Zhenchao [3 ]
Zhang, Qingchen [3 ]
Deen, M. Jamal [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] McMaster Univ, Dept Elect Engn & Comp Sci, Hamilton, ON L8S 4K1, Canada
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada
Funding
National Natural Science Foundation of China;
Keywords
Tensile stress; Machine learning; Computational modeling; Training; Feature extraction; Image segmentation; Internet of Things; Deep learning; multimedia Internet-of-Things (IoT) system; tensor train; video scene segmentation;
DOI
10.1109/JIOT.2020.3022353
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Video surveillance techniques such as scene segmentation play an increasingly important role in multimedia Internet-of-Things (IoT) systems. However, existing deep learning-based methods face challenges in both accuracy and memory footprint when deployed on edge computing devices with limited computing resources. To address these challenges, a tensor-train video scene segmentation scheme is proposed that compares the local background information in regional scene boundary boxes across adjacent frames. Compared with existing methods, the proposed scheme achieves competitive performance in both segmentation accuracy and parameter compression rate. First, an improved faster region convolutional neural network (faster-RCNN) model is proposed to recognize and generate a large number of foreground and background region boxes in order to obtain boundary boxes. The foreground boxes with sparse objects are then removed, and the remaining boxes are treated as candidate background boxes used to measure the similarity between two adjacent frames. Second, to improve training efficiency and reduce memory size, a general and efficient training method is proposed that uses tensor-train decomposition to factor the input-to-hidden weight matrix. Finally, experiments are conducted to evaluate the proposed scheme in terms of accuracy and model compression. The results demonstrate that the proposed model improves training efficiency and saves memory for the deep computation model while maintaining good accuracy. This work opens the potential for using artificial intelligence methods on edge computing devices in multimedia IoT systems.
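The tensor-train factorization of a weight matrix mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the matrix is simply reshaped into a 4-way array and compressed with the standard sequential truncated-SVD procedure (TT-SVD), and the names `tt_svd`, `tt_to_full`, and `max_rank` are hypothetical. The synthetic 64 x 64 matrix is built to have exact TT-rank 3, so the reconstruction is exact up to rounding.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Split a d-way array into tensor-train cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank = 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        # Left singular vectors become the k-th core of shape (r_{k-1}, n_k, r_k).
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remainder (singular values times right vectors) to the next step.
        mat = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
        mat = mat.reshape(rank * dims[k + 1], -1)
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract tensor-train cores back into the full array."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

# Synthetic 64 x 64 "weight matrix" with exact TT-rank 3 (illustrative data only).
rng = np.random.default_rng(0)
shapes = [(1, 8, 3), (3, 8, 3), (3, 8, 3), (3, 8, 1)]
weight = tt_to_full([rng.standard_normal(s) for s in shapes]).reshape(64, 64)

cores = tt_svd(weight.reshape(8, 8, 8, 8), max_rank=3)
n_tt = sum(c.size for c in cores)
print(weight.size, n_tt)  # 4096 192
```

In this toy setting the four cores hold 192 numbers in place of the 4096 entries of the dense matrix. Real weight matrices are only approximately low-rank, so `max_rank` would trade accuracy against compression.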
Pages: 9697-9705
Number of pages: 9
Related papers
28 in total
[1]   INTERNET-OF-THINGS-BASED SMART ENVIRONMENTS: STATE OF THE ART, TAXONOMY, AND OPEN RESEARCH CHALLENGES [J].
Ahmed, Ejaz ;
Yaqoob, Ibrar ;
Gani, Abdullah ;
Imran, Muhammad ;
Guizani, Mohsen .
IEEE WIRELESS COMMUNICATIONS, 2016, 23 (05) :10-16
[2]  
[Anonymous], 2014, P BRIT MACHINE VISIO
[3]  
Cheng Y., 2020, SURVEY MODEL COMPRES
[4]   A comprehensive survey on model compression and acceleration [J].
Choudhary, Tejalal ;
Mishra, Vipul ;
Goswami, Anurag ;
Sarangapani, Jagannathan .
ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (07) :5113-5155
[5]   Human Behavior Deep Recognition Architecture for Smart City Applications in the 5G Environment [J].
Dai, Cheng ;
Liu, Xingang ;
Lai, Jinfeng ;
Li, Pan ;
Chao, Han-Chieh .
IEEE NETWORK, 2019, 33 (05) :206-211
[6]   Audiovisual integration with Segment Models for tennis video parsing [J].
Delakis, Manolis ;
Gravier, Guillaume ;
Gros, Patrick .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2008, 111 (02) :142-154
[7]  
Denton E. L., 2014, P ADV NEUR INF PROC, P1269
[8]  
Denton E, 2014, ADV NEUR IN, V27
[9]   A survey on deep learning techniques for image and video semantic segmentation [J].
Garcia-Garcia, Alberto ;
Orts-Escolano, Sergio ;
Oprea, Sergiu ;
Villena-Martinez, Victor ;
Martinez-Gonzalez, Pablo ;
Garcia-Rodriguez, Jose .
APPLIED SOFT COMPUTING, 2018, 70 :41-65
[10]  
Geng LP, 2017, PROCEEDINGS OF 2017 7TH IEEE INTERNATIONAL SYMPOSIUM ON MICROWAVE, ANTENNA, PROPAGATION, AND EMC TECHNOLOGIES (MAPE), P168, DOI 10.1109/MAPE.2017.8250824