VidQ: Video Query Using Optimized Audio-Visual Processing

Times cited: 0
Authors
Felemban, Noor [1]
Mehmeti, Fidan [2]
La Porta, Thomas F. [3]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Dept Comp Engn, Dammam 34212, Saudi Arabia
[2] Tech Univ Munich, Chair Commun Networks, Munich D-80333, Germany
[3] Penn State Univ, Dept Comp Sci & Engn, State Coll, PA 16801 USA
Keywords
Mobile networks; deep learning; convolutional neural networks; performance optimization; heuristics; SPEECH RECOGNITION
DOI
10.1109/TNET.2022.3215601
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As mobile devices become more prevalent in everyday life and the amount of recorded and stored video grows, efficient techniques for searching video content become more important. When a user issues a query for a specific action in a large amount of data, the goal is to answer it both accurately and quickly. In this paper, we address the problem of answering, in a timely manner, queries that search for specific actions in videos on mobile devices, using both visual and audio processing. We build a system, called VidQ, that consists of several stages and uses various Convolutional Neural Networks (CNNs) and speech APIs to answer such queries. Because state-of-the-art computer vision and speech algorithms are computationally intensive, we use GPU-equipped servers to assist mobile users. After a query is issued, we identify the processing stages it requires and then determine the order in which they run. Finally, by solving an optimization problem that captures the system's behavior, we distribute the processing among the available network resources to minimize the processing time. Results show that VidQ reduces completion time by at least 50% compared to other approaches.
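The final step, distributing work between the phone and GPU servers to minimize completion time, can be illustrated with a minimal Python sketch. This is not VidQ's actual optimization formulation, which this record does not reproduce. It assumes a hypothetical model in which the video is cut into identical segments, each node (the illustrative `Node` type, with assumed per-segment costs `proc_s` and `transfer_s`) processes its segments one after another, and the objective is the smallest makespan. Under those assumptions, giving each segment to the node that would finish it earliest yields a minimum-makespan split.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical processing resource: the phone itself or a GPU server."""
    name: str
    proc_s: float      # assumed seconds to run the CNN/speech stage on one segment
    transfer_s: float  # assumed seconds to ship one segment to this node (0 if local)


def assign_segments(num_segments: int, nodes: list[Node]) -> tuple[dict[str, int], float]:
    """Greedy earliest-finish-time split of identical segments across nodes.

    Illustrative model only: a node's finish time is count * (proc_s + transfer_s),
    i.e. no overlap between transfer and processing is assumed.
    """
    counts = {n.name: 0 for n in nodes}
    # Heap entries: (finish time if this node takes one more segment, node index).
    heap = [(n.proc_s + n.transfer_s, i) for i, n in enumerate(nodes)]
    heapq.heapify(heap)
    makespan = 0.0
    for _ in range(num_segments):
        finish, i = heapq.heappop(heap)  # node that would finish this segment first
        node = nodes[i]
        counts[node.name] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + node.proc_s + node.transfer_s, i))
    return counts, makespan


if __name__ == "__main__":
    phone = Node("phone", proc_s=4.0, transfer_s=0.0)
    server = Node("gpu-server", proc_s=0.5, transfer_s=0.5)
    print(assign_segments(10, [phone, server]))
```

With a phone costing 4 s per segment and a server costing 0.5 s of processing plus 0.5 s of transfer, the ten segments split 8 to the server and 2 to the phone, for a makespan of 8 s; no split can do better, since finishing in under 8 s allows at most 1 + 7 = 8 segments. A real system would also model stage ordering, overlapping transfers, and heterogeneous segments, which this sketch deliberately omits.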
Pages: 1338-1352
Number of pages: 15