VidQ: Video Query Using Optimized Audio-Visual Processing

Citations: 0
Authors
Felemban, Noor [1]
Mehmeti, Fidan [2]
Porta, Thomas F. [3]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Dept Comp Engn, Dammam 34212, Saudi Arabia
[2] Tech Univ Munich, Chair Commun Networks, Munich D-80333, Germany
[3] Penn State Univ, Dept Comp Sci & Engn, State Coll, PA 16801 USA
Keywords
Mobile networks; deep learning; convolutional neural networks; performance optimization; heuristics; speech recognition
DOI
10.1109/TNET.2022.3215601
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As mobile devices become more prevalent in everyday life and the amount of recorded and stored video increases, efficient techniques for searching video content become more important. When a user sends a query searching for a specific action in a large amount of data, the goal is to respond to the query accurately and quickly. In this paper, we address the problem of responding in a timely manner to queries that search for specific actions on mobile devices, using both visual and audio processing approaches. We build a system, called VidQ, which consists of several stages and uses various Convolutional Neural Networks (CNNs) and speech APIs to respond to such queries. As state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users in the process. After a query is issued, we identify the different stages of processing that will take place. Then, we identify the order of these stages. Finally, by solving an optimization problem that captures the system behavior, we distribute the processing among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.
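The idea of distributing processing among available resources to minimize completion time can be illustrated with a minimal sketch. This is not the optimization problem formulated in the paper: it assumes identical, independent frames, ignores transfer delays and stage ordering, and uses a simple greedy rule. The function name `split_frames`, the worker names, and the per-frame times are all hypothetical.

```python
import heapq


def split_frames(num_frames, sec_per_frame):
    """Greedily assign identical frames to workers.

    sec_per_frame maps a (hypothetical) worker name to its assumed
    processing time per frame. Returns (frames per worker, completion time).
    """
    counts = {name: 0 for name in sec_per_frame}
    # Each heap entry is (finish time if this worker takes one more frame, name).
    heap = [(cost, name) for name, cost in sec_per_frame.items()]
    heapq.heapify(heap)
    makespan = 0.0
    for _ in range(num_frames):
        # Give the next frame to the worker that would finish it earliest.
        finish, name = heapq.heappop(heap)
        counts[name] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + sec_per_frame[name], name))
    return counts, makespan


# Assumed speeds: the phone needs 1 s per frame, the GPU server 0.25 s.
counts, makespan = split_frames(10, {"phone": 1.0, "gpu_server": 0.25})
print(counts, makespan)
```

Under these assumed speeds, sharing the frames between both workers finishes sooner than sending every frame to the faster server alone, which is the kind of trade-off an allocation step has to weigh.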
Pages: 1338-1352
Page count: 15