An Intelligent Retrieval Method for Audio and Video Content: Deep Learning Technology Based on Artificial Intelligence

Times cited: 0
Authors
Sun, Maojin [1]
Affiliation
[1] CEICloud Data Storage Technology Beijing Co., Ltd., Beijing 101111, People's Republic of China
Keywords
Feature extraction; Accuracy; 5G mobile communication; Deep learning; Visualization; Audio-visual systems; Information retrieval; Information systems; Audio-video content retrieval; Cross-modal retrieval; Intelligent retrieval
DOI
10.1109/ACCESS.2024.3450920
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To address the challenges of efficient intelligent retrieval and cross-modal analysis posed by the surge in audio-video data, this study proposes a deep-learning-based intelligent retrieval method for audio-video content that aims to improve retrieval efficiency and accuracy. The method extracts audio features with the Visual Geometry Group network (VGG) and extracts video features with an adaptive clustering keyframe extraction algorithm (SKM). It then integrates cross-learning within an embedding network to improve retrieval efficiency and accuracy. Test results on the CMU-MOSEI dataset show that the method outperforms traditional models such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA), as well as state-of-the-art deep learning models such as Deep Canonical Correlation Analysis (DCCA) and the Domain-Adversarial Neural Network (DANN), in multimodal data processing and real-world retrieval tasks. In video processing, the method achieves an average fidelity of 0.693 and an average compression ratio of 0.936, improvements of 30.75% and 7.09%, respectively, over traditional methods. By applying deep learning, the study not only optimizes single-modality processing but also improves the handling of cross-modal data through a cross-learning framework.
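The abstract reports keyframe results as fidelity and compression ratio without defining them. The following is a minimal sketch of one common way these two metrics are computed for clustering-based keyframe extraction; it is not the paper's SKM algorithm. It assumes each frame is summarized by an L1-normalised colour histogram, uses plain k-means with farthest-point seeding as a swapped-in stand-in for SKM, takes the compression ratio as 1 - K/N (K keyframes out of N frames), and takes fidelity as the worst-case (minimum over frames) best histogram-intersection similarity to any keyframe. The "adaptive" cluster count is a hypothetical rule: grow K until a target fidelity is reached. All function names and the toy input are illustrative.

```python
from typing import List, Sequence, Tuple

Hist = List[float]  # one L1-normalised colour histogram per frame (assumption)


def hist_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for two L1-normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, used only for the k-means step."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames: List[Hist], k: int, iters: int = 20) -> Tuple[List[int], List[Hist]]:
    """Plain k-means (a stand-in for SKM); returns frame labels and centroids."""
    # Deterministic farthest-point seeding: start at frame 0, then repeatedly
    # add the frame farthest from all existing seeds.
    centroids = [list(frames[0])]
    while len(centroids) < k:
        far = max(frames, key=lambda f: min(sq_dist(f, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assign every frame to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c])) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pick_keyframes(frames: List[Hist], labels: List[int], centroids: List[Hist]) -> List[int]:
    """For each non-empty cluster, keep the member frame closest to its centroid."""
    keys = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keys.append(min(members, key=lambda i: sq_dist(frames[i], centroid)))
    return sorted(keys)


def compression_ratio(n_frames: int, n_keys: int) -> float:
    """Fraction of frames discarded: 1 - K/N."""
    return 1.0 - n_keys / n_frames


def fidelity(frames: List[Hist], keys: List[int]) -> float:
    """Worst-case coverage: min over frames of the best similarity to any keyframe."""
    return min(max(hist_intersection(f, frames[k]) for k in keys) for f in frames)


def adaptive_keyframes(frames: List[Hist], target_fidelity: float = 0.9) -> List[int]:
    """Hypothetical adaptive rule: smallest K whose keyframes reach the target fidelity."""
    for k in range(1, len(frames) + 1):
        keys = pick_keyframes(frames, *kmeans(frames, k))
        if fidelity(frames, keys) >= target_fidelity:
            return keys
    return list(range(len(frames)))  # fall back to keeping every frame


# Toy "video" of three shots, each frame a 3-bin histogram.
SHOTS = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
frames = [list(SHOTS[s]) for s in (0, 0, 0, 0, 1, 1, 1, 2, 2, 2)]
keys = adaptive_keyframes(frames, target_fidelity=0.9)
print(keys, compression_ratio(len(frames), len(keys)), round(fidelity(frames, keys), 3))
```

On this toy input the search fails at K = 1 and K = 2 (one shot is always left uncovered) and stops at K = 3 with one keyframe per shot, giving a compression ratio of 1 - 3/10 = 0.7. Raising the target fidelity keeps more frames and lowers the compression ratio, which is the trade-off the two reported metrics capture.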
Pages: 123430-123446
Page count: 17