An Intelligent Retrieval Method for Audio and Video Content: Deep Learning Technology Based on Artificial Intelligence

Times cited: 0

Author:

Sun, Maojin [1]

Affiliation:

[1] CEICloud Data Storage Technol Beijing Co Ltd, Beijing 101111, Peoples R China

Source:

IEEE Access | 2024, Vol. 12

Keywords:

Feature extraction; Accuracy; 5G mobile communication; Deep learning; Visualization; Audio-visual systems; Information retrieval; Information systems; Audio-video content retrieval; deep learning; feature extraction; cross-modal retrieval; intelligent retrieval;

DOI:

10.1109/ACCESS.2024.3450920

Chinese Library Classification (CLC) code:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

To address the challenges of efficient intelligent retrieval and cross-modal analysis brought by the surge in audio-video data, this study proposes an intelligent retrieval method for audio-video content based on deep learning techniques, aimed at improving retrieval efficiency and accuracy. This method extracts audio features using the Visual Geometry Group Network (VGG) and employs an adaptive clustering keyframe extraction algorithm (SKM) to extract video features. By integrating cross-learning within an embedding network, it enhances retrieval efficiency and accuracy. The test results on the CMU-MOSEI dataset demonstrate that our method outperforms traditional models such as Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), and state-of-the-art deep learning models like Deep Canonical Correlation Analysis (DCCA) and Domain-Adversarial Neural Network (DANN) in multimodal data processing and real-world retrieval tasks. In video processing, the average fidelity is 0.693, and the average compression ratio is 0.936, representing improvements of 30.75% and 7.09%, respectively, compared to traditional methods. Through the application of deep learning technology, this study not only optimizes the processing of single modalities but also enhances the handling of cross-modal data through a cross-learning framework.
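The keyframe step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the adaptive clustering algorithm (SKM) works, so the code below substitutes plain k-means with a fixed number of clusters. The function names `select_keyframes` and `compression_ratio`, the synthetic frame features, and the definition of compression ratio as 1 - (keyframes / total frames) are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def select_keyframes(features, k, iters=20):
    """Pick one representative frame per cluster using plain k-means.

    Assumption: fixed k; the paper's SKM algorithm is adaptive and is not reproduced here.
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    # Deterministic initialisation: k frames spread evenly over the clip.
    init_idx = np.linspace(0, n - 1, k).round().astype(int)
    centroids = features[init_idx].copy()
    for _ in range(iters):
        # Distance from every frame to every centroid, shape (n, k).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    # The keyframe of each cluster is the frame closest to its final centroid.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return sorted({int(i) for i in dists.argmin(axis=0)})


def compression_ratio(num_keyframes, num_frames):
    """Assumed definition: fraction of frames discarded by the summary."""
    return 1.0 - num_keyframes / num_frames


# Synthetic clip: three "shots" of ten frames each, with 2-D frame features.
rng = np.random.default_rng(0)
shots = np.repeat(np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]), 10, axis=0)
features = shots + rng.normal(scale=0.1, size=shots.shape)

keyframes = select_keyframes(features, k=3)
ratio = compression_ratio(len(keyframes), len(features))
print(keyframes, ratio)
```

On this toy input the sketch selects one frame from each synthetic shot. In a real pipeline the 2-D vectors would be replaced by per-frame descriptors such as colour histograms or CNN embeddings.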


Pages: 123430-123446

Page count: 17

