Audio-Visual Segmentation based on robust principal component analysis

Times cited: 1
Authors
Fang, Shun [1,2]
Zhu, Qile [1,2]
Wu, Qi [3]
Wu, Shiqian [1,2,4]
Xie, Shoulie [5]
Affiliations
[1] Wuhan Univ Sci & Technol, Sch Informat Sci & Engn, Wuhan, Peoples R China
[2] Wuhan Univ Sci & Technol, Inst Robot & Intelligent Syst, Wuhan, Peoples R China
[3] Jiangxi Univ Finance & Econ, Sch Software & Internet things Engn, Nanchang, Peoples R China
[4] Henan Acad Sci, Inst Adv Displays & Imaging, Zhengzhou, Peoples R China
[5] RF & Opt Dept Inst Infocomm Res A STAR, Signal Proc, Singapore, Singapore
Funding
National Natural Science Foundation of China
Keywords
Audio-Visual Segmentation; Robust principal component analysis; Unsupervised learning
DOI
10.1016/j.eswa.2024.124885
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Audio-Visual Segmentation (AVS) aims to extract sounding objects from a video. Current learning-based AVS methods are typically supervised, relying on task-specific data annotations and expensive model training. Recognizing that the background of a video captured by a static camera can be represented as a low-rank matrix, this paper introduces non-convex robust principal component analysis (RPCA) into the AVS task. The approach is unsupervised and relies only on patterns in the input data. Specifically, the proposed method decomposes each modality into the sum of two parts: a low-rank part representing the background audio and visual information, and a sparse part representing the foreground information. Furthermore, CUR decomposition is employed at each iteration to reduce the computational complexity of the optimization. Experimental results show that the proposed method outperforms the supervised methods on the AVS-Bench Single-Source dataset.
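The low-rank plus sparse decomposition described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single modality arranged as a matrix D whose columns are vectorized frames, so a static background yields identical (rank-1) columns. The loop alternates between a low-rank estimate L, taken from a skeleton CUR approximation (C = selected columns, R = selected rows, U = inverse of their intersection), and a sparse foreground S, taken by hard thresholding with a geometrically decaying threshold, in the spirit of CUR-accelerated non-convex RPCA. The names `rpca_cur` and `cur`, the hand-picked row/column indices, the decay factor 0.7 and the toy data are hypothetical choices. Practical methods sample rows and columns at random, use a truncated pseudo-inverse, and avoid forming full matrices, which is where CUR actually saves computation; this sketch forms them for readability.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(W: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; W must be square and nonsingular."""
    k = len(W)
    # Augment W with the k x k identity matrix.
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(W)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("intersection W is singular; choose other rows/columns")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[k:] for row in M]


def cur(A: Matrix, rows: List[int], cols: List[int]) -> Matrix:
    """Skeleton CUR approximation A ~ C @ inv(W) @ R, with W = A[rows, cols].

    Hypothetical simplification: exactly `rank` rows and columns and a plain inverse.
    """
    C = [[A[i][j] for j in cols] for i in range(len(A))]  # selected columns
    R = [list(A[i]) for i in rows]                        # selected rows
    W = [[A[i][j] for j in cols] for i in rows]           # their intersection
    return matmul(matmul(C, inverse(W)), R)


def rpca_cur(D: Matrix, rows: List[int], cols: List[int],
             iters: int = 10, decay: float = 0.7) -> Tuple[Matrix, Matrix]:
    """Alternate a CUR low-rank projection with hard thresholding so that D ~ L + S."""
    m, n = len(D), len(D[0])
    zeta = max(abs(x) for row in D for x in row)  # initial threshold
    S = [[0.0] * n for _ in range(m)]
    L = [[0.0] * n for _ in range(m)]
    for _ in range(iters):
        # Low-rank step: CUR approximation of the data with the sparse part removed.
        L = cur([[D[i][j] - S[i][j] for j in range(n)] for i in range(m)], rows, cols)
        # Sparse step: keep only residual entries above the decaying threshold.
        zeta *= decay
        S = [[d - l if abs(d - l) > zeta else 0.0 for d, l in zip(drow, lrow)]
             for drow, lrow in zip(D, L)]
    return L, S


# Toy "video": 4 pixels x 5 frames, the same static background in every column.
background = [1.0, 2.0, 3.0, 4.0]
D = [[background[i]] * 5 for i in range(4)]
D[2][3] += 8.0   # foreground at pixel 2, frame 3
D[3][4] += 10.0  # foreground at pixel 3, frame 4

# Rank 1 -> one row and one column; indices picked by hand for this sketch.
L, S = rpca_cur(D, rows=[0], cols=[0])
for row in S:
    print(row)
```

In this toy run, L recovers the background exactly and S holds only the two foreground entries. The smaller foreground value (8.0) is captured one iteration after the larger one (10.0), once the decaying threshold drops below it.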
Pages: 10