22 references in total
- [1] DEBNATH S, RAMALAKSHMI K, SENBAGAVALLI M., Multimodal authentication system based on audiovisual data: a review [C], Proceedings of 2022 International Conference for Advancement in Technology, pp. 1-5, (2022)
- [2] A multimodal saliency model for videos with high audio-visual correspondence [J], IEEE Transactions on Image Processing, 29, pp. 3805-3819, (2020)
- [3] ZHANG S X, An overview of deep-learning-based audio-visual speech enhancement and separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, pp. 1368-1396, (2021)
- [4] SUGIYAMA M., Minimum dependency key frames selection via quadratic mutual information [C], Proceedings of the 2015 Tenth International Conference on Digital Information Management, pp. 148-153, (2015)
- [5] ZHU Zheng-yu, HE Qian-hua, FENG Xiao-hui, Lip motion and voice consistency algorithm based on fusing spatiotemporal correlation degree [J], Acta Electronica Sinica, 42, 4, pp. 779-785, (2014)
- [6] KUMAR K, NAVRATIL J, Audio-visual speech synchronization detection using a bimodal linear prediction model [C], Proceedings of 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 53-59, (2009)
- [7] HE Qianhua, ZHU Zhengyu, FENG Xiaohui, Lip motion and voice consistency analysis algorithm based on shift-invariant dictionary, Journal of Huazhong University of Science and Technology (Natural Science Edition), 43, 10, pp. 69-74, (2015)
- [8] CHUNG J S, ZISSERMAN A., Lip reading in profile [C], Proceedings of 2017 British Machine Vision Conference, pp. 36-46, (2017)
- [9] KIKUCHI T, OZASA Y., Watch, listen once, and sync: audio-visual synchronization with multi-modal regression CNN [C], Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3036-3040, (2018)
- [10] CHENG S, Towards pose-invariant lip-reading [C], Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4357-4361, (2020)