50 items in total
- [23] A robust visual feature extraction based BTSM-LDA for audio-visual speech recognition. 2007 SECOND INTERNATIONAL CONFERENCE IN COMMUNICATIONS AND NETWORKING IN CHINA, VOLS 1 AND 2, 2007: 1044-+
- [24] Audio-Visual Keyword Spotting Based on Multidimensional Convolutional Neural Network. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018: 4138-4142
- [25] Speech enhancement and recognition in meetings with an audio-visual sensor array. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08): 2257-2269
- [26] Transfer Learning from Audio-Visual Grounding to Speech Recognition. INTERSPEECH 2019, 2019: 3242-3246
- [27] A Robust Feature Extraction with Dual Fusion aided Extreme Learning for Audio-Visual Hindi Speech Recognition. JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2020, 79 (05): 383-386
- [28] Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 2241-2244
- [29] AFT-SAM: Adaptive Fusion Transformer with a Sparse Attention Mechanism for Audio-Visual Speech Recognition. APPLIED SCIENCES-BASEL, 2025, 15 (01)