Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition

Cited by: 0
Authors
Wu, Gin-Der [1]
Tsai, Hao-Shu [1]
Affiliation
[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
Source
2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019) | 2019
Keywords
speech recognition; classification; type-2 fuzzy sets; linear-discriminant-analysis; discriminability
DOI
10.1109/icaiic.2019.8669019
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition is an important classification problem in signal processing. Its performance degrades easily in noisy environments, for example when desks are moved or doors slam. To address this problem, this study proposes a fuzzy-neural-network based audio-visual fusion method. Because human speech perception is bimodal, the input features include both audio and image information. In the fuzzy neural network, type-2 fuzzy sets are used in the antecedent parts to handle noisy data. Furthermore, linear discriminant analysis (LDA) is applied to the consequent parts to increase discriminability. Compared with purely audio-based speech recognition, the proposed audio-visual fusion method is more robust in noisy environments.
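
As an illustration of why type-2 fuzzy sets suit noisy inputs, the sketch below computes an interval type-2 Gaussian membership grade. The abstract does not specify the membership function used in the paper, so the Gaussian form with a fixed mean and an uncertain width, as well as the names `gaussian` and `interval_type2_membership`, are assumptions made for this example only.

```python
import math


def gaussian(x, mean, sigma):
    """Ordinary (type-1) Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def interval_type2_membership(x, mean, sigma_lower, sigma_upper):
    """Return the (lower, upper) membership grades of x.

    Assumed form: a Gaussian with a fixed mean whose width is only known
    to lie in [sigma_lower, sigma_upper]. The narrower Gaussian bounds the
    grade from below and the wider one bounds it from above.
    """
    if not 0 < sigma_lower <= sigma_upper:
        raise ValueError("require 0 < sigma_lower <= sigma_upper")
    lower = gaussian(x, mean, sigma_lower)
    upper = gaussian(x, mean, sigma_upper)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical feature value and parameters, chosen only for illustration.
    low, high = interval_type2_membership(1.0, mean=0.0, sigma_lower=0.5, sigma_upper=1.0)
    print(f"membership interval: [{low:.4f}, {high:.4f}]")
```

Instead of a single grade, each input receives an interval of grades. The gap between the two bounds is what lets a rule antecedent tolerate a feature value that noise has shifted.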
Pages: 210-214
Page count: 5