Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition

被引:0
|
作者
Wu, Gin-Der [1 ]
Tsai, Hao-Shu [1 ]
机构
[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
来源
2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019) | 2019年
关键词
speech recognition; classification; type-2 fuzzy sets; linear-discriminant-analysis; discriminability;
D O I
10.1109/icaiic.2019.8669019
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech recognition is an important classification problem in signal processing. Its performance is easily affected by noisy environment due to movements of desks, door slams, etc. To solve the problem, a fuzzy-neural-network based audio-visual fusion is proposed in this study. Since human speech perception is bimodal, the input features include both audio and image information. In the fuzzy-neural-network, type-2 fuzzy sets are used in the antecedent parts to deal with the noisy data. Furthermore, a linear-discriminant-analysis (LDA) is applied in to the consequent parts to increase the "discriminability". Compared with pure audio-based speech recognition, the fuzzy-neural-network based audio-visual fusion method is more robust in noisy environment.
引用
收藏
页码:210 / 214
页数:5
相关论文
共 50 条
  • [1] Audio-Visual Multilevel Fusion for Speech and Speaker Recognition
    Chetty, Girija
    Wagner, Michael
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 379 - 382
  • [2] Audio-Visual (Multimodal) Speech Recognition System Using Deep Neural Network
    Paulin, Hebsibah
    Milton, R. S.
    JanakiRaman, S.
    Chandraprabha, K.
    JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 3963 - 3974
  • [3] Deep Audio-Visual Speech Recognition
    Afouras, Triantafyllos
    Chung, Joon Son
    Senior, Andrew
    Vinyals, Oriol
    Zisserman, Andrew
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
  • [4] Multimodal Sparse Transformer Network for Audio-Visual Speech Recognition
    Song, Qiya
    Sun, Bin
    Li, Shutao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10028 - 10038
  • [5] Decision Level Fusion for Audio-Visual Speech Recognition in Noisy Conditions
    Sad, Gonzalo D.
    Terissi, Lucas D.
    Gomez, Juan C.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016, 2017, 10125 : 360 - 367
  • [6] Performance Improvement of Audio-Visual Speech Recognition with Optimal Reliability Fusion
    Tariquzzaman, Md
    Gyu, Song Min
    Young, Kim Jin
    You, Na Seung
    Rashid, M. A.
    2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL III, 2010, : 216 - 219
  • [7] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
  • [8] Noisy Speech Recognition Based on Combined Audio-Visual Classifiers
    Terissi, Lucas D.
    Sad, Gonzalo D.
    Gomez, Juan C.
    Parodi, Marianela
    MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, 2015, 8869 : 43 - 53
  • [9] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
    Tamura, Satoshi
    Ishikawa, Masato
    Hashiba, Takashi
    Takeuchi, Shin'ichi
    Hayamizu, Satoru
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
  • [10] An asynchronous DBN for audio-visual speech recognition
    Saenko, Kate
    Livescu, Karen
    2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 154 - +