Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition

Cited by: 0
Authors
Wu, Gin-Der [1]
Tsai, Hao-Shu [1]
Affiliations
[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
Source
2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019) | 2019
Keywords
speech recognition; classification; type-2 fuzzy sets; linear-discriminant-analysis; discriminability;
DOI
10.1109/icaiic.2019.8669019
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition is an important classification problem in signal processing. Its performance is easily degraded in noisy environments, for example by the movement of desks or door slams. To address this problem, this study proposes a fuzzy-neural-network based audio-visual fusion method. Since human speech perception is bimodal, the input features include both audio and image information. In the fuzzy neural network, type-2 fuzzy sets are used in the antecedent parts to deal with the noisy data. Furthermore, linear discriminant analysis (LDA) is applied to the consequent parts to increase the "discriminability". Compared with pure audio-based speech recognition, the fuzzy-neural-network based audio-visual fusion method is more robust in noisy environments.
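The abstract does not give the form of the type-2 fuzzy sets or the fusion step, so the following is only a rough sketch under stated assumptions: an interval type-2 Gaussian membership function with a fixed mean and an uncertain width, feature-level fusion by simple concatenation, and a product t-norm for the rule firing interval. All function names and parameter values are hypothetical and are not taken from the paper.

```python
import math


def gaussian(x, mean, sigma):
    """Ordinary (type-1) Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def interval_type2_membership(x, mean, sigma_lower, sigma_upper):
    """Return the (lower, upper) membership grades of x.

    Assumption: the set is a Gaussian with a fixed mean and a width
    that is only known to lie in [sigma_lower, sigma_upper]. The
    narrow Gaussian bounds the grade from below, the wide one from
    above.
    """
    if not 0 < sigma_lower <= sigma_upper:
        raise ValueError("need 0 < sigma_lower <= sigma_upper")
    lower = gaussian(x, mean, sigma_lower)
    upper = gaussian(x, mean, sigma_upper)
    return lower, upper


def fuse_features(audio, visual):
    """Assumed feature-level fusion: concatenate the two vectors."""
    return list(audio) + list(visual)


def rule_firing_interval(features, means, sigma_lower, sigma_upper):
    """Firing interval of one rule's antecedent (product t-norm)."""
    lower = 1.0
    upper = 1.0
    for x, mean in zip(features, means):
        lo, hi = interval_type2_membership(x, mean, sigma_lower, sigma_upper)
        lower *= lo
        upper *= hi
    return lower, upper


# Hypothetical audio and visual feature values for one frame.
fused = fuse_features([0.2, 0.1], [0.3])
print(rule_firing_interval(fused, [0.0, 0.0, 0.0], 0.5, 1.0))
```

The gap between the lower and upper firing strengths is what lets an interval type-2 rule represent uncertainty about noisy inputs; the LDA step applied to the consequent parts is not sketched here because the abstract gives no detail on how it is integrated.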
Pages: 210 - 214
Page count: 5
Related papers
50 in total
  • [31] An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition
    Yoshida, Takami
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT I, PROCEEDINGS, 2010, 6096 : 51 - +
  • [32] Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech
    Yu, Jianwei
    Zhang, Shi-Xiong
    Wu, Bo
    Liu, Shansong
    Hu, Shoukang
    Geng, Mengzhe
    Liu, Xunying
    Meng, Helen
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2067 - 2082
  • [33] Robot Command Interface Using an Audio-Visual Speech Recognition System
    Ceballos, Alexander
    Gomez, Juan
    Prieto, Flavio
    Redarce, Tanneguy
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS, 2009, 5856 : 869 - +
  • [34] Multimodal Corpus Design for Audio-Visual Speech Recognition in Vehicle Cabin
    Kashevnik, Alexey
    Lashkov, Igor
    Axyonov, Alexandr
    Ivanko, Denis
    Ryumin, Dmitry
    Kolchin, Artem
    Karpov, Alexey
    IEEE ACCESS, 2021, 9 : 34986 - 35003
  • [35] Multi-Stream Asynchrony Dynamic Bayesian Network model for audio-visual continuous speech recognition
    Lv, Guoyun
    Jiang, Dongmei
    Zhao, Rongchun
    Jiang, Xiaoyue
    Sahli, H.
    2007 14TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNALS, & IMAGE PROCESSING & EURASIP CONFERENCE FOCUSED ON SPEECH & IMAGE PROCESSING, MULTIMEDIA COMMUNICATIONS & SERVICES, 2007 : 170 - +
  • [36] COMPARISON BETWEEN DIFFERENT FEATURE EXTRACTION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
    Chitu, Alin G.
    Rothkrantz, Leon J. M.
    Wiggers, Pascal
    Wojdel, Jacek C.
    JOURNAL ON MULTIMODAL USER INTERFACES, 2007, 1 (01) : 7 - 20
  • [37] Face-to-talk: Audio-visual speech detection for robust speech recognition in noisy environment
    Murai, K
    Nakamura, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (03) : 505 - 513
  • [38] Audio-Visual Automatic Speech Recognition Using PZM, MFCC and Statistical Analysis
    Debnath, Saswati
    Roy, Pinki
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, 7 (02) : 121 - 133
  • [39] Audio-Visual Tensor Fusion Network for Piano Player Posture Classification
    Park, So-Hyun
    Park, Young-Ho
    APPLIED SCIENCES-BASEL, 2020, 10 (19) : 1 - 15
  • [40] THE NEW DELFT UNIVERSITY OF TECHNOLOGY DATA CORPUS FOR AUDIO-VISUAL SPEECH RECOGNITION
    Chitu, Alin G.
    Rothkrantz, Leon J. M.
    EUROMEDIA'2009, 2009 : 63 - 69