Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition

Times cited: 0

Authors:

Wu, Gin-Der [1]

Tsai, Hao-Shu [1]

Affiliation:

[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan

Source:

2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019) | 2019

Keywords:

speech recognition; classification; type-2 fuzzy sets; linear-discriminant-analysis; discriminability

DOI:

10.1109/icaiic.2019.8669019

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech recognition is an important classification problem in signal processing. Its performance degrades easily in noisy environments, for example when desks are moved or doors slam. To address this problem, this study proposes a fuzzy-neural-network-based audio-visual fusion method. Because human speech perception is bimodal, the input features include both audio and image information. In the fuzzy neural network, type-2 fuzzy sets are used in the antecedent parts to handle noisy data, and linear discriminant analysis (LDA) is applied in the consequent parts to increase discriminability. Compared with purely audio-based speech recognition, the proposed audio-visual fusion method is more robust in noisy environments.
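The abstract does not specify the membership functions, the fusion scheme, or the inference steps, so the following is only a minimal sketch of two standard building blocks it names, under stated assumptions. It assumes feature-level (early) fusion by concatenating audio and visual vectors, interval type-2 Gaussian antecedent sets with an uncertain mean in [m1, m2] (one common type-2 construction), and the product t-norm for a rule's firing-strength interval. The helper names `it2_gaussian`, `fuse`, and `firing_interval`, and all numeric inputs, are hypothetical and are not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Ordinary (type-1) Gaussian membership grade."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def it2_gaussian(x: float, m1: float, m2: float, sigma: float) -> Tuple[float, float]:
    """Interval type-2 Gaussian set with uncertain mean in [m1, m2] and fixed sigma.

    Returns (lower, upper) grades, i.e. the bounds of the footprint of uncertainty at x.
    """
    if m1 > m2 or sigma <= 0:
        raise ValueError("need m1 <= m2 and sigma > 0")
    # Upper MF: left Gaussian, a flat top of 1 between the means, right Gaussian.
    if x < m1:
        upper = gaussian(x, m1, sigma)
    elif x > m2:
        upper = gaussian(x, m2, sigma)
    else:
        upper = 1.0
    # Lower MF: the smaller of the two extreme Gaussians at x.
    lower = min(gaussian(x, m1, sigma), gaussian(x, m2, sigma))
    return lower, upper


def fuse(audio: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Assumed feature-level (early) fusion: concatenate audio and visual vectors."""
    return list(audio) + list(visual)


def firing_interval(
    x: Sequence[float], rule: Sequence[Tuple[float, float, float]]
) -> Tuple[float, float]:
    """Firing-strength interval of one rule, using the product t-norm.

    `rule` holds one (m1, m2, sigma) antecedent per input dimension.
    """
    if len(x) != len(rule):
        raise ValueError("one antecedent per input dimension is required")
    lower, upper = 1.0, 1.0
    for xi, (m1, m2, sigma) in zip(x, rule):
        lo_i, up_i = it2_gaussian(xi, m1, m2, sigma)
        lower *= lo_i
        upper *= up_i
    return lower, upper


# Hypothetical inputs: two audio features and one visual (lip) feature.
x = fuse([0.2, -0.1], [0.5])
rule = [(0.0, 0.4, 0.3), (-0.2, 0.0, 0.3), (0.4, 0.6, 0.2)]
lo, up = firing_interval(x, rule)
print(f"firing interval: [{lo:.4f}, {up:.4f}]")
```

The gap between the lower and upper grades is what lets a type-2 set absorb uncertainty in the input, such as noise-corrupted features. Type reduction, defuzzification, and the LDA-based consequent parts are omitted here because the abstract does not describe them.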


Pages: 210-214

Number of pages: 5

50 entries in total; entries 31-40 are shown.

[31] An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition
Yoshida, Takami
Nakadai, Kazuhiro
Okuno, Hiroshi G.
TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT I, PROCEEDINGS, 2010, 6096: 51 ff.
[32] Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech
Yu, Jianwei
Zhang, Shi-Xiong
Wu, Bo
Liu, Shansong
Hu, Shoukang
Geng, Mengzhe
Liu, Xunying
Meng, Helen
Yu, Dong
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 2067-2082
[33] Robot Command Interface Using an Audio-Visual Speech Recognition System
Ceballos, Alexander
Gomez, Juan
Prieto, Flavio
Redarce, Tanneguy
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS, 2009, 5856: 869 ff.
[34] Multimodal Corpus Design for Audio-Visual Speech Recognition in Vehicle Cabin
Kashevnik, Alexey
Lashkov, Igor
Axyonov, Alexandr
Ivanko, Denis
Ryumin, Dmitry
Kolchin, Artem
Karpov, Alexey
IEEE ACCESS, 2021, 9: 34986-35003
[35] Multi-Stream Asynchrony Dynamic Bayesian Network model for audio-visual continuous speech recognition
Lv, Guoyun
Jiang, Dongmei
Zhao, Rongchun
Jiang, Xiaoyue
Sahli, H.
2007 14TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNALS, & IMAGE PROCESSING & EURASIP CONFERENCE FOCUSED ON SPEECH & IMAGE PROCESSING, MULTIMEDIA COMMUNICATIONS & SERVICES, 2007: 170 ff.
[36] Comparison between Different Feature Extraction Techniques for Audio-Visual Speech Recognition
Chitu, Alin G.
Rothkrantz, Leon J. M.
Wiggers, Pascal
Wojdel, Jacek C.
JOURNAL ON MULTIMODAL USER INTERFACES, 2007, 1(01): 7-20
[37] Face-to-talk: Audio-visual speech detection for robust speech recognition in noisy environment
Murai, K
Nakamura, S
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D(03): 505-513
[38] Audio-Visual Automatic Speech Recognition Using PZM, MFCC and Statistical Analysis
Debnath, Saswati
Roy, Pinki
INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, 7(02): 121-133
[39] Audio-Visual Tensor Fusion Network for Piano Player Posture Classification
Park, So-Hyun
Park, Young-Ho
APPLIED SCIENCES-BASEL, 2020, 10(19): 1-15
[40] The New Delft University of Technology Data Corpus for Audio-Visual Speech Recognition
Chitu, Alin G.
Rothkrantz, Leon J. M.
EUROMEDIA'2009, 2009: 63-69
