The early maximum likelihood estimation model of audiovisual integration in speech perception

Cited by: 9
Author
Andersen, Tobias S. [1]
Affiliation
[1] Tech Univ Denmark, Sect Cognit Syst, Dept Appl Math & Comp Sci, DK-2800 Lyngby, Denmark
Keywords
FUZZY-LOGICAL MODEL; CROSSMODAL INTEGRATION; GOOD FIT; INFORMATION; FUSION;
DOI
10.1121/1.4916691
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk-MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely focused on the fuzzy logical model of perception (FLMP), which provides excellent fits to experimental observations but has also been criticized for being too flexible, post hoc, and difficult to interpret. The current study introduces the early maximum likelihood estimation (MLE) model of audiovisual integration in speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation prior to categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross-validation can evaluate models of audiovisual integration on typical data sets while taking both goodness of fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE model, whereas more conventional error measures favored more complex models. This discrepancy between conventional error measures and cross-validation was found to be indicative of over-fitting in more complex models such as the FLMP. (C) 2015 Acoustical Society of America.
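The MLE integration rule the abstract refers to is the standard inverse-variance-weighted cue-combination scheme: each unimodal estimate is weighted by its reliability, yielding a fused estimate whose variance is lower than either cue alone. A minimal illustrative sketch in Python (not the paper's implementation; the function and variable names here are hypothetical):

```python
def mle_integrate(est_a, var_a, est_v, var_v):
    """Maximum-likelihood fusion of an auditory and a visual estimate.

    Each cue is weighted inversely to its variance, so the more
    reliable cue dominates; the fused variance is smaller than
    either unimodal variance.
    """
    w_a = var_v / (var_a + var_v)   # weight on the auditory cue
    w_v = var_a / (var_a + var_v)   # weight on the visual cue
    fused = w_a * est_a + w_v * est_v
    fused_var = (var_a * var_v) / (var_a + var_v)
    return fused, fused_var


# Example: auditory estimate is four times noisier than the visual one,
# so the fused estimate sits much closer to the visual estimate.
fused, fused_var = mle_integrate(est_a=2.0, var_a=4.0, est_v=1.0, var_v=1.0)
# fused = 0.2*2.0 + 0.8*1.0 = 1.2, fused_var = 4/5 = 0.8
```

In the early MLE model this fusion happens on the continuous internal representation, before categorization; the FLMP, by contrast, combines category likelihoods after unimodal evaluation.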
Pages: 2884-2891
Page count: 8