Steps Towards More Natural Human-Machine Interaction via Audio-Visual Word Prominence Detection

Cited by: 0

Authors:

Heckmann, Martin [1]

Affiliations:

[1] Honda Res Inst Europe GmbH, D-63073 Offenbach, Germany

Source:

MULTIMODAL ANALYSES ENABLING ARTIFICIAL AGENTS IN HUMAN-MACHINE INTERACTION | 2015 / Vol. 8757

Keywords:

Audio-visual; Prominence; Contour; FPCA; Online;

DOI:

10.1007/978-3-319-15557-9_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate how word prominence can be detected from the acoustic signal and from movements of the speaker's head and mouth. Our research is based on a corpus of 12 English speakers that contains, in addition to the speech signal, videos of the talker's head. To extract the word prominence information, we use functionals calculated on the features on the one hand and Functional PCA (FPCA) on the other to extract information from the contours. Combining the functionals and the contour information, we obtain a discrimination accuracy between prominent and non-prominent words of 81%. In particular, we show that the visual channel is very informative for some speakers. Furthermore, we introduce a system that extracts the prominence information online while a user is interacting with it. The online system uses only acoustic information.
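As a rough illustration of the two feature types named in the abstract, the sketch below computes a few functionals on a per-word contour and derives contour scores with a discretized stand-in for FPCA (ordinary PCA on contours resampled to a common length). It is not the paper's implementation: the choice of functionals, the resampling length, the number of components, the synthetic contours, and all function names are assumptions made for this example.

```python
import numpy as np


def resample_contour(contour, n_points=20):
    """Linearly resample a variable-length contour to n_points samples."""
    contour = np.asarray(contour, dtype=float)
    src = np.linspace(0.0, 1.0, len(contour))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.interp(dst, src, contour)


def functionals(contour):
    """Summary statistics of one contour (assumed set: mean, max, min, range)."""
    contour = np.asarray(contour, dtype=float)
    return np.array([contour.mean(), contour.max(), contour.min(),
                     contour.max() - contour.min()])


def fit_fpca(contours, n_components=3, n_points=20):
    """Approximate FPCA: PCA (via SVD) on contours resampled to a common grid."""
    X = np.stack([resample_contour(c, n_points) for c in contours])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def fpca_scores(contour, mean, components):
    """Project one contour onto the principal component curves."""
    x = resample_contour(contour, components.shape[1])
    return components @ (x - mean)


def word_features(contour, mean, components):
    """Concatenate functionals and contour scores into one feature vector."""
    return np.concatenate([functionals(contour),
                           fpca_scores(contour, mean, components)])


# Synthetic contours of varying length stand in for real per-word feature
# tracks (for example pitch or head movement); they are not data from the corpus.
rng = np.random.default_rng(0)
lengths = rng.integers(15, 40, size=30)
scales = rng.uniform(0.5, 2.0, size=30)
contours = [a * np.sin(np.linspace(0.0, np.pi, n)) + rng.normal(0.0, 0.05, n)
            for n, a in zip(lengths, scales)]

mean, components = fit_fpca(contours)
features = word_features(contours[0], mean, components)
print(features.shape)
```

The resulting vector could be passed to any binary classifier to separate prominent from non-prominent words; the abstract does not say which classifier the author used.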

Pages: 15-24

Number of pages: 10

