Steps Towards More Natural Human-Machine Interaction via Audio-Visual Word Prominence Detection

Author
Heckmann, Martin [1]
Affiliation
[1] Honda Research Institute Europe GmbH, D-63073 Offenbach, Germany
Keywords
Audio-visual; Prominence; Contour; FPCA; Online
DOI
10.1007/978-3-319-15557-9_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate how word prominence can be detected from the acoustic signal and from movements of the speaker's head and mouth. Our research is based on a corpus of 12 English speakers that contains, in addition to the speech signal, videos of the speaker's head. To extract word prominence information, we use, on the one hand, functionals calculated on the features and, on the other hand, Functional PCA (FPCA) to extract information from the feature contours. Combining the functionals with the contour information, we obtain an accuracy of 81% in discriminating prominent from non-prominent words. In particular, we show that the visual channel is very informative for some speakers. Furthermore, we introduce a system that extracts prominence information online while a user is interacting with it. This online system uses only acoustic information.
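A minimal sketch of the two feature types named in the abstract, assuming per-word contours (for example F0, energy, or a mouth-opening trajectory) have already been extracted and segmented at word boundaries. The set of functionals (mean, standard deviation, minimum, maximum, range, linear slope), the 20-point resampling grid, the approximation of FPCA by ordinary PCA on time-normalized contours, and the helper names `functionals`, `resample`, `fit_fpca`, `fpca_scores`, and `word_features` are hypothetical illustrations, not the paper's exact configuration.

```python
import numpy as np

# Hypothetical helpers illustrating functionals + a discretized FPCA on per-word contours.
# Assumption: each contour is a 1-D sequence of feature values (e.g. F0) for one word.


def functionals(contour):
    """Fixed-length summary statistics ("functionals") of one contour."""
    x = np.asarray(contour, dtype=float)
    t = np.arange(len(x))
    # Slope of a least-squares line fit; undefined for a single frame, so use 0.
    slope = np.polyfit(t, x, 1)[0] if len(x) > 1 else 0.0
    return np.array([x.mean(), x.std(), x.min(), x.max(), x.max() - x.min(), slope])


def resample(contour, n_points=20):
    """Linearly interpolate a variable-length contour onto n_points over a [0, 1] time axis."""
    x = np.asarray(contour, dtype=float)
    src = np.linspace(0.0, 1.0, len(x))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.interp(dst, src, x)


def fit_fpca(contours, n_components=3, n_points=20):
    """Approximate FPCA by PCA on time-normalized contours (assumption: no basis smoothing)."""
    X = np.vstack([resample(c, n_points) for c in contours])
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions (eigenfunctions sampled on the grid).
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def fpca_scores(contour, mean, components):
    """Project one contour onto the fitted components to obtain its FPC scores."""
    return components @ (resample(contour, mean.shape[0]) - mean)


def word_features(contour, mean, components):
    """Feature-level combination (assumption): functionals followed by FPC scores."""
    return np.concatenate([functionals(contour), fpca_scores(contour, mean, components)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rise-fall contours of varying length and height (illustrative data only).
    train = [a * np.sin(np.linspace(0, np.pi, n))
             for a, n in zip(rng.uniform(0.5, 2.0, 8), rng.integers(10, 30, 8))]
    mean, comps = fit_fpca(train, n_components=3)
    print(word_features(train[0], mean, comps).shape)  # (9,)
```

Concatenating the two vectors is only one simple way to combine functionals with contour information; the resulting per-word vector would then be passed to a classifier trained on prominent/non-prominent labels, which is not shown here.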
Pages: 15-24
Number of pages: 10