Voice Orientation Recognition: New Paradigm of Speech-Based Human-Computer Interaction

Cited by: 2
Authors
Bu, Yiyu [1]
Guo, Peng [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Peoples R China
Keywords
Voice orientation recognition; human-computer interaction; speech interaction; mouth radiation pattern; attention mechanism
DOI
10.1080/10447318.2023.2233128
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As one of the most preferred forms of Human-Computer Interaction (HCI) today, speech-based HCI enables people to communicate verbally with machines, leveraging technologies such as speech recognition and speech synthesis. The current paradigm of speech-based HCI focuses only on the content of speech and fails to capture the deeper pointing information in voice interaction. In particular, in scenarios with multiple smart voice devices nearby, when a person intends to interact with one specific device, the lack of extra pointing information (such as the role played by the direction of eye gaze) causes unintended responses from the other devices, resulting in a poor interaction experience. Hence, an interesting problem arises: can devices be made aware of the orientation of a human voice using only the acoustic speech signal? Little research has addressed this topic, apart from a very few preliminary works that leave much room for improvement. The main challenge lies in capturing the concealed orientation information embedded within the speech signal while keeping the scheme practical and highly precise. In this paper, we propose Oriennet for identifying the orientation of the human voice. With a series of features intentionally designed in view of the indoor voice propagation model and the mouth radiation pattern, together with an attention mechanism, Oriennet achieves 95% accuracy in judging whether a person is facing the device or not. Even for the fine-grained task of classifying a person's specific orientation among 8 different directions, it achieves an accuracy of 74%, far outperforming existing work. We have validated the robustness of Oriennet under various conditions (noisy environments; different people, rooms, languages, and locations; fewer microphones), demonstrating its promising applicability in real-life scenarios.
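To make the two tasks named in the abstract concrete, the following is a minimal sketch of how a speaker heading could be mapped to one of 8 orientation classes and to a facing / not-facing label. It is not taken from the paper: the function names, the equal 45-degree sectors, the convention that 0 degrees means the speaker faces the device, and the choice to treat only class 0 as "facing" are all illustrative assumptions.

```python
# Hypothetical labelling scheme; none of these names come from the paper.
NUM_DIRECTIONS = 8                      # the abstract mentions 8 directions
SECTOR_WIDTH = 360.0 / NUM_DIRECTIONS   # assumption: equal 45-degree sectors


def orientation_class(angle_deg: float) -> int:
    """Map a heading in degrees (0 = facing the device, by assumption)
    to a class index in the range 0..7."""
    # Shift by half a sector so that class 0 is centred on 0 degrees,
    # then wrap into [0, 360) before dividing into sectors.
    shifted = (angle_deg + SECTOR_WIDTH / 2.0) % 360.0
    return int(shifted // SECTOR_WIDTH)


def is_facing(angle_deg: float) -> bool:
    """Binary task: assume 'facing' means the heading falls in class 0."""
    return orientation_class(angle_deg) == 0


if __name__ == "__main__":
    for angle in (0, 45, 90, 180, 350):
        print(angle, orientation_class(angle), is_facing(angle))
```

Under these assumptions the binary task is simply a collapse of the 8-way labels, which is one plausible reading of why the coarse task would reach higher accuracy than the fine-grained one.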
Pages: 5259-5278
Page count: 20