Listen to the Speaker in Your Gaze

Times cited: 0
Authors
Yang, Hongli [1, 2]
Chen, Xinyi [1]
Li, Junjie [1]
Huang, Hao [2]
Cai, Siqi [3]
Li, Haizhou [4]
Affiliations
[1] Shenzhen Res Inst Big Data, Shenzhen, Peoples R China
[2] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi, Peoples R China
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen Res Inst Big Data, Shenzhen, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS, CIS AND IEEE INTERNATIONAL CONFERENCE ON ROBOTICS, AUTOMATION AND MECHATRONICS, RAM, CIS-RAM 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Cocktail Party; Eye-Tracker; Target Speaker Extraction; Multi-modal; HEARING-LOSS; SPEECH;
DOI
10.1109/CIS-RAM61939.2024.10672879
CLC classification number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Attending to a single speaker's voice at a cocktail party is notably challenging, particularly for individuals with hearing impairments. This paper proposes a novel eye-controlled target speaker extraction system, which consists of an eye-tracker, a face detection model, an Active Speaker Detection (ASD) model, and a Target Speaker Extraction (TSE) model. The system uses the eye-tracker to capture real-time video together with the listener's gaze. The gaze data then allows the face detection model to locate and isolate the target speaker's face in the video frame by frame. Using that face as the reference cue, the system identifies and separates the target speaker's speech from a multi-talker mixture. Experiments show that the system effectively extracts the target speaker's speech in complex auditory environments while combining real-time performance with accuracy. A demonstration of the system is available on our website(1).
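The abstract states only that the gaze data lets the face detection model isolate the target speaker's face frame by frame; it does not give the selection rule. Below is a minimal Python sketch of one plausible rule, assuming the face detector returns boxes with persistent face IDs in the same pixel coordinates as the eye-tracker's gaze point. `FaceBox`, `select_face`, `track_target`, the `max_dist` fallback radius, and the majority-vote `window` are all hypothetical illustrations, not the authors' implementation. The selected face track would then serve as the visual reference cue for the ASD and TSE models, which are not shown.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # gaze point in video-frame pixel coordinates (assumed)


@dataclass(frozen=True)
class FaceBox:
    """Hypothetical face-detector output: a tracked face ID plus its bounding box."""
    face_id: int
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, p: Point) -> bool:
        return self.x1 <= p[0] <= self.x2 and self.y1 <= p[1] <= self.y2

    def center_dist2(self, p: Point) -> float:
        cx, cy = (self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2
        return (cx - p[0]) ** 2 + (cy - p[1]) ** 2


def select_face(gaze: Optional[Point], faces: Sequence[FaceBox],
                max_dist: float = 100.0) -> Optional[FaceBox]:
    """Pick the face the listener is looking at in a single frame."""
    if gaze is None or not faces:
        return None  # no valid gaze sample (e.g. a blink) or no detected face
    hits = [f for f in faces if f.contains(gaze)]
    # Prefer boxes containing the gaze point; otherwise fall back to face
    # centers within max_dist pixels. Among candidates, take the nearest center.
    candidates = hits or [f for f in faces if f.center_dist2(gaze) <= max_dist ** 2]
    if not candidates:
        return None
    return min(candidates, key=lambda f: f.center_dist2(gaze))


def track_target(frames: Sequence[Tuple[Optional[Point], Sequence[FaceBox]]],
                 window: int = 5) -> List[Optional[int]]:
    """Per-frame target face ID, smoothed by a majority vote over recent selections."""
    recent: Deque[int] = deque(maxlen=window)
    targets: List[Optional[int]] = []
    for gaze, faces in frames:
        face = select_face(gaze, faces)
        if face is not None:
            recent.append(face.face_id)
        # The vote over the last `window` valid selections damps brief glances;
        # the result stays None until some face has been selected.
        targets.append(Counter(recent).most_common(1)[0][0] if recent else None)
    return targets
```

Preferring a box that contains the gaze point, with a nearest-center fallback, tolerates small eye-tracker calibration offsets, while the vote over recent frames keeps a one-frame glance or blink from switching the target speaker. Both thresholds are placeholders that would need tuning to the actual eye-tracker and video frame rate.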
Pages: 380-385
Page count: 6