Listen to the Speaker in Your Gaze

Cited by: 0

Authors:

Yang, Hongli [1,2]

Chen, Xinyi [1]

Li, Junjie [1]

Huang, Hao [2]

Cai, Siqi [3]

Li, Haizhou [4]

Affiliations:

[1] Shenzhen Res Inst Big Data, Shenzhen, Peoples R China

[2] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi, Peoples R China

[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen Res Inst Big Data, Shenzhen, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS, CIS AND IEEE INTERNATIONAL CONFERENCE ON ROBOTICS, AUTOMATION AND MECHATRONICS, RAM, CIS-RAM 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Cocktail Party; Eye-Tracker; Target Speaker Extraction; Multi-modal; HEARING-LOSS; SPEECH;

DOI:

10.1109/CIS-RAM61939.2024.10672879

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Attending to one speaker's voice at a cocktail party is notably challenging, particularly for individuals with hearing impairments. This paper proposes a novel eye-controlled target speaker extraction system, which consists of an eye-tracker, a face detection model, Active Speaker Detection (ASD), and a Target Speaker Extraction (TSE) model. The system employs the eye-tracker to capture real-time video together with the listener's gaze. The gaze data then allows the face detection model to locate and isolate the target speaker's face within the video on a frame-by-frame basis. Using the speaker's face as the reference cue, the system can discern and separate the target speaker's speech from a multi-talker mixture. The experiments show that the system effectively extracts the target speaker's speech in complex auditory environments, providing both real-time performance and accuracy. A demonstration of our system is available on our website(1).
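The abstract does not say how the gaze is matched to a detected face. As an illustration only, the minimal Python sketch below shows one plausible way to pick the attended face in a single video frame: prefer a face box that contains the gaze point, and otherwise fall back to the nearest box within a distance threshold. The names `FaceBox`, `select_target_face`, and `max_dist`, as well as the fallback rule itself, are assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FaceBox:
    """Axis-aligned face bounding box in pixels (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2.0, (self.y_min + self.y_max) / 2.0)


def select_target_face(
    gaze_xy: Tuple[float, float],
    faces: Sequence[FaceBox],
    max_dist: float = 80.0,  # assumed tolerance in pixels, not from the paper
) -> Optional[int]:
    """Return the index of the face the listener is assumed to look at, or None."""
    gx, gy = gaze_xy

    def dist(face: FaceBox) -> float:
        cx, cy = face.center()
        return hypot(gx - cx, gy - cy)

    # First choice: a box that actually contains the gaze point
    # (if several overlap, take the one whose centre is closest).
    containing = [i for i, f in enumerate(faces) if f.contains(gx, gy)]
    if containing:
        return min(containing, key=lambda i: dist(faces[i]))

    # Fallback: the nearest box, but only if the gaze is reasonably close to it.
    if not faces:
        return None
    nearest = min(range(len(faces)), key=lambda i: dist(faces[i]))
    return nearest if dist(faces[nearest]) <= max_dist else None


if __name__ == "__main__":
    frame_faces = [FaceBox(0, 0, 100, 100), FaceBox(200, 0, 300, 100)]
    print(select_target_face((250, 50), frame_faces))  # gaze inside the second box -> 1
```

In a full system, the selected face crop would then serve as the visual reference cue for the extraction model on each frame. The distance fallback is one way to tolerate small gaze-estimation errors; other choices, such as temporal smoothing of the gaze, would work as well.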

Pages: 380-385

Page count: 6
