EchoWhisper: Exploring an Acoustic-based Silent Speech Interface for Smartphone Users

Cited by: 43
Authors
Gao, Yang [1]
Jin, Yincheng [1]
Li, Jiyang [1]
Choi, Seokmin [1]
Jin, Zhanpeng [1]
Affiliations
[1] Univ Buffalo, State Univ New York, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Source
PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT | 2020 / Vol. 4 / No. 3
Keywords
Acoustic; echo; silent speech; smartphone
DOI
10.1145/3411830
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid growth of artificial intelligence and mobile computing, intelligent speech interfaces have recently become a prevalent trend and have already shown great potential to the public. To address privacy leakage during speech interaction or to accommodate special demands, silent speech interfaces have been proposed to enable people to communicate without vocalizing (e.g., lip reading, tongue tracking). However, most existing silent speech mechanisms require either background illumination or additional wearable devices. In this study, we propose EchoWhisper, a novel, user-friendly, smartphone-based silent speech interface. The proposed technique takes advantage of the micro-Doppler effect of the acoustic wave resulting from mouth and tongue movements and assesses the acoustic features of beamformed reflected echoes captured by the dual microphones in the smartphone. With human subjects performing a daily conversation task with over 45 different words, our system achieves a WER (word error rate) of 8.33%, which shows its effectiveness in inferring silent speech content. Moreover, EchoWhisper has also demonstrated its reliability and robustness to a variety of configuration settings and environmental factors, such as smartphone orientation and distance, ambient noise, and body motion.
Pages: 27