Low Complexity Multi-directional In-Air Ultrasonic Gesture Recognition Using a TCN

Cited by: 0
Authors
Ibrahim, Emad A. [1]
Geilen, Marc [1]
Huisken, Jos [1]
Li, Min [2]
de Gyvez, Jose Pineda [1]
Affiliations
[1] Eindhoven Univ Technol, Dept Elect Engn, Eindhoven, Netherlands
[2] Eindhoven Univ Technol, Dept Ind Design, Eindhoven, Netherlands
Source
PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020) | 2020
Funding
EU Horizon 2020
Keywords
Gesture Recognition; Temporal Convolutional Networks (TCN); Human System Interaction (HSI); Edge Devices; Doppler shift;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Following the trend toward ultrasound-based gesture recognition, this study introduces time-sequence classification of the ultrasonic patterns that hand movements induce on a microphone array. By time-sequence ultrasound echoes we mean continuous frequency patterns received in real time at different steering angles. The ultrasound source is a single tone emitted continuously from the center of the microphone array. Meanwhile, the array beamforms and locates ultrasonic activity (induced echoes), after which a processing pipeline is initiated to extract band-limited frequency features. These beamformed features are organized in a 2D matrix of size 11 x 30, updated every 10 ms, on which a Temporal Convolutional Network (TCN) outputs a continuous classification. Prior to that, the same TCN is trained to classify the Doppler shift variability rate. Using this approach, we show that a user can easily achieve 49 gestures at different steering angles by means of sequence detection. To keep things simple for users, we define two Doppler shift variability rates, very slow and very fast, which the TCN detects 95-99% of the time. Not only can a gesture be performed in different directions, but the length of each performed gesture can also be measured. This broadens the diversity of in-air ultrasonic gestures, allowing more control capabilities. The process is designed for low-resource settings: because this real-time process is always on, power and memory use should be optimized. The proposed solution needs 6.2-10.2 MMACs and a memory footprint of 6 KB, allowing such a gesture recognition system to be hosted by energy-constrained edge devices such as smart speakers.
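The two quantities the abstract relies on, the Doppler shift of the echo and the 11 x 30 feature matrix refreshed every 10 ms, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the tone frequency or which matrix axis is time, so the 40 kHz tone, the reading of 11 as feature bins and 30 as time frames, and the names `doppler_shift_hz` and `FeatureWindow` are all assumptions made for illustration.

```python
from collections import deque

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def doppler_shift_hz(tone_hz, hand_speed_mps):
    """Approximate Doppler shift of an echo reflected by a moving hand.

    Assumes the emitter and microphones are co-located and the hand speed
    is far below the speed of sound, giving the two-way approximation
    shift = 2 * v * f0 / c. Positive speed means motion toward the array.
    """
    return 2.0 * hand_speed_mps * tone_hz / SPEED_OF_SOUND


class FeatureWindow:
    """Rolling window of the most recent feature frames.

    Assumption: each frame holds n_bins band-limited frequency features
    and the window keeps n_frames frames (30 frames at one frame per
    10 ms would span 300 ms).
    """

    def __init__(self, n_bins=11, n_frames=30):
        self.n_bins = n_bins
        # Start with all-zero frames so the matrix always has full size.
        self.frames = deque(
            ([0.0] * n_bins for _ in range(n_frames)), maxlen=n_frames
        )

    def push(self, frame):
        """Append the newest frame; the oldest one is dropped automatically."""
        if len(frame) != self.n_bins:
            raise ValueError("frame must contain exactly n_bins values")
        self.frames.append(list(frame))

    def as_matrix(self):
        """Return an n_bins x n_frames matrix, oldest frame in column 0."""
        return [[f[b] for f in self.frames] for b in range(self.n_bins)]
```

Under these assumptions, a hand moving at 0.5 m/s toward the array shifts a 40 kHz tone by about 117 Hz. A classifier such as a TCN would be evaluated on `as_matrix()` after each `push`, which is one way to obtain a continuous output at the frame rate.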
Pages: 1259-1264
Page count: 6