Voice Command Recognition for Drone Control by Deep Neural Networks on Embedded System

Cited by: 1
Authors
Yapicioglu, Cengizhan [1]
Dokur, Zumray [1]
Olmez, Tamer [1]
Affiliations
[1] Istanbul Tech Univ, Dept Elect & Commun Engn, Istanbul, Turkey
Source
2021 8TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2021) | 2021
Keywords
speech recognition; spectrogram; convolutional neural networks; deep learning; speech processing; image processing; embedded systems;
DOI
10.1109/ICEEE52452.2021.9415964
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Speech recognition and its use in controlling systems have been an important and attractive research topic over the last few decades. Controlling electronic devices by speech commands lets users manage systems quickly and easily, since they need no additional information or remote controller. Communicating with a system through speech commands also brings requirements for fast and accurate response, so speech recognition algorithms are at present run mostly on high-performance computers. However, improvements in system-on-a-chip (SoC) boards and in deep neural network based algorithms make it possible to execute such programs on embedded boards. This study proposes a model, based on deep learning with spectrogram images, for controlling a drone in real time using Turkish directional speech commands. First, speech commands are detected in real time using signal energy and zero-crossing rate, and are converted to log-spectrogram images. A CNN with three convolutional layers and one fully connected layer is built and trained on those images. The trained model is then moved to an embedded board to achieve real-time, low-cost operation. Speech commands provided by the user are passed to the model as input; the algorithm decides which directional command was given, and the corresponding operation is performed on the drone. The proposed model achieves accuracies of 95.72% on the offline dataset and 92.88% in real-time classification.
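The front end described in the abstract (detection by signal energy and zero-crossing rate, then conversion to a log spectrogram) might look roughly like the sketch below. It is not the authors' implementation: the frame length, hop size, both thresholds, the rule that a frame counts as speech when its energy is high and its zero-crossing rate is low, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_speech(x, frame_len=400, hop=160,
                  energy_thresh=1e-3, zcr_thresh=0.25):
    """Boolean mask of frames judged to contain speech.

    Assumed rule (not taken from the paper): energy above one threshold
    and zero-crossing rate below another.
    """
    frames = frame_signal(x, frame_len, hop)
    return ((short_time_energy(frames) > energy_thresh)
            & (zero_crossing_rate(frames) < zcr_thresh))

def log_spectrogram(x, frame_len=400, hop=160, eps=1e-10):
    """Log power spectrogram with shape (frequency bins, frames)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps).T  # eps avoids log(0) on silent frames
```

With a 16 kHz sampling rate, the default values correspond to 25 ms frames with a 10 ms hop. The resulting two-dimensional array is the kind of image-like input a CNN such as the one in the abstract could be trained on.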
Pages: 65-72
Number of pages: 8