Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks

Authors
Buker, Aykut [1]
Hanilci, Cemal [1]
Affiliation
[1] Bursa Tech Univ, Dept Elect & Elect Engn, TR-16310 Bursa, Turkiye
Keywords
Audio forensics; Wideband AMR codec; Double compressed AMR detection; Deep neural networks
DOI
10.1007/s00034-024-02668-4
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Detecting double compressed (DC) speech signals is an important audio forensics task because it bears directly on the integrity and authenticity of a recording. The adaptive multi-rate (AMR) speech codec is a popular audio compression technique specifically optimized for speech signals, and it is a standard audio recording format on the vast majority of smartphones. All previous studies addressing the detection of DC AMR signals report their findings for speech signals compressed using the narrowband AMR codec (AMR-NB). Meanwhile, the wideband AMR codec (AMR-WB) has been used by several mobile phone manufacturers, but DC AMR-WB speech signal detection performance remains unknown. To the best of our knowledge, this is the first study focusing on detecting DC signals compressed using the AMR-WB speech codec. To this end, we propose three different deep neural network-based DC AMR-WB signal detection systems in which spectrogram representations of the speech signals are used as the input features. Experiments conducted on the TIMIT database provide several important findings regarding DC AMR-WB speech detection. First, DC AMR-WB detection is found to be a more challenging task than detecting AMR-NB signals. For example, a convolutional neural network (CNN)-based system yields 74.83% and 99.93% detection rates on AMR-WB and AMR-NB coded signals, respectively. Second, capturing temporal information with a long short-term memory (LSTM) network, which reaches a DC AMR-WB signal detection accuracy of 86.25%, is found to be superior to the CNN system. Third, combining the deep feature representations learned by the CNN and LSTM networks further improves the performance. Fourth, the detection rates are found to deteriorate when the signals are first encoded using different audio codecs prior to AMR-WB compression. Finally, applying score-level or decision-level fusion to the three proposed systems generally improves the detection rates.
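The abstract mentions score-level and decision-level fusion of the three systems but does not state which fusion rules were used. As a rough illustration only, the sketch below assumes a weighted average of scores for score-level fusion and a majority vote for decision-level fusion. The function names, the 0.5 threshold, and the three score values are hypothetical and are not taken from the paper.

```python
# Minimal sketch of two common fusion rules (assumed, not the paper's method).

def score_level_fusion(scores, weights=None):
    """Combine per-system scores into one score by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(scores)  # equal weights by default
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


def decision_level_fusion(scores, threshold=0.5):
    """Threshold each system separately, then take a majority vote."""
    votes = [s >= threshold for s in scores]
    return sum(votes) > len(votes) / 2


# Made-up "double compressed" probabilities from three hypothetical systems
# (for example CNN, LSTM, and a combined CNN+LSTM model) for one utterance.
scores = [0.40, 0.90, 0.45]

fused_score = score_level_fusion(scores)
score_decision = fused_score >= 0.5           # decide after averaging
vote_decision = decision_level_fusion(scores)  # decide before combining

print(f"fused score: {fused_score:.4f}")
print(f"score-level decision: {score_decision}")
print(f"decision-level decision: {vote_decision}")
```

With these made-up values the two rules disagree: one confident system pulls the averaged score above the threshold, while the majority vote follows the two systems that fall below it. This is one reason the two fusion strategies can give different detection rates.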
Pages: 4528-4546
Page count: 19