Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks

Cited by: 0
Authors
Buker, Aykut [1]
Hanilci, Cemal [1]
Affiliation
[1] Bursa Tech Univ, Dept Elect & Elect Engn, TR-16310 Bursa, Turkiye
Keywords
Audio forensics; Wideband AMR codec; Double compressed AMR detection; Deep neural networks
DOI
10.1007/s00034-024-02668-4
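Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract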
Detecting double compressed (DC) speech signals is an important audio forensics task because it bears directly on the integrity and authenticity of a recording. The adaptive multi-rate (AMR) speech codec is a popular audio compression technique optimized specifically for speech, and it is the standard audio recording format on the vast majority of smartphones. All previous studies on detecting DC AMR signals report their findings for speech compressed with the narrowband AMR codec (AMR-NB). The wideband AMR codec (AMR-WB), meanwhile, has been adopted by several mobile phone manufacturers, yet the detection performance for DC AMR-WB speech remains unknown. To the best of our knowledge, this is the first study that focuses on detecting DC signals compressed with the AMR-WB speech codec. To this end, we propose three deep neural network-based DC AMR-WB detection systems that take spectrogram representations of the speech signals as input features. Experiments conducted on the TIMIT database yield several important findings. First, DC AMR-WB detection is a more challenging task than DC AMR-NB detection: a convolutional neural network (CNN)-based system yields detection rates of 74.83% and 99.93% on AMR-WB and AMR-NB coded signals, respectively. Second, capturing temporal information with a long short-term memory (LSTM) network, which reaches a DC AMR-WB detection accuracy of 86.25%, is superior to the CNN system. Third, combining the deep feature representations learned by the CNN and LSTM networks further improves performance. Fourth, detection rates deteriorate when the signals are first encoded with a different audio codec prior to AMR-WB compression. Finally, applying score-level or decision-level fusion to the three proposed systems generally improves the detection rates.
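The two fusion strategies named at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the score values, the equal weights, the 0.5 threshold, and the function names `score_level_fusion` and `decision_level_fusion` are all assumptions made for illustration. Score-level fusion is shown here as a weighted average of per-system scores, and decision-level fusion as a majority vote over per-system hard decisions.

```python
import numpy as np


def score_level_fusion(scores, weights=None):
    """Weighted average of per-system scores.

    scores: array of shape (n_systems, n_utterances).
    weights: optional per-system weights; equal weights are assumed if omitted.
    """
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ scores


def decision_level_fusion(scores, threshold=0.5):
    """Majority vote over each system's thresholded (hard) decision."""
    decisions = np.asarray(scores, dtype=float) >= threshold
    votes = decisions.sum(axis=0)
    return votes * 2 > decisions.shape[0]


# Hypothetical "double compressed" scores for four utterances.
# The numbers are invented for illustration, not taken from the paper.
scores = np.array([
    [0.40, 0.90, 0.20, 0.55],  # CNN system
    [0.70, 0.80, 0.30, 0.45],  # LSTM system
    [0.60, 0.95, 0.10, 0.40],  # combined CNN and LSTM system
])

fused_scores = score_level_fusion(scores)
fused_decisions = fused_scores >= 0.5          # threshold the averaged score
voted_decisions = decision_level_fusion(scores)  # vote on hard decisions
```

In this toy example both strategies agree on every utterance; in general they can differ, since averaging lets one confident system outweigh two marginal ones while voting does not.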
Pages: 4528-4546
Page count: 19