Infant cry classification by MFCC feature extraction with MLP and CNN structures

Cited by: 15
Authors
Abbaskhah, Ahmad [1, 4]
Sedighi, Hamed [2, 3, 5]
Marvi, Hossein [4]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Sharif, Iran
[2] Beijing Inst Technol, Sch Aerosp & Engn, Beijing, Peoples R China
[3] Shahrood Univ Technol, Fac Mech Engn, Shahrood, Iran
[4] Shahrood Univ Technol, Fac Elect Engn, Shahrood, Iran
[5] Shahrood Univ Technol, Fac Mech Engn, Shahrood 3619995161, Iran
Keywords
Infant cry; Mel-frequency Cepstral Coefficient; Multilayer perceptron; Support vector machine; Convolutional neural network; SMOTE; Classification; Identification
DOI
10.1016/j.bspc.2023.105261
Chinese Library Classification number
R318 [Biomedical engineering]
Discipline classification code
0831
Abstract
In this study, Dunstan's infant cry data set is pre-processed with a feature vector approach comprising MFCC (19 features) and energy (one feature). Using the extracted features with Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN) classifiers, five classes of infant cry are distinguished ("Neh" = hungry; "Eh" = need to burp; "Owh" = tired; "Eairh" = stomach cramp; "Heh" = physical discomfort). The proposed MLP and CNN structures are analyzed in terms of loss and accuracy per epoch; in addition, classifier performance is evaluated with AUC-ROC, the confusion matrix, accuracy, F1 score, recall, and precision. Of the three classifiers, the designed CNN model performs best, and the results indicate that performance improves as model complexity increases. Each classifier is run 10 times. The average accuracy for SVM is 0.823 ± 0.02 on SMOTE data and 0.861 ± 0.02 on non-SMOTE data; for MLP, 0.876 ± 0.01 and 0.892 ± 0.01; and for CNN, 0.921 ± 0.005 and 0.911 ± 0.005, respectively. In the best case, the proposed CNN structure reaches an accuracy of 92.1% on the five classes of infant cries.
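The abstract compares results on SMOTE and non-SMOTE data without describing the oversampling step. The sketch below illustrates the general SMOTE idea only (interpolating between a minority-class sample and one of its nearest minority-class neighbours); it is not the authors' implementation. The function name `smote_oversample`, the choice `k=5`, the random stand-in data, and the treatment of each sample as a single 20-value vector (19 MFCC + 1 energy) are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class neighbours.

    Hypothetical helper for illustration; a generic SMOTE-style sketch.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        nearest = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(nearest)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Stand-in data: 12 random 20-value vectors for one under-represented class.
# Real inputs would be the extracted MFCC and energy features.
data_rng = random.Random(42)
minority = [[data_rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(12)]

synthetic = smote_oversample(minority, n_new=30)
print(len(minority) + len(synthetic), len(synthetic[0]))  # 42 20
```

Because every synthetic row lies on a line segment between two real minority samples, the new points stay inside the region the class already occupies. In practice a maintained library implementation would normally be preferred over a hand-written one.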
Pages: 11
Related papers
32 in total
  • [1] Cry-based infant pathology classification using GMMs
    Alaie, Hesam Farsaie
    Abou-Abbas, Lina
    Tadj, Chakib
    [J]. SPEECH COMMUNICATION, 2016, 77 : 28 - 52
  • [2] Automatic classification of infant vocalization sequences with convolutional neural networks
    Anders, Franz
    Hlawitschka, Mario
    Fuchs, Mirco
    [J]. SPEECH COMMUNICATION, 2020, 119 : 36 - 45
  • [3] Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models
    Ashwini, K.
    Vincent, P. M. Durai Raj
    Srinivasan, Kathiravan
    Chang, Chuan-Yu
    [J]. FRONTIERS IN PUBLIC HEALTH, 2021, 9
  • [4] Bano S, 2015, PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON SOFT-COMPUTING AND NETWORKS SECURITY (ICSNS 2015)
  • [5] Bhagatpatil M. V., 2014, Int. J. Sci. Eng. Res., V5, P1379
  • [6] Cano-Ortiz D.E.-B. SD, 1999, Clasificacion de Unidades de Llanto Infantil Mediante el Mapa Auto-Organizado de Kohonen, I Taller AIRENE Sobre Reconoc. Patrones Con Redes Neuronales, P24
  • [7] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 : 321 - 357
  • [8] Spectral analysis of infant cries and adult speech
    Chittora, Anshu
    Patil, Hemant A.
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 841 - 856
  • [9] Dunstan P., 2012, Calm the Crying: The Secret Baby Language That Reveals the Hidden Meaning Behind an Infants Cry
  • [10] Franti E, 2018, 2018 41ST INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), P424, DOI 10.1109/TSP.2018.8441412