A novel pre-processing technique of amplitude interpolation for enhancing the classification accuracy of Bengali phonemes

Cited by: 1
Authors
Paul, Bachchu [1]
Phadikar, Santanu [2]
Affiliations
[1] Vidyasagar Univ, Dept Comp Sci, Midnapore 721102, W Bengal, India
[2] Maulana Abul Kalam Azad Univ Technol, Dept Comp Sci & Engn, BF-142,Sect 1, Kolkata 700064, W Bengal, India
Keywords
Phoneme; Diphthong; Lagrange interpolation; Mel frequency cepstral coefficient; Support vector machine; Deep neural network; SPEECH; RECOGNITION; CORPUS;
DOI
10.1007/s11042-022-13594-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In linguistics, phonemes are the atomic units of sound; they act as word segmenters and play an important role in recognizing words correctly. This paper proposes a novel approach to recognizing seven Bengali vowels and ten diphthongs (a diphthong is a syllable formed by pronouncing two consecutive vowels). Before feature extraction, a novel pre-processing technique based on amplitude interpolation aligns the starting point of all phonemes of the same class, which in turn boosts the recognition rate. Audio clips of the seven Bengali vowels and ten diphthongs, uttered ten times each by twenty speakers of different age groups and sexes, were recorded to create a data set of 3400 audio samples. For each vowel and diphthong class, one sample selected by a linguist was taken as the benchmark. Each recorded clip was then interpolated to match the benchmark clip of the corresponding phoneme by finding the valleys in the amplitude using the Lagrange interpolation technique. Next, 19 MFCC (Mel Frequency Cepstral Coefficient) speech features were extracted from each interpolated clip and fed to Support Vector Machine (SVM), k-Nearest Neighbour (KNN) and Deep Neural Network (DNN) classifiers; the average classification accuracies obtained for vowels and diphthongs were 94.93% and 94.56%, respectively. To check the effectiveness of the pre-processing technique, the same MFCC features were extracted from the raw recorded phonemes and fed to the same classifiers; the average accuracies for vowels and diphthongs were 89.21% and 88.56%, respectively, so the pre-processing improved average accuracy by 5.72 and 6.00 percentage points. The best accuracy was obtained with the DNN classifier: 98.16% for vowels and 97% for diphthongs.
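The abstract names two ingredients of the pre-processing step, valley detection in the amplitude and Lagrange interpolation, without giving the full alignment procedure. The following is a minimal sketch of those two ingredients only, not the authors' implementation. The function names `lagrange_interpolate` and `find_valleys` and the sample amplitude values are hypothetical and chosen purely for illustration.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x): equals 1 at xi and 0 at every other node.
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def find_valleys(samples):
    """Return the indices of strict local minima in a list of amplitude values."""
    return [
        i for i in range(1, len(samples) - 1)
        if samples[i] < samples[i - 1] and samples[i] < samples[i + 1]
    ]


# Hypothetical amplitude envelope (illustrative numbers, not data from the paper).
envelope = [0.9, 0.4, 0.7, 1.0, 0.3, 0.8, 0.6]
valleys = find_valleys(envelope)  # local minima at indices 1 and 4

# Estimate the amplitude halfway between samples 2 and 3 from four neighbours.
xs = [1, 2, 3, 4]
ys = [envelope[k] for k in xs]
midpoint = lagrange_interpolate(xs, ys, 2.5)
```

In a real pipeline the detected valleys would presumably serve as anchor points for stretching or shifting a clip toward the benchmark clip of its class; how the paper does this is not stated in the abstract, so that step is left out here.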
Pages: 7735-7755
Page count: 21
Published in: Multimedia Tools and Applications, 2023, 82: 7735-7755