A lazy learning-based language identification from speech using MFCC-2 features

Authors
Himadri Mukherjee
Sk Md Obaidullah
K. C. Santosh
Santanu Phadikar
Kaushik Roy
Affiliations
[1] West Bengal State University,Department of Computer Science
[2] Aliah University,Department of Computer Science and Engineering
[3] The University of South Dakota,Department of Computer Science
[4] Maulana Abul Kalam Azad University of Technology,Department of Computer Science and Engineering
Keywords
Lazy learning; Speech recognition; Language identification; Mel frequency cepstral coefficient-based features
DOI
Not available
Abstract
Developing an automatic speech recognition system for multilingual countries like India is challenging because people are accustomed to using multiple languages while talking. This makes language identification from speech an essential step prior to speech recognition. In this paper, a system is proposed for language identification from multilingual speech signals. A new second-level Mel frequency cepstral coefficient-based feature named MFCC-2, which handles the large and uneven dimensionality of MFCC, has been used to characterize and distinguish among three languages: English, Bangla and Hindi. The system has been tested with recordings of as many as 12,000 utterances of numerals and with 41,884 clips extracted from YouTube videos. The YouTube data include background music, recordings from multiple environments, no noise suppression, and phrases that mix keywords from different languages. For the YouTube data, the highest accuracy was 98.09% and the average accuracy over the Top-3 classifiers (from a pool of nine classifiers) was 95.54%.
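The abstract does not define how MFCC-2 is computed or which lazy learner is used, so the following is only a minimal sketch of the general idea: collapsing a variable number of per-frame coefficient vectors into one fixed-length vector, then classifying with 1-nearest-neighbor, a typical lazy learner. The summary statistics (per-coefficient mean and standard deviation), the function names, and the toy data are all assumptions made for illustration, not the authors' method.

```python
from math import dist
from statistics import mean, pstdev


def fixed_length_features(frames):
    """Collapse a variable number of per-frame coefficient vectors into one
    fixed-length vector. ASSUMPTION: mean and standard deviation of each
    coefficient are used here; the actual MFCC-2 definition may differ."""
    columns = list(zip(*frames))  # one tuple of values per coefficient
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def nearest_neighbor_label(train, query):
    """1-NN lazy learner: no training step, the query is simply compared
    against every stored (feature_vector, label) pair."""
    return min(train, key=lambda item: dist(item[0], query))[1]


# Hypothetical toy clips with different numbers of frames (2 coefficients each).
clips = {
    "english": [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3]],
    "hindi": [[-1.0, 2.0], [-1.1, 2.2]],
}
train = [(fixed_length_features(frames), label) for label, frames in clips.items()]

# A query clip with yet another frame count still maps to the same dimensionality.
query = fixed_length_features([[1.1, 0.2], [1.0, 0.1], [0.95, 0.25], [1.05, 0.15]])
predicted = nearest_neighbor_label(train, query)
```

Whatever the clip length, every clip maps to a vector of the same size (here twice the number of coefficients), which is what makes a distance-based comparison between clips possible.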
Pages: 1 / 14
Page count: 13
Related papers
50 in total
  • [21] Lazy learning-based online identification and adaptive PID control: A case study for CSTR process
    Pan, Tianhong
    Li, Shaoyuan
    Cai, Wen-Jian
    INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2007, 46 (02) : 472 - 480
  • [22] Deep Learning-Based Speech Emotion Recognition Using Multi-Level Fusion of Concurrent Features
    Kakuba, Samuel
    Poulose, Alwin
    Han, Dong Seog
    IEEE ACCESS, 2022, 10 : 125538 - 125551
  • [23] Learning-based topic detection using multiple features
    Zheng, Hai-Tao
    Wang, Zhe
    Wang, Wei
    Sangaiah, Arun Kumar
    Xiao, Xi
    Zhao, Congzhi
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (15):
  • [24] A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language
    Ullah, Shaheen
    Ahmad, Riaz
    Namoun, Abdallah
    Muhammad, Siraj
    Ullah, Khalil
    Hussain, Ibrar
    Ibrahim, Isa Ali
    IEEE ACCESS, 2024, 12 : 86355 - 86364
  • [25] Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition
    Kim, Hwamin
    Park, Jeong-Sik
    APPLIED SCIENCES-BASEL, 2020, 10 (07):
  • [26] Deep Learning-Based Emotion Recognition by Fusion of Facial Expressions and Speech Features
    Vardhan, Jasthi Vivek
    Chakravarti, Yelavarti Kalyan
    Chand, Annam Jitin
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024,
  • [27] Remarks on emotional speech recognition using learning-based approach
    Takahashi, K
    Nakatsu, R
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XIV, PROCEEDINGS: COMPUTER AND INFORMATION SYSTEMS, TECHNOLOGIES AND APPLICATIONS, 2004, : 304 - 309
  • [28] Machine learning-based software requirements identification for a large number of features
    Talele, Pratvina
    Phalnikar, Rashmi
    Talele, Pratvina (pratvina.talele@mitwpu.edu.in), 1600, Inderscience Publishers (06): : 255 - 260
  • [29] Identification of Hate Speech and Abusive Language on Indonesian Twitter Using the Word2vec, Part of Speech and Emoji Features
    Ibrohim, Muhammad Okky
    Setiadi, Muhammad Akbar
    Budi, Indra
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION SCIENCE AND SYSTEM, AISS 2019, 2019,
  • [30] A Subset of Acoustic Features for Machine Learning-based and Statistical Approaches in Speech Emotion Recognition
    Costantini, Giovanni
    Cesarini, Valerio
    Casali, Daniele
    BIOSIGNALS: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 4: BIOSIGNALS, 2022, : 257 - 264