COLOMBIAN DIALECT RECOGNITION BASED ON INFORMATION EXTRACTED FROM SPEECH AND TEXT SIGNALS

Cited by: 4
Authors
Escobar-Grisales, D. [1]
Rios-Urrego, C. D. [1]
Lopez-Santander, D. A. [1]
Gallo-Aristizabal, J. D. [1]
Vasquez-Correa, J. C. [1, 2, 3]
Noeth, E. [2]
Orozco-Arroyave, J. R. [1, 2]
Affiliations
[1] Univ Antioquia UdeA, Fac Engn, GITA Lab, Medellin, Colombia
[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Nurnberg, Germany
[3] Pratech Grp, Medellin, Colombia
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
Dialect classification; Speech; Text; Customer Service; Acoustics; Language processing; LANGUAGE;
DOI
10.1109/ASRU51503.2021.9687890
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Dialect recognition is useful in many industrial sectors, particularly for improving the interaction between customers and providers. The core idea is to improve or customize marketing and customer service strategies depending on geographic location, birthplace and culture. This study proposes different models to automatically discriminate between two Colombian dialects: "Antioqueño" and "Bogotano". To the best of our knowledge, this is the first work on Colombian dialect recognition based on real conversations from customer service centers. The proposed strategy consists of independent analyses, using information from speech recordings and their corresponding transliterations. On the one hand, classical approaches are considered: speech is modeled with prosody features, Mel frequency cepstral coefficients and mean Hilbert envelope coefficients, and text is modeled with Word2Vec and Bidirectional Encoder Representations from Transformers (BERT) embeddings. On the other hand, a deep learning approach is applied by considering convolutional neural networks, which are trained using spectrograms and embedding matrices for speech and text, respectively. The implemented deep learning models seem to be more promising than the classical ones for the addressed problem. Further experiments will be considered to validate this claim across a wider spectrum of methods.
Pages: 556 - 563
Number of pages: 8
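The abstract mentions spectrograms as the input to the convolutional networks for speech. As a rough illustration only, the sketch below computes a log-magnitude spectrogram of a synthetic tone with a naive DFT in plain Python. The function name `log_spectrogram`, the frame length, the hop size and the test signal are all assumptions made for this example; the record does not state the authors' actual settings or tooling.

```python
import cmath
import math


def log_spectrogram(signal, frame_len=64, hop=32):
    """Return one list of log-magnitudes (bins 0..frame_len//2) per frame.

    Hypothetical helper for illustration; parameters are not from the paper.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT coefficient for frequency bin k.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            # Small offset avoids log(0) on silent bins.
            spectrum.append(math.log(abs(coeff) + 1e-10))
        frames.append(spectrum)
    return frames


# Synthetic 1 kHz tone sampled at 8 kHz (assumed values, not call-center data).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_spectrogram(tone)

# Bin spacing is sample_rate / frame_len = 125 Hz, so the tone falls in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The resulting frames-by-bins matrix is the kind of two-dimensional input a convolutional network could consume. A practical pipeline would more likely use an FFT routine from a numerical library instead of the naive DFT shown here.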