UrduSER: A comprehensive dataset for speech emotion recognition in Urdu language

Cited by: 0
Authors
Akhtar, Muhammad Zaheer [1 ,2 ]
Jahangir, Rashid [2 ]
Ain, QuratUl [1 ]
Nauman, Muhammad Asif [3 ]
Uddin, Mueen [4 ]
Ullah, Syed Sajid [5 ]
Affiliations
[1] Islamia Univ Bahawalpur, Dept Informat Technol, Bahawalpur 63100, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Vehari Campus, Islamabad 61100, Pakistan
[3] Riphah Int Univ, Riphah Sch Comp & Innovat, Lahore, Pakistan
[4] Univ Doha Sci & Technol, Coll Comp & IT, Doha 24449, Qatar
[5] Univ Agder UiA, Dept Informat & Commun Technol, N-4898 Grimstad, Norway
Source
DATA IN BRIEF | 2025, Vol. 60
Keywords
Speech emotion recognition; Signal processing; Deep learning; Urdu language; Dataset;
DOI
10.1016/j.dib.2025.111627
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Speech Emotion Recognition (SER) is a rapidly evolving field of research that aims to identify and categorize emotional states through speech signal analysis. As SER holds considerable socio-cultural and business significance, researchers are increasingly exploring machine learning and deep learning techniques to advance this technology. A well-suited dataset is a crucial resource for SER studies in a specific language. However, despite Urdu being the 10th most spoken language globally, it lacks SER datasets, creating a significant research gap. The available Urdu SER datasets are insufficient due to their limited scope, including a narrow range of emotions, small sample sizes, and a limited number of dialogues, which restricts their usability in real-world scenarios. To fill this gap in existing Urdu speech datasets, an Urdu Speech Emotion Recognition Dataset (UrduSER) was developed. This comprehensive dataset consists of 3500 speech signals from 10 professional actors, with a balanced mix of males and females and diverse age ranges. The speech signals were sourced from a vast collection of Pakistani Urdu drama serials and telefilms available on YouTube. Seven emotional states are covered in the dataset: Angry, Fear, Boredom, Disgust, Happy, Neutral, and Sad. A notable strength of this dataset is the diversity of its dialogues: each utterance contains almost unique content, in contrast to existing datasets, which often feature repetitive samples of predefined dialogues spoken by research volunteers in a laboratory environment. To ensure balance and symmetry, the dataset contains 500 samples for each emotional class, with 50 samples per actor per emotion. An accompanying Excel file provides a detailed metadata index for each audio sample, including file name, duration, and the Urdu dialogue script. This metadata index enables researchers and developers to efficiently access, organize, and utilize the UrduSER dataset.
The UrduSER dataset underwent a rigorous validation process, integrating expert validation to confirm its validity, reliability, and overall quality. (c) 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
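The balanced design described above (10 actors × 7 emotions × 50 samples = 3500 utterances, 500 per emotional class) can be sketched programmatically. The following is a minimal illustration only: the column names and file-naming scheme are assumptions for demonstration, not taken from the paper's metadata index.

```python
# Hypothetical sketch of a balanced index matching the UrduSER design.
# Actor IDs and the file-naming pattern below are illustrative assumptions.
from collections import Counter

EMOTIONS = ["Angry", "Fear", "Boredom", "Disgust", "Happy", "Neutral", "Sad"]
ACTORS = [f"A{i:02d}" for i in range(1, 11)]   # 10 professional actors
SAMPLES_PER_ACTOR_PER_EMOTION = 50

# Enumerate the full balanced design: 10 actors x 7 emotions x 50 samples.
index = [
    {"file": f"{actor}_{emotion}_{k:03d}.wav", "actor": actor, "emotion": emotion}
    for actor in ACTORS
    for emotion in EMOTIONS
    for k in range(1, SAMPLES_PER_ACTOR_PER_EMOTION + 1)
]

per_emotion = Counter(row["emotion"] for row in index)
print(len(index))            # 3500 total samples
print(per_emotion["Happy"])  # 500 samples per emotional class
```

A real metadata index of this shape (e.g. loaded from the accompanying Excel file with `pandas.read_excel`) could be checked for class balance in the same way before training.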
Pages: 14