MULTI-SPEAKER, NARROWBAND, CONTINUOUS MARATHI SPEECH DATABASE

Cited: 0
Authors:
Godambe, Tejas [1 ]
Bondale, Nandini [1 ]
Samudravijaya, K. [1 ]
Rao, Preeti [2 ]
Affiliations:
[1] Tata Inst Fundamental Res, Sch Technol & Comp Sci, Homi Bhabha Rd, Bombay 400005, Maharashtra, India
[2] Indian Inst Technol, Dept Elect Engn, Bombay, Maharashtra, India
Keywords:
speech recognition; speech data; Marathi; transcription;
DOI: Not available
CLC Classification:
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
We describe the development of a continuous speech database for the Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, covering a range of speaker characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system from their personal mobile handsets and read specially designed sentence sets. The sentence data acquisition was conducted in the field rather than in a quiet environment. As a result, the acquired speech data contains a large amount of non-speech sounds as well as incompletely spoken words. The speech data was therefore transcribed using additional labels to denote frequently occurring non-speech sounds, different kinds of incomplete words, and invalid words. We characterize the database through statistics such as the gender distribution of speakers, phonemic richness, the amount of non-speech sounds, and the average sentence and word lengths for both reference and actual sentences.
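As an illustration of the kind of corpus characterization the abstract describes, the sketch below computes average sentence and word lengths and the share of non-speech and incomplete-word labels from a set of transcriptions. The label inventory ([noise], [cough], etc.) and the trailing-hyphen convention for incomplete words are assumptions made for this example, not the database's actual annotation scheme.

# Minimal sketch of corpus-statistics computation (average sentence/word
# lengths, share of non-speech and incomplete-word labels).
# The label set and the trailing-hyphen convention below are assumptions
# for illustration only, not the annotation scheme used in the paper.
from statistics import mean

NONSPEECH = {"[noise]", "[cough]", "[breath]", "[laugh]"}  # assumed label set

def is_incomplete(token):
    # Assumed convention: incompletely spoken words end with a hyphen.
    return token.endswith("-")

def corpus_stats(transcriptions):
    # Compute per-corpus statistics from a list of transcribed sentences.
    sent_lengths, word_lengths = [], []
    nonspeech_count = incomplete_count = total_tokens = 0
    for line in transcriptions:
        speech_tokens = []
        for tok in line.split():
            total_tokens += 1
            if tok in NONSPEECH:
                nonspeech_count += 1
            elif is_incomplete(tok):
                incomplete_count += 1
            else:
                speech_tokens.append(tok)
        sent_lengths.append(len(speech_tokens))
        word_lengths.extend(len(w) for w in speech_tokens)
    return {
        "avg_sentence_len_words": mean(sent_lengths) if sent_lengths else 0.0,
        "avg_word_len_chars": mean(word_lengths) if word_lengths else 0.0,
        "nonspeech_fraction": nonspeech_count / total_tokens if total_tokens else 0.0,
        "incomplete_fraction": incomplete_count / total_tokens if total_tokens else 0.0,
    }

if __name__ == "__main__":
    demo = ["[noise] the speakers read senten- sentences", "data was collected [cough] in the field"]
    print(corpus_stats(demo))

A transcription convention of this kind lets non-speech and incomplete-word tokens be excluded from the word-level statistics while still being counted, which is what makes the comparison between reference and actual sentences possible.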
Pages: 6
Related Papers
50 records in total
  • [21] Sparse Component Analysis for Speech Recognition in Multi-Speaker Environment
    Asaei, Afsaneh
    Bourlard, Herve
    Garner, Philip N.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1704 - 1707
  • [22] PHONEME DEPENDENT SPEAKER EMBEDDING AND MODEL FACTORIZATION FOR MULTI-SPEAKER SPEECH SYNTHESIS AND ADAPTATION
    Fu, Ruibo
    Tao, Jianhua
    Wen, Zhengqi
    Zheng, Yibin
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6930 - 6934
  • [23] SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
    Soleymanpour, Mohammad
    Johnson, Michael T.
    Soleymanpour, Rahim
    Berry, Jeffrey
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7382 - 7386
  • [24] Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement
    Song, Wei
    Yue, Yanghao
    Zhang, Ya-jie
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 132 - 140
  • [25] A unified network for multi-speaker speech recognition with multi-channel recordings
    Liu, Conggui
    Inoue, Nakamasa
    Shinoda, Koichi
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1304 - 1307
  • [26] Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
    Singh, Abhayjeet
    Nagireddi, Amala
    Jayakumar, Anjali
    Deekshitha, G.
    Bandekar, Jesuraja
    Roopa, R.
    Badiger, Sandhya
    Udupa, Sathvik
    Kumar, Saurabh
    Ghosh, Prasanta Kumar
    Murthy, Hema A.
    Zen, Heiga
    Kumar, Pranaw
    Kant, Kamal
    Bole, Amol
    Singh, Bira Chandra
    Tokuda, Keiichi
    Hasegawa-Johnson, Mark
    Olbrich, Philipp
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 790 - 798
  • [27] DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech
    Adibian, Majid
    Zeinali, Hossein
    Barmaki, Soroush
    LANGUAGE RESOURCES AND EVALUATION, 2025,
  • [28] Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
    Luong, Hieu-Thi
    Wang, Xin
    Yamagishi, Junichi
    Nishizawa, Nobuyuki
    INTERSPEECH 2019, 2019, : 1303 - 1307
  • [29] Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation
    Mitsui, Kentaro
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    SPEECH COMMUNICATION, 2021, 132 : 132 - 145
  • [30] DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding
    Lee, Junmo
    Song, Kwangsub
    Noh, Kyoungjin
    Park, Tae-Jun
    Chang, Joon-Hyuk
    2019 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2019, : 61 - 64