MULTI-SPEAKER, NARROWBAND, CONTINUOUS MARATHI SPEECH DATABASE

Cited: 0
Authors:
Godambe, Tejas [1 ]
Bondale, Nandini [1 ]
Samudravijaya, K. [1 ]
Rao, Preeti [2 ]
Affiliations:
[1] Tata Inst Fundamental Res, Sch Technol & Comp Sci, Homi Bhabha Rd, Bombay 400005, Maharashtra, India
[2] Indian Inst Technol, Dept Elect Engn, Bombay, Maharashtra, India
Keywords:
speech recognition; speech data; Marathi; transcription;
DOI: Not available
CLC Classification:
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
We describe the development of a continuous speech database for the Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, covering a range of speaker characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system from their personal mobile handsets and read specially designed sentence sets. The sentence data acquisition was conducted in the field rather than in a quiet environment. As a result, the acquired speech data contains a large amount of non-speech sounds as well as incompletely spoken words. The speech data was therefore transcribed using additional labels to denote frequently occurring non-speech sounds, different kinds of incomplete words, and invalid words. We characterize the database through statistics such as the gender distribution of speakers, phonemic richness, the amount of non-speech sounds, and the average sentence and word lengths for both reference and actual sentences.
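As an illustration of the kind of corpus characterization the abstract describes, the sketch below computes average sentence and word lengths and the share of non-speech and incomplete-word labels from a set of transcriptions. The label inventory ([noise], [cough], etc.) and the trailing-hyphen convention for incomplete words are assumptions made for this example, not the database's actual annotation scheme.

# Minimal sketch of corpus-statistics computation (average sentence/word
# lengths, share of non-speech and incomplete-word labels).
# The label set and the trailing-hyphen convention below are assumptions
# for illustration only, not the annotation scheme used in the paper.
from statistics import mean

NONSPEECH = {"[noise]", "[cough]", "[breath]", "[laugh]"}  # assumed label set

def is_incomplete(token):
    # Assumed convention: incompletely spoken words end with a hyphen.
    return token.endswith("-")

def corpus_stats(transcriptions):
    # Compute per-corpus statistics from a list of transcribed sentences.
    sent_lengths, word_lengths = [], []
    nonspeech_count = incomplete_count = total_tokens = 0
    for line in transcriptions:
        speech_tokens = []
        for tok in line.split():
            total_tokens += 1
            if tok in NONSPEECH:
                nonspeech_count += 1
            elif is_incomplete(tok):
                incomplete_count += 1
            else:
                speech_tokens.append(tok)
        sent_lengths.append(len(speech_tokens))
        word_lengths.extend(len(w) for w in speech_tokens)
    return {
        "avg_sentence_len_words": mean(sent_lengths) if sent_lengths else 0.0,
        "avg_word_len_chars": mean(word_lengths) if word_lengths else 0.0,
        "nonspeech_fraction": nonspeech_count / total_tokens if total_tokens else 0.0,
        "incomplete_fraction": incomplete_count / total_tokens if total_tokens else 0.0,
    }

if __name__ == "__main__":
    demo = ["[noise] the speakers read senten- sentences", "data was collected [cough] in the field"]
    print(corpus_stats(demo))

A transcription convention of this kind lets non-speech and incomplete-word tokens be excluded from the word-level statistics while still being counted, which is what makes the comparison between reference and actual sentences possible.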
Pages: 6
Related Papers
50 records in total
  • [21] Sparse Component Analysis for Speech Recognition in Multi-Speaker Environment
    Asaei, Afsaneh
    Bourlard, Herve
    Garner, Philip N.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1704 - 1707
  • [22] PHONEME DEPENDENT SPEAKER EMBEDDING AND MODEL FACTORIZATION FOR MULTI-SPEAKER SPEECH SYNTHESIS AND ADAPTATION
    Fu, Ruibo
    Tao, Jianhua
    Wen, Zhengqi
    Zheng, Yibin
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6930 - 6934
  • [23] SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
    Soleymanpour, Mohammad
    Johnson, Michael T.
    Soleymanpour, Rahim
    Berry, Jeffrey
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7382 - 7386
  • [24] Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement
    Song, Wei
    Yue, Yanghao
    Zhang, Ya-jie
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 132 - 140
  • [25] A unified network for multi-speaker speech recognition with multi-channel recordings
    Liu, Conggui
    Inoue, Nakamasa
    Shinoda, Koichi
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1304 - 1307
  • [26] Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
    Singh, Abhayjeet
    Nagireddi, Amala
    Jayakumar, Anjali
    Deekshitha, G.
    Bandekar, Jesuraja
    Roopa, R.
    Badiger, Sandhya
    Udupa, Sathvik
    Kumar, Saurabh
    Ghosh, Prasanta Kumar
    Murthy, Hema A.
    Zen, Heiga
    Kumar, Pranaw
    Kant, Kamal
    Bole, Amol
    Singh, Bira Chandra
    Tokuda, Keiichi
    Hasegawa-Johnson, Mark
    Olbrich, Philipp
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 790 - 798
  • [27] DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech
    Adibian, Majid
    Zeinali, Hossein
    Barmaki, Soroush
    LANGUAGE RESOURCES AND EVALUATION, 2025,
  • [28] Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
    Luong, Hieu-Thi
    Wang, Xin
    Yamagishi, Junichi
    Nishizawa, Nobuyuki
    INTERSPEECH 2019, 2019, : 1303 - 1307
  • [29] Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation
    Mitsui, Kentaro
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    SPEECH COMMUNICATION, 2021, 132 : 132 - 145
  • [30] DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding
    Lee, Junmo
    Song, Kwangsub
    Noh, Kyoungjin
    Park, Tae-Jun
    Chang, Joon-Hyuk
    2019 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2019, : 61 - 64