Corpus building for data-driven TTS systems

Cited by: 8
Authors
Zhu, WB [1]
Zhang, W [1]
Shi, Q [1]
Chen, FX [1]
Li, HP [1]
Ma, XJ [1]
Shen, LQ [1]
Affiliation
[1] IBM China Res Lab, Beijing 100085, Peoples R China
DOI
10.1109/WSS.2002.1224408
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
To build a data-driven Mandarin TTS system, we built a large, balanced Mandarin text-and-speech corpus, the IBM Mandarin TTS Corpus. The corpus is designed to support both statistical prosody modeling and the modeling of context-dependent phonemic features. In the script-design stage, we investigated the choice of a proper synthetic unit and, based on that choice, developed a numerical criterion for the coverage and balance of the variants of the synthetic units. In the speech-recording stage, we paid attention to speaking style, which is essential for an effective concatenative speech synthesis system: we formulated a speaking-style specification and guided the speaker to follow it strictly. Corpus processing is another important step, in which we carefully carried out pronunciation marking, segment alignment, and prosodic event labeling, among other tasks. We defined a set of hierarchical prosodic layers to describe the various prosodic events. Because these steps often involve manual effort, the quality of the processed corpus depends on both a proper specification for each step and the training of the team carrying them out.
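The abstract does not state the numerical criterion itself. As an illustration only, the sketch below shows one generic way to make "coverage and balance" concrete: a greedy, set-cover style script selection that repeatedly picks the candidate sentence contributing the most still-needed unit occurrences. Everything named here is an assumption for the example and not the authors' method: the `units` extractor (which simply splits on whitespace), the `target_count` balance threshold, and the toy tone-numbered syllables.

```python
from collections import Counter


def units(sentence):
    """Hypothetical unit extractor: each whitespace-separated token is one unit."""
    return sentence.split()


def coverage(selected, inventory):
    """Fraction of the unit inventory that occurs at least once in the selection."""
    covered = {u for s in selected for u in units(s)} & inventory
    return len(covered) / len(inventory)


def greedy_select(candidates, inventory, target_count=2, max_sentences=None):
    """Greedily pick sentences until every unit has target_count occurrences
    or no remaining sentence adds a needed occurrence (an assumed criterion)."""
    counts = Counter()
    remaining = list(candidates)
    selected = []

    def gain(sentence):
        # Count only occurrences that reduce a unit's shortfall below target_count.
        occ = Counter(u for u in units(sentence) if u in inventory)
        return sum(min(n, max(0, target_count - counts[u])) for u, n in occ.items())

    while remaining and (max_sentences is None or len(selected) < max_sentences):
        best = max(remaining, key=gain)  # first sentence with the highest gain
        if gain(best) == 0:
            break
        selected.append(best)
        remaining.remove(best)
        counts.update(u for u in units(best) if u in inventory)
    return selected, counts


if __name__ == "__main__":
    inventory = {"ma1", "ma2", "ma3", "ma4"}
    candidates = ["ma1 ma2", "ma1 ma3", "ma2 ma3 ma4", "ma1 ma1"]
    script, counts = greedy_select(candidates, inventory, target_count=1)
    print(script, coverage(script, inventory), dict(counts))
```

Capping each unit's contribution at its shortfall is what stands in for "balance" here: a sentence that repeats an already well-represented unit gains nothing from it, so rarer units are favored. A real script-design criterion would use context-dependent unit variants and a far larger candidate pool.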
Pages: 199-202
Page count: 4