Automatic Pitch Accent Detection Using Long Short-Term Memory Neural Networks

Cited by: 2
Authors
Wu, Yizhi [1]
Li, Sha [1]
Li, Hongyan [1]
Affiliation
[1] Donghua Univ, Coll Informat Sci & Technol, 2999 Renmin Rd North, Shanghai, Peoples R China
Source
2019 INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING SYSTEMS (SPSS 2019) | 2019
Keywords
Pitch accent detection; LSTM; lexical and syntactic features; acoustic features
DOI
10.1145/3364908.3365291
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Prosody detection is gaining popularity in prosody research because of its significance in applications such as Text to Sound and computer-aided pronunciation training (CAPT). Pitch accent is an important part of prosody, and many recognition models, both static and dynamic, have been investigated for labeling it automatically. Recently, artificial neural networks, especially recurrent neural networks (RNNs), have been applied to pitch accent detection. However, traditional RNNs are unable to learn and remember over long sequences because the back-propagated error decays. To solve this problem, this paper investigates the use of Long Short-Term Memory (LSTM) neural networks for automatic pitch accent detection. The paper encodes lexical and syntactic features as binary variables and uses syllable-based acoustic features, including syllable duration, syllable energy, and features related to the fundamental frequency. The experimental results show that LSTM-RNNs achieve an accuracy of 89.0% for pitch accent detection, which is better than the roughly 83.2% obtained with classical detection methods.
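The feature encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag inventory `POS_TAGS`, the helper names `one_hot` and `syllable_vector`, the choice of mean and range as the fundamental-frequency features, and all numeric values are assumptions made for illustration only.

```python
# Hypothetical tag inventory (an assumption); the record does not list
# the paper's actual lexical and syntactic feature set.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "FUNC"]


def one_hot(value, vocabulary):
    """Encode a categorical value as a list of binary variables."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def syllable_vector(pos_tag, duration_s, energy, f0_mean_hz, f0_range_hz):
    """Concatenate binary lexical/syntactic features with acoustic features.

    The two F0 statistics are illustrative stand-ins for "features related
    to the fundamental frequency".
    """
    acoustic = [duration_s, energy, f0_mean_hz, f0_range_hz]
    return one_hot(pos_tag, POS_TAGS) + acoustic


# One utterance becomes a sequence of per-syllable vectors, which is the
# kind of input a sequence model such as an LSTM consumes step by step.
# The values below are made up.
utterance = [
    syllable_vector("FUNC", 0.08, 0.31, 110.0, 6.0),
    syllable_vector("NOUN", 0.21, 0.74, 148.0, 35.0),
]
```

Each syllable here yields a vector of five binary variables followed by four acoustic values. In practice the acoustic values would usually be normalized before training, but the record does not say how the paper handles this.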
Pages: 41-45
Page count: 5