Three techniques for improving automatic synchronization between music and lyrics: Fricative detection, filler model, and novel feature vectors for vocal activity detection

Cited by: 7

Authors:

Fujihara, Hiromasa

Goto, Masataka

Affiliations:

Source:

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008

Keywords:

music; lyrics; fricative sounds; filler model; spectral representation;

DOI:

10.1109/ICASSP.2008.4517548

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Three techniques are described that improve a previously developed system for automatically synchronizing lyrics with musical audio signals. Although this system achieves state-of-the-art accuracy by extracting vocal vowels from polyphonic sound mixtures and using forced alignment between those vowels and a phoneme network of the lyrics, there was still room for improvement. The first technique detects nonexistence regions in which fricative consonant sounds do not exist, which were not utilized in the previous system, and prohibits the alignment of the fricative phonemes to those regions. The second technique inserts a filler model between phrases of the phoneme network. This model improves the accuracy of the forced alignment by ignoring inter-phrase vowel utterances not included in the lyrics. The third technique introduces novel feature vectors for vocal activity detection that enable a distance calculation between two sets of the harmonic structure without estimating their spectral envelopes. Experimental results showed that all three techniques contribute to improved synchronization.

Citation

Pages: 69-72

Page count: 4

9 references in total

[1]

Fujihara H, 2006, IEEE INT SYM MULTIM, P257

[2] A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals [J].

Goto, M .

SPEECH COMMUNICATION, 2004, 43 (04) :311-329

[3]

Goto M., 2002, Ismir, P287

[4]

Gruhne M., 2007, ISMIR, P369

[5]

KAMEOKA H, 2006, 2006MUS6613 IPSJ SIG, P77

[6]

LOSCOS A, 1999, P ICMC 1999

[7]

Wang C. K., 2003, P 8 EUR C SPEECH COM, P1197

[8]

Wang Y., 2004, Proceedings of the 12th Annual ACM International Conference on Multimedia, P212

[9] Automatic lyrics alignment for Cantonese popular music [J].

Wong, Chi Hang ;

Szeto, Wai Man ;

Wong, Kin Hong .

MULTIMEDIA SYSTEMS, 2007, 12 (4-5) :307-323
