Temporally Guided Music-to-Body-Movement Generation

Cited by: 24

Authors:

Kao, Hsuan-Kai [1]

Su, Li [1]

Affiliations:

[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Keywords:

Neural networks; pose estimation; body movement generation; music information retrieval;

DOI:

10.1145/3394171.3413848

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a neural network model that generates a virtual violinist's 3-D skeleton movements from music audio. Improving on the conventional recurrent neural network models used in previous works to generate 2-D skeleton data, the proposed model incorporates an encoder-decoder architecture and a self-attention mechanism to model the complicated dynamics in body movement sequences. To facilitate the optimization of the self-attention model, beat tracking is applied to determine effective sizes and boundaries of the training examples. The decoder is accompanied by a refining network and a bowing attack inference mechanism to emphasize the right-hand behavior and bowing attack timing. Both objective and subjective evaluations reveal that the proposed model outperforms the state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate 3-D violinists' body movements considering key features in musical body movement.


Pages: 147-155

Page count: 9
