Temporally Guided Music-to-Body-Movement Generation

Citations: 24
Authors
Kao, Hsuan-Kai [1]
Su, Li [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
Neural networks; pose estimation; body movement generation; music information retrieval
DOI
10.1145/3394171.3413848
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a neural network model that generates a virtual violinist's 3-D skeleton movements from music audio. Improving on the conventional recurrent neural network models used in previous work to generate 2-D skeleton data, the proposed model incorporates an encoder-decoder architecture and a self-attention mechanism to model the complicated dynamics of body movement sequences. To facilitate the optimization of the self-attention model, beat tracking is applied to determine effective sizes and boundaries of the training examples. The decoder is accompanied by a refining network and a bowing attack inference mechanism to emphasize right-hand behavior and bowing attack timing. Both objective and subjective evaluations reveal that the proposed model outperforms state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate 3-D violinists' body movements while considering key features of musical body movement.
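The abstract does not say how beat positions are turned into training examples. As a minimal sketch only, assuming beat times (in seconds) are already available from some beat tracker, one plausible reading is to cut the frame sequence at beat positions so that every example starts and ends on a beat. The function name `beat_aligned_segments` and the parameter `beats_per_segment` are hypothetical and are not taken from the paper.

```python
def beat_aligned_segments(beat_times, frame_rate, beats_per_segment):
    """Return (start_frame, end_frame) pairs whose boundaries fall on beats.

    beat_times: ascending beat positions in seconds (from any beat tracker).
    frame_rate: audio-feature or pose frames per second.
    beats_per_segment: hypothetical hyperparameter, not taken from the paper.
    """
    if beats_per_segment < 1:
        raise ValueError("beats_per_segment must be >= 1")
    # Map each beat time to the nearest frame index.
    beat_frames = [round(t * frame_rate) for t in beat_times]
    segments = []
    # Use every `beats_per_segment`-th beat as a segment boundary;
    # trailing beats that cannot fill a whole segment are dropped.
    for i in range(0, len(beat_frames) - beats_per_segment, beats_per_segment):
        segments.append((beat_frames[i], beat_frames[i + beats_per_segment]))
    return segments


if __name__ == "__main__":
    # Beats every 0.5 s, 30 frames per second, 2 beats per example.
    print(beat_aligned_segments([0.0, 0.5, 1.0, 1.5, 2.0], 30, 2))
```

Because segment length follows the local tempo, examples cut this way vary in duration, so a real pipeline would still need padding or resampling before batching.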
Pages: 147-155
Page count: 9