Towards Music-Aware Virtual Assistants

Cited by: 1
Authors
Lindlbauer, David [1 ]
Wang, Alexander [1 ]
Donahue, Chris [1 ]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS OF THE 37TH ANNUAL ACM SYMPOSIUM ON USER INTERFACE SOFTWARE AND TECHNOLOGY, UIST 2024 | 2024
Keywords
Audio; Music; Virtual Assistants; Notification; Interruptions; Speech; Machine Learning; Text-to-Speech
DOI
10.1145/3654777.3676416
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a system for modifying spoken notifications in a manner that is sensitive to the music a user is listening to. Spoken notifications provide convenient access to rich information without the need for a screen. Virtual assistants see prevalent use in hands-free settings such as driving or exercising, activities where users also regularly enjoy listening to music. In such settings, virtual assistants will temporarily mute a user's music to improve intelligibility. However, users may perceive these interruptions as intrusive, negatively impacting their music-listening experience. To address this challenge, we propose the concept of music-aware virtual assistants, where speech notifications are modified to resemble a voice singing in harmony with the user's music. We contribute a system that processes the user's music and the notification text to produce a blended mix, replacing the original song lyrics with the notification content. In a user study comparing musical assistants to standard virtual assistants, participants expressed that musical assistants fit better with the music, reduced intrusiveness, and provided a more delightful listening experience overall.
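
The abstract describes the pipeline only at a high level: analyze the user's music, render the notification text as a sung vocal that follows the song, and mix it in rather than muting the track. The sketch below is a hypothetical illustration of that flow; all function names, structures, and values are placeholders under assumed inputs and do not reflect the authors' implementation.

```python
# Conceptual sketch (hypothetical, not the authors' system): analyze the playing song,
# render the notification text as a sung phrase on the song's melody, then mix it in
# at a phrase boundary instead of muting the music.
from dataclasses import dataclass
from typing import List


@dataclass
class SongAnalysis:
    tempo_bpm: float
    melody_pitches: List[int]   # MIDI pitches of an upcoming vocal phrase
    phrase_start_s: float       # where a sung notification could be swapped in


def analyze_song(audio_path: str) -> SongAnalysis:
    # Placeholder: a real system would run beat, melody, and structure analysis here.
    return SongAnalysis(tempo_bpm=120.0, melody_pitches=[60, 62, 64, 65, 67], phrase_start_s=12.5)


def sing_notification(text: str, analysis: SongAnalysis) -> str:
    # Placeholder: a real system would use singing synthesis (e.g. pitch- and
    # time-aligned TTS) to render the text on the song's melody, returning audio.
    mapping = list(zip(text.split(), analysis.melody_pitches))
    print(f"Render {mapping} at {analysis.tempo_bpm:.0f} BPM")
    return "notification_vocal.wav"


def blend(song_path: str, vocal_path: str, start_s: float) -> str:
    # Placeholder: mix the sung notification over (or in place of) the original
    # vocals at the phrase boundary, rather than ducking or muting the track.
    print(f"Mix {vocal_path} into {song_path} at {start_s:.1f}s")
    return "blended_mix.wav"


if __name__ == "__main__":
    analysis = analyze_song("current_song.wav")
    vocal = sing_notification("Meeting with Sam at three", analysis)
    blend("current_song.wav", vocal, analysis.phrase_start_s)
```
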
Pages: 14