Towards Music-Aware Virtual Assistants

Cited by: 1
Authors
Lindlbauer, David [1 ]
Wang, Alexander [1 ]
Donahue, Chris [1 ]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS OF THE 37TH ANNUAL ACM SYMPOSIUM ON USER INTERFACE SOFTWARE AND TECHNOLOGY, UIST 2024 | 2024
Keywords
Audio; Music; Virtual Assistants; Notification; Interruptions; Speech; Machine Learning; Text-to-Speech
DOI
10.1145/3654777.3676416
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a system for modifying spoken notifications in a manner that is sensitive to the music a user is listening to. Spoken notifications provide convenient access to rich information without the need for a screen. Virtual assistants see prevalent use in hands-free settings such as driving or exercising, activities where users also regularly enjoy listening to music. In such settings, virtual assistants will temporarily mute a user's music to improve intelligibility. However, users may perceive these interruptions as intrusive, negatively impacting their music-listening experience. To address this challenge, we propose the concept of music-aware virtual assistants, where speech notifications are modified to resemble a voice singing in harmony with the user's music. We contribute a system that processes the user's music and the notification text to produce a blended mix, replacing the original song lyrics with the notification content. In a user study comparing musical assistants to standard virtual assistants, participants expressed that musical assistants fit better with the music, reduced intrusiveness, and provided a more delightful listening experience overall.
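
The abstract describes the pipeline only at a high level: analyze the user's music, render the notification text as a sung vocal that follows the song, and mix it in rather than muting the track. The sketch below is a hypothetical illustration of that flow; all function names, structures, and values are placeholders under assumed inputs and do not reflect the authors' implementation.

```python
# Conceptual sketch (hypothetical, not the authors' system): analyze the playing song,
# render the notification text as a sung phrase on the song's melody, then mix it in
# at a phrase boundary instead of muting the music.
from dataclasses import dataclass
from typing import List


@dataclass
class SongAnalysis:
    tempo_bpm: float
    melody_pitches: List[int]   # MIDI pitches of an upcoming vocal phrase
    phrase_start_s: float       # where a sung notification could be swapped in


def analyze_song(audio_path: str) -> SongAnalysis:
    # Placeholder: a real system would run beat, melody, and structure analysis here.
    return SongAnalysis(tempo_bpm=120.0, melody_pitches=[60, 62, 64, 65, 67], phrase_start_s=12.5)


def sing_notification(text: str, analysis: SongAnalysis) -> str:
    # Placeholder: a real system would use singing synthesis (e.g. pitch- and
    # time-aligned TTS) to render the text on the song's melody, returning audio.
    mapping = list(zip(text.split(), analysis.melody_pitches))
    print(f"Render {mapping} at {analysis.tempo_bpm:.0f} BPM")
    return "notification_vocal.wav"


def blend(song_path: str, vocal_path: str, start_s: float) -> str:
    # Placeholder: mix the sung notification over (or in place of) the original
    # vocals at the phrase boundary, rather than ducking or muting the track.
    print(f"Mix {vocal_path} into {song_path} at {start_s:.1f}s")
    return "blended_mix.wav"


if __name__ == "__main__":
    analysis = analyze_song("current_song.wav")
    vocal = sing_notification("Meeting with Sam at three", analysis)
    blend("current_song.wav", vocal, analysis.phrase_start_s)
```
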
Pages: 14