pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

Cited by: 280
Authors
Giannakopoulos, Theodoros [1]
Affiliations
[1] NCSR Demokritos, Computat Intelligence Lab, Inst Informat & Telecommun, Athens 15310, Greece
Funding
EU Horizon 2020
Keywords
DOI
10.1371/journal.pone.0144610
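Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract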
Audio information plays an important role in the growing volume of digital content available today, creating a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, supervised and unsupervised segmentation, and content visualization. pyAudioAnalysis is licensed under the Apache License and is available on GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audiovisual features, music segmentation, multimodal content-based movie recommendation, and health applications (e.g. monitoring eating habits). Feedback from these applications has led to practical enhancements of the library.
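As a rough illustration of what "feature extraction" means in this context, the sketch below splits a signal into overlapping short-term frames and computes two classic time-domain features per frame (energy and zero-crossing rate). It deliberately does not call the pyAudioAnalysis API: the function name `short_term_features`, the 50 ms window, the 25 ms step, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, and the library's own implementation may differ.

```python
import math


def short_term_features(signal, sample_rate, window_s=0.050, step_s=0.025):
    """Return a list of (energy, zero_crossing_rate) tuples, one per frame.

    Hypothetical helper for illustration only; not part of pyAudioAnalysis.
    """
    window = int(round(window_s * sample_rate))  # frame length in samples
    step = int(round(step_s * sample_rate))      # hop between frame starts
    features = []
    for start in range(0, len(signal) - window + 1, step):
        frame = signal[start:start + window]
        # Mean power of the frame.
        energy = sum(s * s for s in frame) / window
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (window - 1)
        features.append((energy, zcr))
    return features


# Synthetic input (assumption): one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = short_term_features(tone, sample_rate)
```

Each frame yields one feature vector; sequences of such vectors are the usual input to the classification and segmentation steps the abstract mentions.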
Pages: 17