Speech-Based Activity Recognition for Trauma Resuscitation

Citations: 0
Authors
Abdulbaqi, Jalal [1 ]
Gu, Yue [1 ]
Xu, Zhichao [1 ]
Gao, Chenyang [1 ]
Marsic, Ivan [1 ]
Burd, Randall S. [2 ]
Affiliations
[1] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08855 USA
[2] Childrens Natl Med Ctr, Trauma & Burn Surg, Washington, DC USA
Source
2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020) | 2020
Funding
US National Institutes of Health; US National Science Foundation
Keywords
activity recognition; keyword; audio classification; speech processing; trauma resuscitation; NEURAL-NETWORKS;
DOI
10.1109/ICHI48887.2020.9374372
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a speech-based approach for recognizing team activities during trauma resuscitation. We first analyzed audio recordings of trauma resuscitations in terms of activity frequency, noise level, and activity-related keyword frequency to characterize the dataset. We next evaluated different audio-preprocessing parameters (spectral feature types and audio channels) to find the optimal configuration. We then introduced a novel neural network that recognizes trauma activities using a modified VGG network to extract features from the audio input. The output of the modified VGG network is combined with the output of a network that takes keyword text as input, and the combined representation is used to generate activity labels. We compared our system with several baselines and performed a detailed analysis of performance on specific activities. Our results show that the proposed architecture, which uses Mel-spectrum spectral coefficient features with a stereo channel and activity-specific frequent keywords, achieves the highest accuracy and average F1-score.
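The abstract's preprocessing step, extracting Mel-spectrum spectral coefficients from stereo audio, can be sketched in plain NumPy. This is not the paper's exact pipeline; the sample rate (16 kHz), FFT size (512), hop length (256), and number of mel bands (40) are illustrative assumptions, not values reported by the authors:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with center frequencies evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(stereo, sr=16000, n_fft=512, hop=256, n_mels=40):
    # stereo: float array of shape (n_samples, 2); each channel is
    # processed independently, yielding shape (2, n_mels, n_frames).
    window = np.hanning(n_fft)
    fb = mel_filterbank(n_mels, n_fft, sr)
    channels = []
    for ch in range(stereo.shape[1]):
        x = stereo[:, ch]
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop : i * hop + n_fft] * window
                           for i in range(n_frames)])
        power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
        mel = power @ fb.T                      # (n_frames, n_mels)
        channels.append(np.log(mel + 1e-10).T)  # (n_mels, n_frames)
    return np.stack(channels)

# Example: one second of synthetic stereo audio.
audio = np.random.randn(16000, 2)
feats = log_mel_spectrogram(audio)
print(feats.shape)  # (2, 40, 61)
```

The resulting two-channel feature map is the kind of input a VGG-style convolutional network can consume directly, with the stereo channels playing the role of image color channels.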
Pages: 376-383
Page count: 8