Speech-Based Activity Recognition for Trauma Resuscitation

Citations: 0
Authors
Abdulbaqi, Jalal [1 ]
Gu, Yue [1 ]
Xu, Zhichao [1 ]
Gao, Chenyang [1 ]
Marsic, Ivan [1 ]
Burd, Randall S. [2 ]
Affiliations
[1] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08855 USA
[2] Childrens Natl Med Ctr, Trauma & Burn Surg, Washington, DC USA
Source
2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020) | 2020
Funding
US National Institutes of Health; US National Science Foundation
Keywords
activity recognition; keyword; audio classification; speech processing; trauma resuscitation; NEURAL-NETWORKS;
DOI
10.1109/ICHI48887.2020.9374372
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a speech-based approach for recognizing team activities during trauma resuscitation. We first analyzed audio recordings of trauma resuscitations in terms of activity frequency, noise level, and activity-related keyword frequency to characterize the dataset. We next evaluated different audio-preprocessing parameters (spectral feature types and audio channels) to find the optimal configuration. We then introduced a novel neural network that recognizes trauma activities using a modified VGG network to extract features from the audio input. The output of the modified VGG network is combined with the output of a network that takes keyword text as input, and the combined representation is used to generate activity labels. We compared our system with several baselines and performed a detailed analysis of performance on specific activities. Our results show that the proposed architecture, which uses Mel-spectrum spectral coefficient features with a stereo channel and activity-specific frequent keywords, achieves the highest accuracy and average F1-score.
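The abstract's preprocessing step, extracting Mel-spectrum spectral coefficients from stereo audio, can be sketched in plain NumPy. This is not the paper's exact pipeline; the sample rate (16 kHz), FFT size (512), hop length (256), and number of mel bands (40) are illustrative assumptions, not values reported by the authors:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with center frequencies evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(stereo, sr=16000, n_fft=512, hop=256, n_mels=40):
    # stereo: float array of shape (n_samples, 2); each channel is
    # processed independently, yielding shape (2, n_mels, n_frames).
    window = np.hanning(n_fft)
    fb = mel_filterbank(n_mels, n_fft, sr)
    channels = []
    for ch in range(stereo.shape[1]):
        x = stereo[:, ch]
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop : i * hop + n_fft] * window
                           for i in range(n_frames)])
        power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
        mel = power @ fb.T                      # (n_frames, n_mels)
        channels.append(np.log(mel + 1e-10).T)  # (n_mels, n_frames)
    return np.stack(channels)

# Example: one second of synthetic stereo audio.
audio = np.random.randn(16000, 2)
feats = log_mel_spectrogram(audio)
print(feats.shape)  # (2, 40, 61)
```

The resulting two-channel feature map is the kind of input a VGG-style convolutional network can consume directly, with the stereo channels playing the role of image color channels.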
Pages: 376-383
Page count: 8