Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions

Cited by: 2

Authors:

Chen, Zhuohao [1]

Flemotomos, Nikolaos [1]

Ardulov, Victor [1]

Creed, Torrey A. [2]

Imel, Zac E. [3]

Atkins, David C. [4]

Narayanan, Shrikanth [1]

Affiliations:

[1] Univ Southern Calif, Signal Anal & Interpretat Lab, Los Angeles, CA 90007 USA

[2] Univ Penn, Dept Psychiat, Philadelphia, PA 19104 USA

[3] Univ Utah, Dept Educ Psychol, Salt Lake City, UT 84112 USA

[4] Univ Washington, Dept Psychiat & Behav Sci, Seattle, WA 98195 USA

Source:

2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC) | 2021

Keywords:

Cognitive behavioral therapy; Motivational Interviewing; end-to-end evaluation; feature fusion strategies; LANGUAGE;

DOI:

10.1109/EMBC46164.2021.9629694

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

Cognitive Behavioral Therapy (CBT) is a goal-oriented psychotherapy for mental health concerns implemented in a conversational setting. The quality of a CBT session is typically assessed by trained human raters who manually assign pre-defined session-level behavioral codes. In this paper, we develop an end-to-end pipeline that converts speech audio to diarized and transcribed text and extracts linguistic features to code the CBT sessions automatically. We investigate both word-level and utterance-level features and propose feature fusion strategies to combine them. The utterance-level features include dialog act tags as well as behavioral codes drawn from another well-known talk psychotherapy called Motivational Interviewing (MI). We propose a novel method to augment the word-based features with the utterance-level tags for subsequent CBT code estimation. Experiments show that our new fusion strategy outperforms all the studied features, both when used individually and when fused by direct concatenation. We also find that incorporating a sentence segmentation module can further improve the overall system given the preponderance of multi-utterance conversational turns in CBT sessions.


Pages: 1836-1839

Number of pages: 4
