Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions

Times cited: 2
Authors
Chen, Zhuohao [1]
Flemotomos, Nikolaos [1]
Ardulov, Victor [1]
Creed, Torrey A. [2]
Imel, Zac E. [3]
Atkins, David C. [4]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ Southern Calif, Signal Anal & Interpretat Lab, Los Angeles, CA 90007 USA
[2] Univ Penn, Dept Psychiat, Philadelphia, PA 19104 USA
[3] Univ Utah, Dept Educ Psychol, Salt Lake City, UT 84112 USA
[4] Univ Washington, Dept Psychiat & Behav Sci, Seattle, WA 98195 USA
Source
2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC) | 2021
Keywords
Cognitive behavioral therapy; Motivational Interviewing; end-to-end evaluation; feature fusion strategies; LANGUAGE;
DOI
10.1109/EMBC46164.2021.9629694
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Subject classification code
0831;
Abstract
Cognitive Behavioral Therapy (CBT) is a goal-oriented psychotherapy for mental health concerns implemented in a conversational setting. The quality of a CBT session is typically assessed by trained human raters who manually assign pre-defined session-level behavioral codes. In this paper, we develop an end-to-end pipeline that converts speech audio to diarized and transcribed text and extracts linguistic features to code the CBT sessions automatically. We investigate both word-level and utterance-level features and propose feature fusion strategies to combine them. The utterance-level features include dialog act tags as well as behavioral codes drawn from Motivational Interviewing (MI), another well-known talk psychotherapy. We propose a novel method to augment the word-based features with the utterance-level tags for subsequent CBT code estimation. Experiments show that our new fusion strategy outperforms all the studied features, whether they are used individually or fused by direct concatenation. We also find that incorporating a sentence segmentation module can further improve the overall system, given the preponderance of multi-utterance conversational turns in CBT sessions.
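The abstract contrasts direct concatenation of word-level and utterance-level features with augmenting the word-based features using utterance-level tags. The sketch below illustrates that contrast on a toy session with plain bag-of-words counts; it is not the authors' implementation. The exact augmentation used in the paper is not described in this record, so pairing each word with the tag of its utterance is an assumed form, and the names `Utterance`, `word_features`, `tag_features`, `concat_fusion`, `augmented_fusion`, `to_vector`, the speaker labels `T`/`C`, and the tag names are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical utterance record: (speaker, text, utterance-level tag).
# The tag stands in for a dialog act or MI code; all names are illustrative.
Utterance = Tuple[str, str, str]


def word_features(utts: List[Utterance]) -> Counter:
    """Word-level features: speaker-prefixed bag-of-words counts over the session."""
    counts: Counter = Counter()
    for speaker, text, _tag in utts:
        for word in text.lower().split():
            counts[f"{speaker}:{word}"] += 1
    return counts


def tag_features(utts: List[Utterance]) -> Counter:
    """Utterance-level features: how often each speaker produced each tag."""
    counts: Counter = Counter()
    for speaker, _text, tag in utts:
        counts[f"{speaker}:TAG={tag}"] += 1
    return counts


def concat_fusion(utts: List[Utterance]) -> Counter:
    """Direct concatenation: the two feature sets sit side by side.

    The word part of every word key is lowercased while tag keys carry the
    uppercase marker "TAG=", so keys never collide and merging the counters
    is equivalent to concatenating the two feature vectors.
    """
    fused = word_features(utts)
    fused.update(tag_features(utts))
    return fused


def augmented_fusion(utts: List[Utterance]) -> Counter:
    """Assumed augmentation: each word is paired with its utterance's tag,
    so the same word yields distinct features under distinct tags."""
    counts: Counter = Counter()
    for speaker, text, tag in utts:
        for word in text.lower().split():
            counts[f"{speaker}:{tag}|{word}"] += 1
    return counts


def to_vector(features: Counter, vocab: List[str]) -> List[int]:
    """Project a sparse feature counter onto a fixed, ordered vocabulary."""
    return [features.get(key, 0) for key in vocab]


# Toy session (hypothetical): therapist "T" and client "C".
session: List[Utterance] = [
    ("T", "How was your week", "question"),
    ("C", "It was hard", "statement"),
    ("T", "It was hard", "reflection"),
]

if __name__ == "__main__":
    print(concat_fusion(session)["T:was"])              # → 2
    print(augmented_fusion(session)["T:reflection|was"])  # → 1
```

In the concatenated view, the therapist's two uses of "was" collapse into a single count, while the augmented view keeps the question and reflection contexts apart. Either representation could then be passed, via `to_vector`, to a session-level classifier for CBT code estimation (not shown).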
Pages: 1836-1839
Number of pages: 4