Automatic Speech-Based Smoking Status Identification

Cited by: 1
Authors
Ma, Zhizhong [1 ]
Singh, Satwinder [1 ]
Qiu, Yuanhang [1 ]
Hou, Feng [1 ]
Wang, Ruili [1 ]
Bullen, Christopher [2 ]
Chu, Joanna Ting Wai [2 ]
Affiliations
[1] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand
[2] Univ Auckland, Natl Inst Hlth Innovat, Auckland, New Zealand
Source
INTELLIGENT COMPUTING, VOL 3 | 2022, Vol. 508
Keywords
Smoking status identification; Speech processing; Acoustic features; Cigarette smoking; Voice
DOI
10.1007/978-3-031-10467-1_11
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Identifying the smoking status of a speaker from speech has a range of applications, including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification has mainly focused on the speaker's low-level acoustic features such as fundamental frequency (F0), jitter, and shimmer, whereas high-level acoustic features such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) features have rarely been explored for this task. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach to smoking status identification that employs ResNet together with these acoustic features, and we explore a data augmentation technique to further improve performance. Finally, we compare identification accuracy across feature settings and obtain a best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% over the initial audio classification approach and the rule-based approach, respectively.
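To illustrate the kinds of features the abstract names, the following is a minimal sketch (not the authors' pipeline), assuming librosa and NumPy are available. The function name extract_features is hypothetical, and the jitter/shimmer values here are crude frame-level approximations of period and amplitude perturbation, not the cycle-level Praat measures typically reported in the smoking-and-voice literature.

```python
# Hypothetical sketch of acoustic feature extraction for smoking status identification.
# Assumptions: librosa >= 0.10, 16 kHz mono audio; jitter/shimmer are rough
# frame-level approximations, not Praat's cycle-level definitions.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)

    # High-level features: 13-dim MFCC and 40-dim log-Mel filter bank (Fbank).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, frames)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40))    # (40, frames)

    # Low-level feature: fundamental frequency (F0) via pYIN; NaN for unvoiced frames.
    f0, voiced, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    f0_voiced = f0[voiced & np.isfinite(f0)]

    # Crude jitter approximation: relative frame-to-frame period perturbation.
    periods = 1.0 / f0_voiced
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Crude shimmer approximation: relative frame-to-frame amplitude perturbation.
    rms = librosa.feature.rms(y=y)[0]
    shimmer = np.mean(np.abs(np.diff(rms))) / np.mean(rms)

    return {"mfcc": mfcc, "fbank": fbank,
            "f0_mean": float(np.mean(f0_voiced)),
            "jitter": float(jitter), "shimmer": float(shimmer)}
```

In a setup like the one described above, the time-frequency maps (MFCC/Fbank) would feed a ResNet-style classifier, with the scalar F0/jitter/shimmer statistics serving as the low-level descriptors; the specific architecture, augmentation, and feature configuration are those reported in the paper itself.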
Pages: 193-203
Page count: 11