Automatic Speech-Based Smoking Status Identification

Cited by: 1
Authors
Ma, Zhizhong [1 ]
Singh, Satwinder [1 ]
Qiu, Yuanhang [1 ]
Hou, Feng [1 ]
Wang, Ruili [1 ]
Bullen, Christopher [2 ]
Chu, Joanna Ting Wai [2 ]
Affiliations
[1] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand
[2] Univ Auckland, Natl Inst Hlth Innovat, Auckland, New Zealand
Source
INTELLIGENT COMPUTING, VOL 3 | 2022, Vol. 508
Keywords
Smoking status identification; Speech processing; Acoustic features; Cigarette smoking; Voice
DOI
10.1007/978-3-031-10467-1_11
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Identifying the smoking status of a speaker from speech has a range of applications, including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification has mainly focused on the speaker's low-level acoustic features such as fundamental frequency (F0), jitter, and shimmer, whereas high-level acoustic features such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) features have rarely been explored for this task. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach to smoking status identification that employs ResNet together with these acoustic features, and we explore a data augmentation technique to further improve performance. Finally, we compare identification accuracy across feature settings and obtain a best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% over the initial audio classification approach and the rule-based approach, respectively.
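To illustrate the kinds of features the abstract names, the following is a minimal sketch (not the authors' pipeline), assuming librosa and NumPy are available. The function name extract_features is hypothetical, and the jitter/shimmer values here are crude frame-level approximations of period and amplitude perturbation, not the cycle-level Praat measures typically reported in the smoking-and-voice literature.

```python
# Hypothetical sketch of acoustic feature extraction for smoking status identification.
# Assumptions: librosa >= 0.10, 16 kHz mono audio; jitter/shimmer are rough
# frame-level approximations, not Praat's cycle-level definitions.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)

    # High-level features: 13-dim MFCC and 40-dim log-Mel filter bank (Fbank).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, frames)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40))    # (40, frames)

    # Low-level feature: fundamental frequency (F0) via pYIN; NaN for unvoiced frames.
    f0, voiced, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    f0_voiced = f0[voiced & np.isfinite(f0)]

    # Crude jitter approximation: relative frame-to-frame period perturbation.
    periods = 1.0 / f0_voiced
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Crude shimmer approximation: relative frame-to-frame amplitude perturbation.
    rms = librosa.feature.rms(y=y)[0]
    shimmer = np.mean(np.abs(np.diff(rms))) / np.mean(rms)

    return {"mfcc": mfcc, "fbank": fbank,
            "f0_mean": float(np.mean(f0_voiced)),
            "jitter": float(jitter), "shimmer": float(shimmer)}
```

In a setup like the one described above, the time-frequency maps (MFCC/Fbank) would feed a ResNet-style classifier, with the scalar F0/jitter/shimmer statistics serving as the low-level descriptors; the specific architecture, augmentation, and feature configuration are those reported in the paper itself.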
Pages: 193-203
Page count: 11