Speaker Identification for Business-Card-Type Sensors

Cited by: 4
Authors
Yamaguchi, Shunpei [1 ]
Oshima, Ritsuko [2 ]
Oshima, Jun [2 ]
Shiina, Ryota [3 ]
Fujihashi, Takuya [1 ]
Saruwatari, Shunsuke [1 ]
Watanabe, Takashi [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka 5650871, Japan
[2] Shizuoka Univ, Grad Sch Integrated Sci & Technol, Hamamatsu, Shizuoka 4328011, Japan
[3] NTT Corp, NTT Access Network Serv Syst Labs, Musashino, Tokyo 1808585, Japan
Source
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY | 2021年 / 2卷
Keywords
Human activity recognition; sensor networks; speaker identification; speaker recognition; time synchronization; LOCALIZATION; RECOGNITION; NOISY;
DOI
10.1109/OJCS.2021.3075469
CLC number
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
Human collaboration has a great impact on the performance of multi-person activities. Analyzing speaker information and speech timing makes it possible to extract detailed human-collaboration data. Some studies have extracted such data by identifying a speaker with business-card-type sensors. However, it is difficult to realize speaker identification for business-card-type sensors at low cost and high accuracy because of spikes in the measured sound-pressure data, ambient noise picked up by non-speaker sensors, and synchronization errors across sensors. This study proposes a novel sound-pressure sensor and a speaker identification algorithm to realize speaker identification for business-card-type sensors. The sensor extracts the user's speech at low cost and high accuracy by employing a peak-hold circuit for spike mitigation and a time-synchronization module for precise time synchronization. The algorithm identifies a speaker with high accuracy by removing ambient noise. The evaluations show that the algorithm accurately identifies a speaker in multi-person activities across varying numbers of users, environmental noises, and reverberation conditions, as well as long and short utterances. In addition, the peak-hold circuit enables accurate speech extraction, and the synchronization error between the sensors is always within ±30 μs, which is negligible.
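The abstract describes comparing time-synchronized sound-pressure readings from each wearer's sensor while suppressing ambient noise. As a minimal illustrative sketch (not the paper's actual algorithm), the core idea can be framed as: for each synchronized frame, pick the sensor with the highest envelope, provided it exceeds that sensor's ambient-noise floor. The function name, noise-floor handling, and data layout below are assumptions for illustration only.

```python
import numpy as np

def identify_speaker(envelopes, noise_floor):
    """Per-frame speaker index from synchronized envelopes.

    envelopes:   (n_sensors, n_frames) sound-pressure envelopes,
                 e.g. from a peak-hold circuit sampled at identical
                 instants on every sensor.
    noise_floor: (n_sensors,) per-sensor ambient-noise threshold.
    Returns -1 for frames where no sensor exceeds its noise floor
    (i.e., nobody is speaking).
    """
    env = np.asarray(envelopes, dtype=float)
    above = env > np.asarray(noise_floor)[:, None]  # mask ambient noise
    frame_active = above.any(axis=0)                # any sensor hears speech?
    speaker = env.argmax(axis=0)                    # loudest wearer wins
    speaker[~frame_active] = -1                     # silence -> no speaker
    return speaker

# Toy example: 3 wearers, 5 frames; wearer 1 speaks in frames 1-3.
env = np.array([[0.1, 0.2, 0.2, 0.1, 0.1],
                [0.1, 0.9, 0.8, 0.7, 0.1],
                [0.1, 0.3, 0.3, 0.2, 0.1]])
floor = np.array([0.15, 0.15, 0.15])
print(identify_speaker(env, floor))
```

A real system would need the spike mitigation and sub-millisecond synchronization the paper emphasizes before such a frame-wise comparison is meaningful; with unsynchronized clocks, the argmax could attribute one wearer's speech to a neighbor's sensor.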
Pages: 216 - 226
Page count: 11
Related papers
50 records in total
  • [21] On Lexicon and Speaker Identification
    Tang Diandi
    OVERSEAS ENGLISH, 2017, (09): 201 - 202
  • [22] Latent prosody analysis for robust speaker identification
    Liao, Yuan-Fu
    Chen, Zi-He
    Juang, Yau-Tarng
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (06): 1870 - 1883
  • [23] The Role of Age in Factor Analysis for Speaker Identification
    Lei, Yun
    Hansen, John H. L.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2327 - 2330
  • [24] BLIND REVERBERATION MITIGATION FOR ROBUST SPEAKER IDENTIFICATION
    Sadjadi, Seyed Omid
    Hansen, John H. L.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4225 - 4228
  • [25] Visual Speaker Identification with Spatiotemporal Directional Features
    Zhao, Guoying
    Pietikainen, Matti
    IMAGE ANALYSIS AND RECOGNITION, 2013, 7950 : 1 - 10
  • [26] An MFCC-based Speaker Identification System
    Leu, Fang-Yie
    Lin, Guan-Liang
    2017 IEEE 31ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2017, : 1055 - 1062
  • [27] Learning Discriminative Features for Speaker Identification and Verification
    Yadav, Sarthak
    Rai, Atul
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2237 - 2241
  • [28] A modified speaker clustering method for efficient speaker identification
    Yan, JiaChang
    Wang, Lei
    2014 SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2014), VOL 2, 2014
  • [29] Timbre-Reserved Adversarial Attack in Speaker Identification
    Wang, Qing
    Yao, Jixun
    Zhang, Li
    Guo, Pengcheng
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3848 - 3858
  • [30] Audio-Visual Feature Fusion for Speaker Identification
    Almaadeed, Noor
    Aggoun, Amar
    Amira, Abbes
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 56 - 67